Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/tutel issues and pull requests
#247 - Question: Dictionary of Optimal Parallelism & Pipelining
Issue -
State: closed - Opened by hikettei 25 days ago
- 2 comments
#246 - How to convert checkpoint files that adapt to different distributed world sizes
Issue -
State: open - Opened by swjtulinxi 26 days ago
- 1 comment
#245 - fix llama_ffn forward function
Pull Request -
State: closed - Opened by pingzhili 28 days ago
- 1 comment
#244 - Implementation of Llama FFN
Issue -
State: closed - Opened by pingzhili 28 days ago
- 2 comments
#243 - Add custom data path to cifar10
Pull Request -
State: closed - Opened by anirudhprabhakaran3 about 1 month ago
#242 - fix scripts to support Tutel CPU on Mac OS X
Pull Request -
State: closed - Opened by ghostplant about 2 months ago
#241 - Make it compatible with ROCm >= 6.0
Pull Request -
State: closed - Opened by ghostplant about 2 months ago
#240 - Question regarding the load importance loss calculation
Issue -
State: open - Opened by wangyirui 3 months ago
- 1 comment
#239 - How about the cost of TUTEL features?
Issue -
State: open - Opened by fyang064 4 months ago
- 1 comment
#238 - fix(fast_dispatch): saving input tensor using ctx.save_for_backward
Pull Request -
State: closed - Opened by KimmiShi 4 months ago
- 1 comment
#237 - Potential Memory Leak in GatingEncoder/Decoder of Fast_Dispatch
Issue -
State: closed - Opened by KimmiShi 4 months ago
- 1 comment
#236 - How to use Megablocks in MoE training
Issue -
State: open - Opened by CSCYQJ 4 months ago
- 1 comment
#235 - add built-in llama_ffn; add helloworld_custom_expert_sharded;
Pull Request -
State: closed - Opened by ghostplant 4 months ago
- 1 comment
#234 - update README.md for v0.3.2
Pull Request -
State: closed - Opened by ghostplant 5 months ago
#233 - Can tutel support Pipeline Parallel?
Issue -
State: closed - Opened by xcwanAndy 5 months ago
- 1 comment
#232 - [Question] Comparison to FasterMoE
Issue -
State: open - Opened by Guodanding 5 months ago
- 4 comments
#231 - using TUTEL_GLOBAL_TIMEOUT_SEC to make NCCL timeout configurable
Pull Request -
State: closed - Opened by ghostplant 5 months ago
#229 - replace unnecessary zeros -> empty
Pull Request -
State: closed - Opened by ghostplant 6 months ago
#228 - enable message size larger than 4GB for all_to_all_v/all_gather_v
Pull Request -
State: closed - Opened by ghostplant 6 months ago
#227 - add tutel.examples.helloworld_demo based on custom experts
Pull Request -
State: closed - Opened by ghostplant 6 months ago
- 1 comment
#226 - How to create a custom expert with tutel?
Issue -
State: open - Opened by zws98 6 months ago
- 19 comments
#225 - update online setup instructions
Pull Request -
State: closed - Opened by ghostplant 7 months ago
#224 - Add option to install for CPU only: export NO_CUDA=1
Pull Request -
State: closed - Opened by ghostplant 7 months ago
#223 - add device initialization for ops on non-default devices
Pull Request -
State: closed - Opened by ghostplant 8 months ago
#222 - add example files for NCCL all_to_all_v/all_gather_v
Pull Request -
State: closed - Opened by ghostplant 9 months ago
#221 - add primitives: net.batch_all_to_all_v(), net.batch_all_gather_v()
Pull Request -
State: closed - Opened by ghostplant 9 months ago
#220 - [Question] Why use datatype ncclInt8 in nccl_all_to_all_scatter_async.
Issue -
State: open - Opened by cicirori 9 months ago
- 1 comment
#219 - How to implement Fairseq-MoE training checkpoint like Swin-MoE?
Issue -
State: open - Opened by withinmiaov 11 months ago
- 1 comment
#218 - Non-surface function utilities only work for contiguous input data
Issue -
State: open - Opened by lyd126 11 months ago
- 12 comments
#217 - fill zeros with warning for params not defined in state_dict
Pull Request -
State: closed - Opened by ghostplant 11 months ago
#216 - Enable running without bias and update ffn instantiation
Pull Request -
State: closed - Opened by vchiley 12 months ago
- 4 comments
#215 - RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED
Issue -
State: closed - Opened by jd730 about 1 year ago
- 3 comments
#214 - tutel is slower than the naive p2p using 2DH for small scale
Issue -
State: open - Opened by DongyuXu77 about 1 year ago
- 3 comments
#213 - What is the difference between this and deepspeed-moe?
Issue -
State: closed - Opened by Hap-Zhang about 1 year ago
- 2 comments
#212 - update tutel pipeline and setup deps
Pull Request -
State: closed - Opened by ghostplant about 1 year ago
#211 - numpy not in requirements
Issue -
State: closed - Opened by 152334H about 1 year ago
- 5 comments
#210 - updt init
Pull Request -
State: open - Opened by vchiley about 1 year ago
- 7 comments
#209 - fix a few casts
Pull Request -
State: closed - Opened by vchiley about 1 year ago
- 1 comment
#208 - always use torch.distributed.run in new torch versions
Pull Request -
State: closed - Opened by ghostplant about 1 year ago
#207 - how to use tutel on Megatron Deepspeed
Issue -
State: open - Opened by wangyuxin87 about 1 year ago
- 4 comments
#206 - Can this package support the one-gpu machine
Issue -
State: open - Opened by momo1986 over 1 year ago
- 5 comments
#205 - add more comment in helloworld_ddp example
Pull Request -
State: closed - Opened by ghostplant over 1 year ago
#204 - Training with Data and Expert Parallelism
Issue -
State: open - Opened by santurini over 1 year ago
- 5 comments
#203 - INTERNAL ASSERT FAILED
Issue -
State: open - Opened by Qicheng-WANG over 1 year ago
- 5 comments
#201 - about compute_location and locations
Issue -
State: open - Opened by adverbial03 over 1 year ago
- 1 comment
#199 - add tutel.examples.helloworld_switch
Pull Request -
State: closed - Opened by ghostplant over 1 year ago
#198 - ImportError: cannot import name 'tutel_custom_kernel' from 'tutel.impls.jit_compiler'
Issue -
State: open - Opened by zhaojiancheng007 over 1 year ago
- 12 comments
Labels: environmental issue
#197 - [Bug]The function func_fwd is calculated inconsistent on the cpu and gpu
Issue -
State: closed - Opened by starkhu over 1 year ago
- 1 comment
Labels: invalid
#196 - tutel/jit_kernels/sparse.py torch.float16 There is a bug in the calculation: the cuda calculation result is inconsistent with the CPU calculation result and the array is out of bounds
Issue -
State: open - Opened by WsqRichards1 over 1 year ago
- 1 comment
Labels: invalid
#195 - All2All precision always in fp32
Issue -
State: open - Opened by vchiley over 1 year ago
- 1 comment
#194 - add reset_parameters fn; updt .to() fn; enable device and dtype pass thru
Pull Request -
State: closed - Opened by vchiley over 1 year ago
- 1 comment
#193 - Fix tutel compatibility in torch 2.0
Pull Request -
State: closed - Opened by ghostplant over 1 year ago
#192 - How the experts' gradients are handled under data parallelism?
Issue -
State: open - Opened by yzs981130 over 1 year ago
- 1 comment
#191 - removed logit_scale without device casting
Pull Request -
State: closed - Opened by Harsh-Sensei almost 2 years ago
- 1 comment
#190 - RuntimeError: No such operator tutel_ops::cumsum
Issue -
State: open - Opened by sharkdrop almost 2 years ago
- 10 comments
#189 - [installation errors] fatal error: nccl.h: No such file or directory
Issue -
State: open - Opened by qianyuzqy almost 2 years ago
- 1 comment
#188 - fix typos and old pytorch compatibility
Pull Request -
State: closed - Opened by ghostplant almost 2 years ago
#187 - Multi-nodes training is much more slower than single node
Issue -
State: open - Opened by YingqingHe almost 2 years ago
- 1 comment
#186 - New Tutel checkpoint loading is incompatible with old models
Issue -
State: closed - Opened by jinga-lala about 2 years ago
- 7 comments
#185 - NCCL Asynchronous update timeout crash with Tutel MoE
Issue -
State: open - Opened by jinga-lala about 2 years ago
- 5 comments
#184 - extend parallel_type for adaptive:n
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#183 - extend parallel_type to use dp without a2a
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#182 - My code seems to hang when skip_remainder_batch=False.
Issue -
State: open - Opened by Fragile-azalea about 2 years ago
- 7 comments
Labels: application patch
#181 - support tutel.checkpoint.* for issue #177
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#180 - Cannot import JIT optimized kernels. Did you forget to install Custom Kernel Extension?
Issue -
State: open - Opened by Alex-Songs about 2 years ago
- 1 comment
Labels: environmental issue
#179 - Pretrained MoE model
Issue -
State: open - Opened by Luodian about 2 years ago
- 2 comments
Labels: question
#178 - Example on saving experts to one model when using distributed training
Issue -
State: open - Opened by Luodian about 2 years ago
- 2 comments
Labels: duplicate
#177 - Error when doing deepcopy of the model
Issue -
State: open - Opened by yzxing87 about 2 years ago
- 5 comments
Labels: enhancement
#176 - add tensor save/load in numpy format
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#175 - [installation errors] fatal error: nccl.h: No such file or directory
Issue -
State: closed - Opened by Luodian about 2 years ago
- 1 comment
#174 - a bunch of fixes for #167 and #173
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#173 - Is simple_all_reduce also required for capacity_factor > 0 cases?
Issue -
State: closed - Opened by Fragile-azalea about 2 years ago
- 6 comments
Labels: bug
#172 - The output of nccl_all_to_all_scatter_async may be incomplete when num_local_experts>1.
Issue -
State: closed - Opened by Fragile-azalea about 2 years ago
- 11 comments
Labels: wontfix
#171 - Cannot Import JIT optimized kernels?
Issue -
State: closed - Opened by Luodian about 2 years ago
- 11 comments
#170 - handle Windows Pytorch compatibility
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#169 - how can I install this pack on conda environment??
Issue -
State: open - Opened by Lurnco about 2 years ago
- 11 comments
Labels: setup
#168 - typo fix
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#167 - Error in load_importance_loss
Issue -
State: open - Opened by Luodian about 2 years ago
- 7 comments
Labels: enhancement
#166 - update TutelDistributedOptimizer
Pull Request -
State: open - Opened by zeliu98 about 2 years ago
#165 - refine fairseq_moe configuration
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#164 - allow CUDA_HOME to specify CUDA SDK location
Pull Request -
State: closed - Opened by ghostplant about 2 years ago
#163 - Cannot compile tutel kernels and got runtime error
Issue -
State: closed - Opened by hyhuang00 about 2 years ago
- 10 comments
#162 - Add Feature - Port overlapping from v0.1.x and support per-layer overlapping degree
Pull Request -
State: closed - Opened by yzygitzh about 2 years ago
#161 - bp of shared parameters and experts
Issue -
State: open - Opened by a157801 over 2 years ago
- 7 comments
Labels: question
#160 - 100x slower when using 4nodes than 1node to run the helloworld_ddp example
Issue -
State: closed - Opened by a157801 over 2 years ago
- 12 comments
Labels: libnccl issue
#159 - update test case in pipeline
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#158 - update README.md
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#157 - add error reasons for installation
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#156 - AttributeError: module 'tutel_custom_kernel' has no attribute 'inject_source'
Issue -
State: closed - Opened by s-kodge over 2 years ago
- 3 comments
#155 - add cosine router; add load loss and importance loss
Pull Request -
State: closed - Opened by zeliu98 over 2 years ago
#154 - Add example for fairseq moe with tutel support
Pull Request -
State: closed - Opened by EricWangCN over 2 years ago
#153 - add examples: tutel.examples.helloworld_ddp_tutel
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#152 - add examples: tutel.examples.helloworld_tutel_ddp
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#151 - add simple patch for Fairseq using MoE
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#150 - move `fp32_gate` checking from moe_layer to top
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#149 - allow moe.moe_layer to use custom expert
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#148 - add net.barrier()
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#147 - refine argument list in custom gate module
Pull Request -
State: closed - Opened by ghostplant over 2 years ago
#136 - What is the purpose of the "use_2dh" option?
Issue -
State: closed - Opened by ymjiang over 2 years ago
- 4 comments
Labels: question