microsoft/tutel issues and pull requests

#247 - Question: Dictionary of Optimal Parallelism & Pipelining

Issue - State: closed - Opened by hikettei 25 days ago - 2 comments

#246 - How to convert checkpoint files that adapt to different distributed world sizes

Issue - State: open - Opened by swjtulinxi 26 days ago - 1 comment

#245 - fix llama_ffn forward function

Pull Request - State: closed - Opened by pingzhili 28 days ago - 1 comment

#244 - Implementation of Llama FFN

Issue - State: closed - Opened by pingzhili 28 days ago - 2 comments

#243 - Add custom data path to cifar10

Pull Request - State: closed - Opened by anirudhprabhakaran3 about 1 month ago

#242 - fix scripts to support Tutel CPU on Mac OS X

Pull Request - State: closed - Opened by ghostplant about 2 months ago

#241 - Make it compatible with ROCm >= 6.0

Pull Request - State: closed - Opened by ghostplant about 2 months ago

#240 - Question regarding the load importance loss calculation

Issue - State: open - Opened by wangyirui 3 months ago - 1 comment

#239 - How about the cost of TUTEL features?

Issue - State: open - Opened by fyang064 4 months ago - 1 comment

#238 - fix(fast_dispatch): saving input tensor using ctx.save_for_backward

Pull Request - State: closed - Opened by KimmiShi 4 months ago - 1 comment

#237 - Potential Memory Leak in GatingEncoder/Decoder of Fast_Dispatch

Issue - State: closed - Opened by KimmiShi 4 months ago - 1 comment

#236 - How to use Megablocks in MoE training

Issue - State: open - Opened by CSCYQJ 4 months ago - 1 comment

#235 - add built-in llama_ffn; add helloworld_custom_expert_sharded;

Pull Request - State: closed - Opened by ghostplant 4 months ago - 1 comment

#234 - update README.md for v0.3.2

Pull Request - State: closed - Opened by ghostplant 5 months ago

#233 - Can tutel support Pipeline Parallel?

Issue - State: closed - Opened by xcwanAndy 5 months ago - 1 comment

#232 - [Question] Comparison to FasterMoE

Issue - State: open - Opened by Guodanding 5 months ago - 4 comments

#231 - using TUTEL_GLOBAL_TIMEOUT_SEC to make NCCL timeout configurable

Pull Request - State: closed - Opened by ghostplant 5 months ago

#230 - Qs

Issue - State: open - Opened by zws98 5 months ago - 3 comments

#229 - replace unnecessary zeros -> empty

Pull Request - State: closed - Opened by ghostplant 6 months ago

#228 - enable message size larger than 4GB for all_to_all_v/all_gather_v

Pull Request - State: closed - Opened by ghostplant 6 months ago

#227 - add tutel.examples.helloworld_demo based on custom experts

Pull Request - State: closed - Opened by ghostplant 6 months ago - 1 comment

#226 - How to create a custom expert with tutel?

Issue - State: open - Opened by zws98 6 months ago - 19 comments

#225 - update online setup instructions

Pull Request - State: closed - Opened by ghostplant 7 months ago

#224 - Add option to install for CPU only: export NO_CUDA=1

Pull Request - State: closed - Opened by ghostplant 7 months ago

#223 - add device initialization for ops on non-default devices

Pull Request - State: closed - Opened by ghostplant 8 months ago

#222 - add example files for NCCL all_to_all_v/all_gather_v

Pull Request - State: closed - Opened by ghostplant 9 months ago

#221 - add primitives: net.batch_all_to_all_v(), net.batch_all_gather_v()

Pull Request - State: closed - Opened by ghostplant 9 months ago

#220 - [Question] Why use datatype ncclInt8 in nccl_all_to_all_scatter_async.

Issue - State: open - Opened by cicirori 9 months ago - 1 comment

#219 - How to implement Fairseq-MoE training checkpoint like Swin-MoE?

Issue - State: open - Opened by withinmiaov 11 months ago - 1 comment

#218 - Non-surface function utilities only work for contiguous input data

Issue - State: open - Opened by lyd126 11 months ago - 12 comments

#217 - fill zeros with warning for params not defined in state_dict

Pull Request - State: closed - Opened by ghostplant 11 months ago

#216 - Enable running without bias and update ffn instantiation

Pull Request - State: closed - Opened by vchiley 12 months ago - 4 comments

#215 - RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED

Issue - State: closed - Opened by jd730 about 1 year ago - 3 comments

#214 - tutel is slower than the naive p2p using 2DH for small scale

Issue - State: open - Opened by DongyuXu77 about 1 year ago - 3 comments

#213 - What is the difference between this and deepspeed-moe?

Issue - State: closed - Opened by Hap-Zhang about 1 year ago - 2 comments

#212 - update tutel pipeline and setup deps

Pull Request - State: closed - Opened by ghostplant about 1 year ago

#211 - numpy not in requirements

Issue - State: closed - Opened by 152334H about 1 year ago - 5 comments

#210 - updt init

Pull Request - State: open - Opened by vchiley about 1 year ago - 7 comments

#209 - fix a few casts

Pull Request - State: closed - Opened by vchiley about 1 year ago - 1 comment

#208 - always use torch.distributed.run in new torch versions

Pull Request - State: closed - Opened by ghostplant about 1 year ago

#207 - how to use tutel on Megatron Deepspeed

Issue - State: open - Opened by wangyuxin87 about 1 year ago - 4 comments

#206 - Can this package support the one-gpu machine

Issue - State: open - Opened by momo1986 over 1 year ago - 5 comments

#205 - add more comment in helloworld_ddp example

Pull Request - State: closed - Opened by ghostplant over 1 year ago

#204 - Training with Data and Expert Parallelism

Issue - State: open - Opened by santurini over 1 year ago - 5 comments

#203 - INTERNAL ASSERT FAILED

Issue - State: open - Opened by Qicheng-WANG over 1 year ago - 5 comments

#201 - about compute_location and locations

Issue - State: open - Opened by adverbial03 over 1 year ago - 1 comment

#199 - add tutel.examples.helloworld_switch

Pull Request - State: closed - Opened by ghostplant over 1 year ago

#198 - ImportError: cannot import name 'tutel_custom_kernel' from 'tutel.impls.jit_compiler'

Issue - State: open - Opened by zhaojiancheng007 over 1 year ago - 12 comments
Labels: environmental issue

#197 - [Bug]The function func_fwd is calculated inconsistent on the cpu and gpu

Issue - State: closed - Opened by starkhu over 1 year ago - 1 comment
Labels: invalid

#196 - tutel/jit_kernels/sparse.py torch.float16 There is a bug in the calculation: the cuda calculation result is inconsistent with the CPU calculation result and the array is out of bounds

Issue - State: open - Opened by WsqRichards1 over 1 year ago - 1 comment
Labels: invalid

#195 - All2All precision always in fp32

Issue - State: open - Opened by vchiley over 1 year ago - 1 comment

#194 - add reset_parameters fn; updt .to() fn; enable device and dtype pass thru

Pull Request - State: closed - Opened by vchiley over 1 year ago - 1 comment

#193 - Fix tutel compatibility in torch 2.0

Pull Request - State: closed - Opened by ghostplant over 1 year ago

#192 - How the experts' gradients are handled under data parallelism?

Issue - State: open - Opened by yzs981130 over 1 year ago - 1 comment

#191 - removed logit_scale without device casting

Pull Request - State: closed - Opened by Harsh-Sensei almost 2 years ago - 1 comment

#190 - RuntimeError: No such operator tutel_ops::cumsum

Issue - State: open - Opened by sharkdrop almost 2 years ago - 10 comments

#189 - [installation errors] fatal error: nccl.h: No such file or directory

Issue - State: open - Opened by qianyuzqy almost 2 years ago - 1 comment

#188 - fix typos and old pytorch compatibility

Pull Request - State: closed - Opened by ghostplant almost 2 years ago

#187 - Multi-nodes training is much more slower than single node

Issue - State: open - Opened by YingqingHe almost 2 years ago - 1 comment

#186 - New Tutel checkpoint loading is incompatible with old models

Issue - State: closed - Opened by jinga-lala about 2 years ago - 7 comments

#185 - NCCL Asynchronous update timeout crash with Tutel MoE

Issue - State: open - Opened by jinga-lala about 2 years ago - 5 comments

#184 - extend parallel_type for adaptive:n

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#183 - extend parallel_type to use dp without a2a

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#182 - My code seems to hang when skip_remainder_batch=False.

Issue - State: open - Opened by Fragile-azalea about 2 years ago - 7 comments
Labels: application patch

#181 - support tutel.checkpoint.* for issue #177

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#180 - Cannot import JIT optimized kernels. Did you forget to install Custom Kernel Extension?

Issue - State: open - Opened by Alex-Songs about 2 years ago - 1 comment
Labels: environmental issue

#179 - Pretrained MoE model

Issue - State: open - Opened by Luodian about 2 years ago - 2 comments
Labels: question

#178 - Example on saving experts to one model when using distributed training

Issue - State: open - Opened by Luodian about 2 years ago - 2 comments
Labels: duplicate

#177 - Error when doing deepcopy of the model

Issue - State: open - Opened by yzxing87 about 2 years ago - 5 comments
Labels: enhancement

#176 - add tensor save/load in numpy format

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#175 - [installation errors] fatal error: nccl.h: No such file or directory

Issue - State: closed - Opened by Luodian about 2 years ago - 1 comment

#174 - a bunch of fixes for #167 and #173

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#173 - Is simple_all_reduce also required for capacity_factor > 0 cases?

Issue - State: closed - Opened by Fragile-azalea about 2 years ago - 6 comments
Labels: bug

#172 - The output of nccl_all_to_all_scatter_async may be incomplete when num_local_experts>1.

Issue - State: closed - Opened by Fragile-azalea about 2 years ago - 11 comments
Labels: wontfix

#171 - Cannot Import JIT optimized kernels?

Issue - State: closed - Opened by Luodian about 2 years ago - 11 comments

#170 - handle Windows Pytorch compatibility

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#169 - how can I install this pack on conda environment??

Issue - State: open - Opened by Lurnco about 2 years ago - 11 comments
Labels: setup

#168 - typo fix

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#167 - Error in load_importance_loss

Issue - State: open - Opened by Luodian about 2 years ago - 7 comments
Labels: enhancement

#166 - update TutelDistributedOptimizer

Pull Request - State: open - Opened by zeliu98 about 2 years ago

#165 - refine fairseq_moe configuration

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#164 - allow CUDA_HOME to specify CUDA SDK location

Pull Request - State: closed - Opened by ghostplant about 2 years ago

#163 - Cannot compile tutel kernels and got runtime error

Issue - State: closed - Opened by hyhuang00 about 2 years ago - 10 comments

#162 - Add Feature - Port overlapping from v0.1.x and support per-layer overlapping degree

Pull Request - State: closed - Opened by yzygitzh about 2 years ago

#161 - bp of shared parameters and experts

Issue - State: open - Opened by a157801 over 2 years ago - 7 comments
Labels: question

#160 - 100x slower when using 4nodes than 1node to run the helloworld_ddp example

Issue - State: closed - Opened by a157801 over 2 years ago - 12 comments
Labels: libnccl issue

#159 - update test case in pipeline

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#158 - update README.md

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#157 - add error reasons for installation

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#156 - AttributeError: module 'tutel_custom_kernel' has no attribute 'inject_source'

Issue - State: closed - Opened by s-kodge over 2 years ago - 3 comments

#155 - add cosine router; add load loss and importance loss

Pull Request - State: closed - Opened by zeliu98 over 2 years ago

#154 - Add example for fairseq moe with tutel support

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#153 - add examples: tutel.examples.helloworld_ddp_tutel

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#152 - add examples: tutel.examples.helloworld_tutel_ddp

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#151 - add simple patch for Fairseq using MoE

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#150 - move `fp32_gate` checking from moe_layer to top

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#149 - allow moe.moe_layer to use custom expert

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#148 - add net.barrier()

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#147 - refine argument list in custom gate module

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#136 - What is the purpose of the "use_2dh" option?

Issue - State: closed - Opened by ymjiang over 2 years ago - 4 comments
Labels: question
