microsoft/tutel issues and pull requests

#102 - Add performance figures

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#101 - Add performance figures

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#100 - Merge A2A FFN overlapping and 2DH A2A

Pull Request - State: closed - Opened by yzygitzh over 2 years ago

#99 - handle occupancy compat for rocm4.2

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#98 - why Deepspeed MoE Top-2 Gate doesn't integrate Tutel acceleration

Issue - State: closed - Opened by Satan012 over 2 years ago - 1 comment
Labels: invalid

#97 - simplify all different usages into top-k usage

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#96 - Error: Exception: MoE JIT is designed to work on sample size = 800, while receiving sample size = 1600 (> 800)

Issue - State: open - Opened by Satan012 over 2 years ago - 2 comments
Labels: question

#95 - support TopKGate properties: is_postnorm

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#94 - add ffn_allreduce_range_size for data parallel

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#93 - Determine kernel max occupancy in JIT

Pull Request - State: closed - Opened by abuccts over 2 years ago

#92 - split jit_activate out of jit_execute

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#91 - Fix Issue #90: cast constant size to int

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#90 - fast_cumsum_sub_one fails when the module is wrapped by ORTModule

Issue - State: closed - Opened by foreveronehundred over 2 years ago - 7 comments

#89 - Can DistributedDataParallel be added into helloworld_deepspeed.py ?

Issue - State: closed - Opened by Satan012 over 2 years ago - 2 comments
Labels: invalid

#88 - add save_load_checkpoint option in helloworld

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#87 - how to save checkpoint when use data parallel and moe expert

Issue - State: open - Opened by Satan012 over 2 years ago - 7 comments
Labels: question

#86 - add api for group creation

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#85 - fix typos

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#84 - Fix Bug - Fix various bugs in all-to-all FFN overlapping

Pull Request - State: closed - Opened by yzygitzh over 2 years ago

#83 - Enable JIT compilation to support torch.distributed.pipeline environment

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#82 - Problem from applying pipeline parallel with Tutel's cumsum

Issue - State: closed - Opened by foreveronehundred over 2 years ago - 1 comment

#81 - Add 2D Hierarchical AlltoAll Algorithm

Pull Request - State: closed - Opened by abuccts over 2 years ago

#80 - Add cpu support

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#79 - add a fp64 test case

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#78 - reset seeding in distributed synthetic data

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#77 - change args of custom kernels' compiler

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#76 - INTERNAL ASSERT FAILED at custom_kernel.cpp

Issue - State: closed - Opened by foreveronehundred over 2 years ago - 1 comment

#75 - fix type of capacity

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#74 - add helloworld_amp

Pull Request - State: closed - Opened by EricWangCN over 2 years ago

#73 - not using CUDA_VISIBLE_DEVICES

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#72 - add fp64 option in examples; enhance launcher compat;

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#71 - support handling multi-gate options

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#70 - Question about multi-gate refer to multi-task learning

Issue - State: open - Opened by Tokkiu over 2 years ago - 5 comments
Labels: question

#69 - add fast launch usage for openmpi

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#68 - Add Feature - Support overlapping all-to-all with FFN computation in MoE layer

Pull Request - State: closed - Opened by yzygitzh over 2 years ago

#67 - enhance logging & TUTEL_CUDA_SANDBOX option

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#66 - simplify example codes

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#65 - using init_data_model_parallel() to initialize proc

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#64 - fix nvrtc compatibility in some environments

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#63 - fix nvrtc compatibility in some envs

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#62 - support using mpiexec for distributed launch

Pull Request - State: closed - Opened by ghostplant over 2 years ago

#61 - Error met when using multi nodes

Issue - State: closed - Opened by Lechatelia over 2 years ago - 5 comments

#60 - Upgrade docker image for UT.

Pull Request - State: closed - Opened by guoshzhao over 2 years ago

#59 - add a new test case

Pull Request - State: closed - Opened by EricWangCN almost 3 years ago

#58 - Add --gpus=all option for test pipeline.

Pull Request - State: closed - Opened by guoshzhao almost 3 years ago

#57 - add unit test

Pull Request - State: closed - Opened by EricWangCN almost 3 years ago

#56 - change the initialization of input of helloworld

Pull Request - State: closed - Opened by EricWangCN almost 3 years ago