Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/Megatron-DeepSpeed issues and pull requests
#54 - "RuntimeError: trying to initialize the default process group twice!" error with pretrain_gpt example script
Issue - State: closed - Opened by rraminen over 2 years ago - 2 comments
#53 - Adding GPT pretraining distillation and quantization examples
Pull Request - State: closed - Opened by minjiaz over 2 years ago - 1 comment
#52 - Add Codeowner
Pull Request - State: closed - Opened by conglongli over 2 years ago
#51 - [BUG] the gpt model cannot run in specified container
Issue - State: closed - Opened by starkhu over 2 years ago - 4 comments
#50 - Add support for DS comms
Pull Request - State: open - Opened by Quentin-Anthony over 2 years ago - 1 comment
#49 - [OLD] Support DeepSpeed Comms
Pull Request - State: closed - Opened by Quentin-Anthony over 2 years ago
#48 - MoE support
Pull Request - State: closed - Opened by awan-10 over 2 years ago
#47 - MoE support
Pull Request - State: closed - Opened by jeffra over 2 years ago
#46 - How efficient is the BERT and T5 code?
Issue - State: closed - Opened by StellaAthena over 2 years ago - 1 comment
#45 - how can I use the cpu_offload?
Issue - State: closed - Opened by cudaMancpy over 2 years ago
#44 - Update eval readme
Pull Request - State: closed - Opened by conglongli over 2 years ago
#43 - Cannot run the pretrain_gpt example using moe branch
Issue - State: open - Opened by getao over 2 years ago - 3 comments
#42 - Fix grad accum double scaling bug under no pp mode
Pull Request - State: closed - Opened by conglongli over 2 years ago
#41 - Fix typo
Pull Request - State: closed - Opened by mrm8488 over 2 years ago
#40 - ModuleNotFoundError: No module named 'lm_eval.datasets.coqa'
Issue - State: open - Opened by xwuShirley over 2 years ago - 2 comments
#39 - ds_pretrain_gpt_125M_MoE64.sh didn't convergence, loss fly after 3k steps?
Issue - State: closed - Opened by jerryli1981 over 2 years ago - 5 comments
#38 - Minjiaz/mos release
Pull Request - State: closed - Opened by minjiazhang over 2 years ago - 1 comment
#37 - Merged MoS staging to MoE
Pull Request - State: closed - Opened by awan-10 over 2 years ago
#36 - PR-MoE client changes to use the new DS-MoE API
Pull Request - State: closed - Opened by awan-10 over 2 years ago
#35 - PR-MoE changes to match new DS-MoE API
Pull Request - State: closed - Opened by awan-10 over 2 years ago
#34 - Eval harness for dense and MoE model, plus several feature/fixes for dense/MoE training
Pull Request - State: closed - Opened by conglongli over 2 years ago
#33 - fix MoE save interval
Pull Request - State: closed - Opened by conglongli over 2 years ago
#32 - Adding MoS support for Mixture-of-Experts in DeepSpeed
Pull Request - State: closed - Opened by minjiaz over 2 years ago
#31 - Support for MoS
Pull Request - State: closed - Opened by minjiaz over 2 years ago - 1 comment
#30 - How to merge the model partition that use both optimization about megatron's mp and deepspeed's zero 1?
Issue - State: closed - Opened by Tecmus over 2 years ago
#29 - Load moe checkpoint in generate_text.sh
Issue - State: open - Opened by Ag2S1 almost 3 years ago - 5 comments
#28 - Checkpoint for the MoE version
Issue - State: open - Opened by BDHU almost 3 years ago
#27 - DeepSpeed to DeepSpeed converter for changing tp/pp
Pull Request - State: closed - Opened by tjruwase almost 3 years ago - 5 comments
#26 - draft MoE training
Pull Request - State: closed - Opened by awan-10 almost 3 years ago - 1 comment
#25 - unpack list into a tuple constructor for python-3.7
Pull Request - State: closed - Opened by adammoody almost 3 years ago - 2 comments
#24 - Invalid syntax error when unpacking *moe_losses in python-3.7
Issue - State: closed - Opened by adammoody almost 3 years ago - 3 comments
#23 - [checkpoint conversion] meg-ds to meg-ds topology reshaping
Issue - State: open - Opened by stas00 almost 3 years ago - 1 comment
#22 - Fixing the MoE training when using model-parallelism
Pull Request - State: closed - Opened by RezaYazdaniAminabadi almost 3 years ago - 1 comment
#21 - Sync with Megatron-LM
Pull Request - State: closed - Opened by tjruwase almost 3 years ago - 1 comment
#20 - How to run bert with deepspeed?
Issue - State: closed - Opened by MagiaSN almost 3 years ago - 2 comments
#19 - make CL not truncate eval data
Pull Request - State: closed - Opened by conglongli about 3 years ago
#18 - CL script update
Pull Request - State: closed - Opened by conglongli about 3 years ago
#17 - Curriculum learning support
Pull Request - State: closed - Opened by conglongli about 3 years ago
#16 - LM Evaluation Harness Integration
Issue - State: closed - Opened by StellaAthena about 3 years ago
#15 - Convert meg ds to hf
Pull Request - State: closed - Opened by tjruwase about 3 years ago - 2 comments
#14 - Checkpoint conversion tools
Pull Request - State: closed - Opened by tjruwase about 3 years ago - 17 comments
#13 - Make attention mask boolean
Pull Request - State: closed - Opened by tjruwase about 3 years ago
#12 - syncing with the upstream?
Issue - State: open - Opened by stas00 about 3 years ago
#11 - merging the fix from downstream
Issue - State: closed - Opened by stas00 over 3 years ago
#10 - Use new zero.Init() API
Pull Request - State: closed - Opened by tjruwase over 3 years ago
#9 - Pass mpu in zero.Init()
Pull Request - State: closed - Opened by tjruwase over 3 years ago - 1 comment
#8 - query deepspeed global grad norm
Pull Request - State: closed - Opened by ShadenSmith over 3 years ago
#7 - zero.Init() with mpu
Pull Request - State: closed - Opened by tjruwase over 3 years ago - 1 comment
#6 - use pp engine even for pp=1
Pull Request - State: closed - Opened by jeffra over 3 years ago
#5 - improve DS integration docs + evaluation + logging
Pull Request - State: closed - Opened by ShadenSmith over 3 years ago
#4 - fix failure on restart after round 1 train and no eval
Pull Request - State: closed - Opened by stas00 over 3 years ago
#3 - fix failure on restart after round 1 train and no eval
Pull Request - State: closed - Opened by stas00 over 3 years ago
#2 - fix failure on restart after round 1 train and no eval
Pull Request - State: closed - Opened by stas00 over 3 years ago - 1 comment
#1 - Megatron + DeepSpeed + Pipeline Parallelism
Pull Request - State: closed - Opened by jeffra over 3 years ago