Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/Megatron-DeepSpeed issues and pull requests
#450 - [Bug]Fix init issue for layer_norm in sequence_parallel for non-CUDA device.
Pull Request -
State: open - Opened by ys950902 about 2 months ago
- 2 comments
#449 - Model conversion problem
Issue -
State: open - Opened by yuanzhiyong1999 about 2 months ago
#448 - [Bug]Fix init issue for rms_norm in sequence_parallel.
Pull Request -
State: open - Opened by ys950902 about 2 months ago
- 1 comment
#447 - Async allreduce for tensor-parallel
Issue -
State: open - Opened by drcanchi about 2 months ago
#446 - [TRACKER] Customer support related PR tracker for Intel devices
Issue -
State: open - Opened by delock about 2 months ago
#445 - fix moe tflops
Pull Request -
State: open - Opened by ranzhejiang about 2 months ago
#444 - how to calcuate the training throughput
Issue -
State: open - Opened by bigtree2020 2 months ago
#441 - Adding the new feature of FPDT
Pull Request -
State: open - Opened by YJHMITWEB 3 months ago
- 4 comments
#440 - Optimizer problem when using finetune_llama.sh
Issue -
State: open - Opened by Kaiizx 3 months ago
- 3 comments
#429 - Enable Sequence Parallelism
Pull Request -
State: closed - Opened by polisettyvarma 4 months ago
- 10 comments
#428 - [Bug] grad_weight can't be NoneType when running with DeepSpeed on Zero3.
Pull Request -
State: closed - Opened by ys950902 4 months ago
- 8 comments
#379 - AttributeError: 'Namespace' object has no attribute 'deepspeed_config_dict'. Did you mean: 'deepspeed_config'? && batch = next(self.data_iterator)
Issue -
State: open - Opened by hi20240217 7 months ago
- 2 comments
#100 - DeepSpeed Data Efficiency Library pretraining examples
Pull Request -
State: closed - Opened by conglongli almost 2 years ago
- 1 comment
#99 - Fix generate_text.sh Megatron text-generation example working w/ DS inference
Pull Request -
State: closed - Opened by lekurile almost 2 years ago
#98 - The FLOPS per GPU reported for the Megatron GPT model by the DeepSpeed Flops Profiler is much lower than that reported in the logs when we run pretrain_gpt.py
Issue -
State: open - Opened by shrutiramesh1988 almost 2 years ago
- 1 comment
#97 - AttributeError: module 'transformer_inference' has no attribute 'layer_norm_fp16'
Issue -
State: open - Opened by ranggihwang almost 2 years ago
- 1 comment
#96 - Fix the bug of FusedLayerNorm on ROCm
Pull Request -
State: closed - Opened by hubertlu-tw almost 2 years ago
- 2 comments
#95 - Layer Norm kernel fails for ROCm
Issue -
State: closed - Opened by NouamaneTazi almost 2 years ago
- 3 comments
#94 - If I just want to pretrain a simple gpt model without these characteristics, which script should I refer to?
Issue -
State: open - Opened by AQA6666 about 2 years ago
- 1 comment
#93 - The process is stuck at this step:compiling and loading fused kernels ...
Issue -
State: open - Opened by AQA6666 about 2 years ago
- 1 comment
#92 - Modifying loss checking to support bf16.
Pull Request -
State: closed - Opened by jomayeri about 2 years ago
#91 - deepspeed to megatron - mismatch in function definition and call
Issue -
State: open - Opened by MatejUlcar about 2 years ago
#90 - Vocab size mismatch for T5
Issue -
State: open - Opened by ShivanshuPurohit about 2 years ago
#89 - How to run use moe on T5?
Issue -
State: closed - Opened by YijiaZhao about 2 years ago
- 2 comments
#88 - Updated to Curated acpt env and removed deepspeed install from github
Pull Request -
State: closed - Opened by savitamittal1 about 2 years ago
#87 - Fix a bug for gpt pre-training.
Pull Request -
State: closed - Opened by FeixLiu about 2 years ago
- 2 comments
#86 - Does Deepspeed compatible with megatron3.0 ?
Issue -
State: open - Opened by pangsg about 2 years ago
#85 - MoE Checkpoint size
Issue -
State: open - Opened by yunoJ about 2 years ago
#84 - GeLU approximation differs from paper, BERT
Issue -
State: closed - Opened by yieldthought about 2 years ago
- 1 comment
#83 - Issue generating text with GPT: "KeyError: 50284"
Issue -
State: open - Opened by gcunhase about 2 years ago
#82 - Issue loading GPT2 checkpoint: "torch.nn.modules.module.ModuleAttributeError: 'ParallelTransformerLayer' object has no attribute 'self_attention'"
Issue -
State: open - Opened by gcunhase about 2 years ago
- 1 comment
#81 - megatron-deepspeed layernorm has different output compare with megatron-lm?
Issue -
State: open - Opened by Kite0011 about 2 years ago
#80 - BERT QQP and RACE fine-tune examples
Pull Request -
State: closed - Opened by conglongli about 2 years ago
#79 - integrate ort
Pull Request -
State: closed - Opened by prathikr about 2 years ago
- 2 comments
#78 - attempt at pipelining
Pull Request -
State: open - Opened by siddharth9820 about 2 years ago
#77 - fix throughput_calculator
Pull Request -
State: closed - Opened by conglongli about 2 years ago
#76 - pretrain_gpt_125M_MoE freezes during compilation
Issue -
State: closed - Opened by yunoJ over 2 years ago
- 1 comment
#75 - BERT example
Pull Request -
State: closed - Opened by conglongli over 2 years ago
#74 - BERT example staging v1
Pull Request -
State: closed - Opened by conglongli over 2 years ago
#73 - This repo is missing important files
Issue -
State: closed - Opened by microsoft-github-policy-service[bot] over 2 years ago
#72 - Adding Microsoft SECURITY.MD
Pull Request -
State: closed - Opened by microsoft-github-policy-service[bot] over 2 years ago
#71 - Offloading optimizer to CPU causes "expected input to be on cuda" error; Suggest to fallback to torch.optim.AdamW
Pull Request -
State: closed - Opened by hibagus over 2 years ago
- 3 comments
#70 - gpt_6.7B_PR-MoE16: CUDA out of memory
Issue -
State: open - Opened by fighterhit over 2 years ago
- 1 comment
#69 - add checkpoint throughput measurement
Pull Request -
State: closed - Opened by GuanhuaWang over 2 years ago
- 1 comment
#68 - Enable Megatron-LM workload on ROCm
Pull Request -
State: closed - Opened by rraminen over 2 years ago
- 3 comments
#67 - Question for usage of DeepSpeed transformer kernels
Issue -
State: closed - Opened by delock over 2 years ago
#66 - add changes for enabling AML run
Pull Request -
State: closed - Opened by msp8955 over 2 years ago
- 2 comments
#65 - Merge azure branch manually
Pull Request -
State: closed - Opened by awan-10 over 2 years ago
#64 - Azure draft PR -- to be closed after discussion/review
Pull Request -
State: closed - Opened by awan-10 over 2 years ago
#63 - Tensor parallelism for Mixture of Experts
Pull Request -
State: closed - Opened by siddharth9820 over 2 years ago
- 2 comments
#62 - [fix]Solve checkpoint loading err with megatron bert_model
Pull Request -
State: closed - Opened by kisseternity over 2 years ago
- 2 comments
#61 - [Bug]Load checkpoint err using pretrain_bert.py with Megatron
Issue -
State: closed - Opened by kisseternity over 2 years ago
- 2 comments
#60 - Minjiaz/compression gpt
Pull Request -
State: closed - Opened by minjiaz over 2 years ago
#59 - Debug
Pull Request -
State: closed - Opened by rayzzq over 2 years ago
- 1 comment
#58 - GPT-2 with pipeline parallel and bfloat16 doesn't work
Issue -
State: open - Opened by assij over 2 years ago
- 4 comments
#57 - AzureML: initial changes for benchmarking
Pull Request -
State: closed - Opened by msp8955 over 2 years ago
#56 - Add Zero-offload support
Pull Request -
State: closed - Opened by siddharth9820 over 2 years ago
#55 - Add ZeRO-offload support
Pull Request -
State: closed - Opened by siddharth9820 over 2 years ago