Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / bigscience-workshop/Megatron-DeepSpeed issues and pull requests
#404 - Why pretrain_llama_distributed.sh use pretrain_gpt.py ?
Issue -
State: closed - Opened by BrucePeng92 3 months ago
#403 - How can I set recomputation-granularity,like selective or full?
Issue -
State: open - Opened by LordEdison 7 months ago
#402 - Bump black from 21.4b0 to 24.3.0
Pull Request -
State: open - Opened by dependabot[bot] 8 months ago
Labels: dependencies
#401 - Hello, what version of the megatron-lm library is your code modified?
Issue -
State: open - Opened by 4thGardenOfQMH 9 months ago
#400 - Is this assertion for mask wrong?
Issue -
State: open - Opened by yinfangchen 9 months ago
- 1 comment
#399 - Feature/tigerbot
Pull Request -
State: closed - Opened by i4never about 1 year ago
#398 - Hello, can Megatron-DeepSpeed pre-train llama2?
Issue -
State: open - Opened by 13416157913 about 1 year ago
#397 - Cannot run 3D parallelism with tp == 1 dp == 3 pp == 2 degrees
Issue -
State: closed - Opened by Heelim-Hong about 1 year ago
#396 - the traing log like this is Normal? I do not find loss in the logs, and what does the "grad norm: nan" mean?
Issue -
State: open - Opened by alphanlp about 1 year ago
#395 - The difference between zero-3 and megatron with zero-2
Issue -
State: open - Opened by nicosouth about 1 year ago
#394 - Question about the implementation of mpu.cross_entropy when using tensor parallel
Issue -
State: open - Opened by robin087 over 1 year ago
#393 - Feature/tigerbot
Pull Request -
State: closed - Opened by i4never over 1 year ago
#392 - questions about inconsistent evaluation result
Issue -
State: open - Opened by coorful over 1 year ago
#391 - stage3 error: IndexError: list index out of range
Issue -
State: closed - Opened by PhdShi over 1 year ago
- 1 comment
#390 - ModuleNotFoundError: No module named 'packaging' when install apex
Issue -
State: closed - Opened by SeekPoint over 1 year ago
- 3 comments
#389 - ModuleNotFoundError: No module named 'torch' when run 'pip install -e .', but pytorch exists
Issue -
State: closed - Opened by SeekPoint over 1 year ago
- 2 comments
#388 - Question about ds to universal
Issue -
State: open - Opened by saxh over 1 year ago
#387 - RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'
Issue -
State: open - Opened by zll0000 over 1 year ago
- 1 comment
#386 - hello, I meet a problem
Issue -
State: open - Opened by etoilestar over 1 year ago
- 8 comments
#385 - How to properly use Flops Profiler with pipelined parallelism?
Issue -
State: open - Opened by flyingdown over 1 year ago
#384 - Fix/dataloader error
Pull Request -
State: closed - Opened by EastInsure over 1 year ago
#383 - pip install -e . failed with ModuleNotFoundError: No module named 'torch'
Issue -
State: open - Opened by SeekPoint over 1 year ago
- 2 comments
#382 - Help me, I'm dying soon,error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 error: subprocess-exited-with-error
Issue -
State: open - Opened by listwebit over 1 year ago
#381 - Megatron-DeepSpeed only applies to specific models?
Issue -
State: open - Opened by Bob-cby over 1 year ago
#380 - Universal checkpoints and MP states
Issue -
State: closed - Opened by aitorormazabal over 1 year ago
- 2 comments
#379 - The given group does not exist pytorch
Issue -
State: open - Opened by germanjke over 1 year ago
- 2 comments
#378 - upgrade megatron-lm
Issue -
State: open - Opened by dz1iang over 1 year ago
#377 - How can we access to the gradients while the model is training?
Issue -
State: open - Opened by BilgehanSel over 1 year ago
#376 - how to do prompt learning with bloom?
Issue -
State: open - Opened by moseshu over 1 year ago
#375 - how to frozen some layers of GPT, only fintune last k layers?
Issue -
State: open - Opened by joan126 over 1 year ago
#374 - How to convert model weights(e.g., bigscience/bloomz-560m-optimizer-states) to Hugging Face model.bin file?
Issue -
State: closed - Opened by qazwsx042 over 1 year ago
- 1 comment
#373 - Can I use python only apex for gpt_pretrain?
Issue -
State: open - Opened by Luoyang144 over 1 year ago
#372 - how to pretrain t5-lm adapted?
Issue -
State: open - Opened by nanyyyyyy over 1 year ago
#371 - How to preprocess data for t5 model?
Issue -
State: open - Opened by xiu-ze over 1 year ago
#370 - Add xPos embeddings
Pull Request -
State: open - Opened by janEbert over 1 year ago
#369 - Exception: cuda rng state model-parallel-rng is not added
Issue -
State: open - Opened by 520jefferson over 1 year ago
- 1 comment
#368 - Adapt for DCU
Pull Request -
State: closed - Opened by hepj987 over 1 year ago
#367 - Fix various small problems
Pull Request -
State: open - Opened by janEbert over 1 year ago
#366 - How to continue pre-training Bloom?
Issue -
State: open - Opened by ShinoharaHare over 1 year ago
- 2 comments
#365 - Bloom model training with AML
Pull Request -
State: open - Opened by savitamittal1 over 1 year ago
#364 - Are there any other layer norm functions, such as RMSNorm or DeepNorm
Issue -
State: open - Opened by lvcc2018 over 1 year ago
#363 - Is there any script for pretraining/funting Bloom?
Issue -
State: open - Opened by drxmy almost 2 years ago
#362 - Bsevalharness
Pull Request -
State: closed - Opened by Muennighoff almost 2 years ago
#361 - Does bigscienece's Megatron-DeepSpeed support ZeRO-stage2+cpu offload?
Issue -
State: closed - Opened by drxmy almost 2 years ago
#360 - Fatal error: cuda_fp16.h: No such file or directory on ROCm
Issue -
State: open - Opened by lvcc2018 almost 2 years ago
- 1 comment
#359 - fintuning bloom 176b with bitfit
Issue -
State: closed - Opened by drxmy almost 2 years ago
- 2 comments
#358 - Add UL2 data sampling and pretraining
Pull Request -
State: open - Opened by janEbert almost 2 years ago
- 3 comments
#357 - Add FlashAttention
Pull Request -
State: open - Opened by NouamaneTazi almost 2 years ago
- 3 comments
#356 - User Warnings for accessing grad attribute of non-leaf Tensors thrown with TP=1 and PP>1
Issue -
State: open - Opened by chelseajohn almost 2 years ago
- 3 comments
#355 - deepspeed_to_megatron several issues
Issue -
State: open - Opened by MatejUlcar about 2 years ago
- 4 comments
#354 - Distill BLOOM - tentative 2
Pull Request -
State: open - Opened by younesbelkada about 2 years ago
#353 - Enable rocm-support
Pull Request -
State: open - Opened by luukkonenr about 2 years ago
#352 - Distill megatron - test Draft WIP
Pull Request -
State: closed - Opened by younesbelkada about 2 years ago
#351 - Distill megatron - WIP draft code
Pull Request -
State: closed - Opened by younesbelkada about 2 years ago
#350 - Load Bloom Optimizer State (i.e. Bloom 1B1)
Issue -
State: open - Opened by philippmtk about 2 years ago
- 2 comments
#349 - Encoding checkpoint reshaping guide
Pull Request -
State: open - Opened by tjruwase about 2 years ago
- 1 comment
#348 - Slower inference results for BLOOM fp16 on identical hardware
Issue -
State: open - Opened by sarthaklangde about 2 years ago
- 5 comments
#347 - grad norm increase strangely
Issue -
State: open - Opened by misska1 about 2 years ago
- 12 comments
#346 - How to inference GPT2 with DeepSpeed?
Issue -
State: closed - Opened by cdj0311 about 2 years ago
- 1 comment
#345 - [bloom inference scripts] improvements
Pull Request -
State: closed - Opened by stas00 about 2 years ago
#344 - [Bloom inference] further improvements
Pull Request -
State: closed - Opened by stas00 about 2 years ago
- 1 comment
#343 - About reshape deepspeed checkpoint
Issue -
State: open - Opened by henan991201 about 2 years ago
- 20 comments
#342 - Installing Apex on Windows
Issue -
State: open - Opened by gordicaleksa about 2 years ago
- 1 comment
#341 - pretrain_gpt_distributed.sh ERROR!
Issue -
State: closed - Opened by cdj0311 about 2 years ago
#340 - [ds-inference bloom] tweaks
Pull Request -
State: closed - Opened by stas00 about 2 years ago
- 4 comments
#339 - Followup PR for adding generation-server
Pull Request -
State: closed - Opened by mayank31398 about 2 years ago
- 12 comments
#338 - About convert deepspeed to deepspeed checkpoint
Issue -
State: open - Opened by henan991201 about 2 years ago
- 4 comments
#337 - Finetuning BLOOM
Issue -
State: open - Opened by AnaRhisT94 about 2 years ago
- 5 comments
#336 - Add multiple evaluation compat
Pull Request -
State: open - Opened by Muennighoff about 2 years ago
#335 - Changing a single example affects forward pass for other examples in a batch
Issue -
State: closed - Opened by mayank31398 about 2 years ago
- 4 comments
Labels: bug
#334 - Can we also train BLOOM model using tensor using tensor-Parallelism and efficient fused CUDA kernels
Issue -
State: open - Opened by CloudedLeopard17 about 2 years ago
- 4 comments
#333 - About convert DS checkpoint to Transformers
Issue -
State: closed - Opened by misska1 about 2 years ago
- 2 comments
#332 - disable CI
Pull Request -
State: closed - Opened by stas00 about 2 years ago
- 1 comment
#331 - merge main
Pull Request -
State: closed - Opened by Muennighoff about 2 years ago
#330 - DeepSpeed inference support for int8 parameters on BLOOM?
Issue -
State: closed - Opened by pai4451 about 2 years ago
- 6 comments
#329 - how to convert huggingface model to megatron-deepspeed?
Issue -
State: closed - Opened by yayaQAQ over 2 years ago
- 8 comments
#328 - Add generation server scripts using HF accelerate and DS-inference
Pull Request -
State: closed - Opened by mayank31398 over 2 years ago
- 46 comments
#327 - [checkpoints] replace bf16 with fp32 checkpoint weights
Pull Request -
State: open - Opened by stas00 over 2 years ago
- 3 comments
#326 - Add option to normalize loss per target
Pull Request -
State: closed - Opened by Muennighoff over 2 years ago
#325 - Add generation server scripts
Pull Request -
State: closed - Opened by mayank31398 over 2 years ago
- 1 comment
#324 - Errors in generation (Bloom) when changing options sampling/use_cache
Issue -
State: open - Opened by thies1006 over 2 years ago
- 29 comments
#323 - Question about downloading checkpoints of 6.3B,2.5B,1.3B
Issue -
State: open - Opened by misska1 over 2 years ago
- 3 comments
#322 - add args_deepspeed_gpt.sh
Pull Request -
State: closed - Opened by xyn1201 over 2 years ago
#321 - Generation server using HF accelerate and DS inference
Pull Request -
State: closed - Opened by mayank31398 over 2 years ago
- 19 comments
#320 - "Mask is silently ignored due to the use of a custom kernel" with pretrain_gpt_single_node.sh
Issue -
State: open - Opened by tianjianjiang over 2 years ago
- 4 comments
#319 - where can I download the 176B checkpoint in deepspeed format?
Issue -
State: open - Opened by xuyifan-0731 over 2 years ago
- 17 comments
#318 - Multi-node inference with Bloom: Unhandled CUDA error in ProcessGroupNCCL.cpp (called from all_reduce in torch)
Issue -
State: open - Opened by asaparov over 2 years ago
- 31 comments
#314 - How to run generation?
Issue -
State: closed - Opened by mayank31398 over 2 years ago
- 1 comment
#313 - Prefix LM Eval
Pull Request -
State: open - Opened by Muennighoff over 2 years ago
- 4 comments
#311 - Add Bitfit
Pull Request -
State: open - Opened by Muennighoff over 2 years ago
#309 - Enable loading ckpt for t0 finetuning
Pull Request -
State: open - Opened by Muennighoff over 2 years ago
#308 - BLOOM Inference via DeepSpeed-Inference, Accelerate and DeepSpeed-ZeRO
Pull Request -
State: closed - Opened by stas00 over 2 years ago
- 46 comments
#291 - BigScience Eval Harness
Pull Request -
State: open - Opened by Muennighoff over 2 years ago
#284 - MLM adaptation and Multitask Finetuning
Pull Request -
State: closed - Opened by lintangsutawika over 2 years ago
- 4 comments
#226 - Make sure deepspeed powered models are equivalent with their non deepspeed version
Issue -
State: open - Opened by thomasw21 almost 3 years ago
- 2 comments
Labels: Good First Issue
#163 - [Tensorboard] Log text prediction in evaluation
Issue -
State: open - Opened by thomasw21 about 3 years ago
- 14 comments
Labels: Good First Issue
#118 - Corby's numerically more stable self attn version
Pull Request -
State: closed - Opened by stas00 about 3 years ago
#114 - Add checks to confirm that the checkpoint conversion script works perfectly correct
Issue -
State: closed - Opened by ibeltagy about 3 years ago
- 8 comments
Labels: Good First Issue
#100 - Import issues when using evaluation scripts : `module 'megatron' has no attribute 'model'`
Issue -
State: closed - Opened by RomanCast about 3 years ago
#99 - Double counts in parameter count
Issue -
State: open - Opened by TevenLeScao about 3 years ago
- 2 comments