Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/DeepSpeedExamples issues and pull requests
#435 - gpt ppo training error
Issue -
State: open - Opened by lljjgg about 1 year ago
- 1 comment
Labels: deespeed chat
#429 - [ERROR]In Step3,load reward Model failed which trainged with zero-stage 3
Issue -
State: open - Opened by Clitost about 1 year ago
Labels: deespeed chat
#428 - Step 3 1.3b Running process stuck
Issue -
State: open - Opened by awelldone about 1 year ago
- 3 comments
Labels: deespeed chat
#425 - Bug: incorrect metrics evaluating for step two
Issue -
State: open - Opened by s-isaev about 1 year ago
- 3 comments
Labels: deespeed chat
#423 - Bug: Numerically unstable loss at reward model
Issue -
State: closed - Opened by s-isaev about 1 year ago
- 8 comments
Labels: deespeed chat
#419 - step3 failed actor opt_1.3b critic opt_350m Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run
Issue -
State: open - Opened by BaiStone2017 about 1 year ago
- 1 comment
Labels: deespeed chat
#418 - Current loss scale already at minimum - cannot decrease scale anymore. Exiting run
Issue -
State: closed - Opened by HaixHan about 1 year ago
- 6 comments
#417 - Cannot load the previous model weights when using ZeRO 3 optimizer in DeepSpeed Chat
Issue -
State: open - Opened by caoyu-noob about 1 year ago
- 2 comments
Labels: deespeed chat
#412 - if ref_model is a copy of act_model at begining in stage3 , does it mean the kl_divergence is 0?
Issue -
State: closed - Opened by janelu9 about 1 year ago
- 6 comments
Labels: deespeed chat
#406 - wandb support and evaluation
Issue -
State: open - Opened by DanqingZ about 1 year ago
- 1 comment
Labels: deespeed chat
#403 - Error after changing the model from opt to gpt
Issue -
State: open - Opened by lljjgg about 1 year ago
Labels: deespeed chat
#392 - Step 2 exited with non-zero status 2
Issue -
State: closed - Opened by awelldone about 1 year ago
- 4 comments
Labels: deespeed chat
#385 - Step 3: RuntimeError: CUDA error: misaligned address
Issue -
State: open - Opened by EikeKohl about 1 year ago
- 5 comments
Labels: deespeed chat
#379 - AttributeError: 'DeepSpeedHybridEngine' object has no attribute 'mp_group' in step 3.
Issue -
State: open - Opened by Arain-sh about 1 year ago
- 6 comments
Labels: deespeed chat
#377 - how to use zero-offload?
Issue -
State: open - Opened by xdnjust about 1 year ago
- 5 comments
#374 - Missing key(s) in state_dict for bias in attention blocks
Issue -
State: open - Opened by EikeKohl about 1 year ago
- 1 comment
Labels: deespeed chat
#373 - When running Stage-3 scripts with enable_hybrid_engine encountered errors
Issue -
State: open - Opened by fakegao about 1 year ago
- 9 comments
Labels: deespeed chat
#365 - how to save model, i cant load saved llama7b model
Issue -
State: open - Opened by Pattaro about 1 year ago
- 1 comment
Labels: deespeed chat
#361 - Run 1.6 billon demo is much slow than description on A100 GPU?
Issue -
State: open - Opened by tcluoct about 1 year ago
- 4 comments
Labels: deespeed chat
#359 - map[i] = val_or_map.get(i, Std.NONE) AttributeError: 'NoneType' object has no attribute 'get'
Issue -
State: open - Opened by SeekPoint about 1 year ago
- 1 comment
Labels: bug
#358 - Error(s) in loading state_dict for RewardModel:size mismatch
Issue -
State: closed - Opened by MAJIN123 about 1 year ago
- 5 comments
#356 - use bloom-350m to train reward model in step2
Issue -
State: open - Opened by panxb833 about 1 year ago
- 2 comments
Labels: deespeed chat
#355 - it took much time to initial:Initializing TorchBackend in DeepSpeed with backend nccl. Could you please help me to eliminate the consumed time.thx
Issue -
State: open - Opened by Modas-Li about 1 year ago
- 2 comments
#353 - When training GPT-2 with Zero-3, some parameters will be missing when saving the model
Issue -
State: open - Opened by koking0 about 1 year ago
- 1 comment
Labels: deespeed chat
#350 - About release date for Llama system support
Issue -
State: open - Opened by rockstone533 about 1 year ago
- 2 comments
Labels: deespeed chat
#349 - Using LLaMA in reward model training
Issue -
State: open - Opened by YingHH1 about 1 year ago
- 5 comments
Labels: deespeed chat
#348 - Add snip_momentum structured pruning example with 80% sparsity ratio
Pull Request -
State: closed - Opened by ftian1 about 1 year ago
- 2 comments
#343 - BELLE has supported DeepSpeed-Chat.
Pull Request -
State: closed - Opened by xianghuisun about 1 year ago
#338 - Error when using BLOOMZ for reward model training
Issue -
State: open - Opened by Luoyang144 about 1 year ago
- 15 comments
Labels: deespeed chat
#329 - Training with Llama 7b
Issue -
State: closed - Opened by alibabadoufu about 1 year ago
- 21 comments
Labels: bug, deespeed chat
#313 - run deepspeed_chat example code error
Issue -
State: open - Opened by bestpredicts about 1 year ago
- 4 comments
Labels: bug, deespeed chat
#305 - New training: Alpaca-lora-zero3 on 2080Ti
Pull Request -
State: closed - Opened by bigeagle about 1 year ago
- 7 comments
#304 - If I use a self-improved transformer architecture, can it support?
Issue -
State: open - Opened by liujuncn about 1 year ago
Labels: deespeed chat
#297 - The step2 scoring looks correct but the step3 model is talking gibberish
Issue -
State: closed - Opened by panganqi about 1 year ago
- 12 comments
Labels: bug, deespeed chat