Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/DeepSpeedExamples issues and pull requests
#495 - What's default data_path in single_gpu/run_1.3b.sh
Issue -
State: closed - Opened by zy-sunshine over 1 year ago
- 1 comment
Labels: question, deespeed chat
#493 - Does deepspeed-chat support LLama ?
Issue -
State: open - Opened by janelu9 over 1 year ago
Labels: question, deespeed chat, llama
#492 - OPT-13b model not pass step3_rlhf_finetuning
Issue -
State: open - Opened by korlin0110 over 1 year ago
- 3 comments
Labels: bug, deespeed chat
#491 - need a better metrics other than acc and average score for the reward modeling step
Issue -
State: open - Opened by DanqingZ over 1 year ago
- 5 comments
Labels: enhancement, deespeed chat, modeling
#490 - Why is it a need to set the pad token to be equal to eos token?
Issue -
State: open - Opened by alibabadoufu over 1 year ago
- 1 comment
Labels: question, deespeed chat, modeling
#489 - The reward in step3 seems to be completely random without any noticeable increase.
Issue -
State: open - Opened by laoda513 over 1 year ago
- 7 comments
Labels: deespeed chat, modeling
#488 - Any plans to upgrade triton version in stable-diffusion example?
Issue -
State: open - Opened by alexeigor over 1 year ago
Labels: question
#487 - program stop when i run my run_chinese.sh in GPU A100*4 80G
Issue -
State: open - Opened by liyuyuan6969 over 1 year ago
- 3 comments
Labels: deespeed chat
#483 - Error in step1 finetuning decapoda-research/llama-7b-hf
Issue -
State: open - Opened by yctam over 1 year ago
- 3 comments
Labels: deespeed chat
#482 - A100 40 GB: OOM on step-3 for opt-6.7B
Issue -
State: open - Opened by akashsaravanan-georgian over 1 year ago
- 4 comments
Labels: deespeed chat, system, new-config
#481 - RuntimeError: Error building extension 'transformer_inference' in step3
Issue -
State: closed - Opened by SunQiDong1999 over 1 year ago
- 3 comments
#480 - unable to load 4 7b size model in step3
Issue -
State: open - Opened by Mr-lonely0 over 1 year ago
- 3 comments
Labels: deespeed chat, system, new-config
#479 - Can not use bloom-560m model in the step2_reward_model_finetuning
Issue -
State: open - Opened by korlin0110 over 1 year ago
- 3 comments
Labels: deespeed chat, system, new-config
#478 - inference OOM
Issue -
State: open - Opened by Haoran1234567 over 1 year ago
- 1 comment
Labels: deespeed chat
#475 - DeepspeedExample-Chat training failed with single_node parameter
Issue -
State: closed - Opened by wanbo432503 over 1 year ago
- 11 comments
#474 - PPO training unable to reproduce the training log provided
Issue -
State: open - Opened by REIGN12 over 1 year ago
Labels: deespeed chat, modeling
#472 - [deepspeed-chat] finetuned model can not even overfit a really small dataset with only 11 samples
Issue -
State: open - Opened by valkryhx over 1 year ago
- 1 comment
Labels: deespeed chat, modeling
#470 - Fix to allow hf-generation to generate eos-token
Pull Request -
State: open - Opened by aakutalev over 1 year ago
- 4 comments
#469 - Training time of DeepSpeed-Chat’s RLHF examples
Issue -
State: open - Opened by zasitonbyl over 1 year ago
- 7 comments
Labels: deespeed chat, modeling
#467 - KeyError: "_name_or_path" in locally loading tokenizer config file for DeepSpeed-Chat
Issue -
State: open - Opened by ccclyu over 1 year ago
- 1 comment
Labels: deespeed chat
#463 - May I run any one of the example of deepseed on one 1080ti, P40 and P100
Issue -
State: closed - Opened by SeekPoint over 1 year ago
- 3 comments
Labels: deespeed chat
#462 - Fix chatbot
Pull Request -
State: open - Opened by yaozhewei over 1 year ago
#458 - Adding two loss from actor will lead to an error " gradient computed twice for this partition"
Issue -
State: open - Opened by piekey1994 over 1 year ago
- 4 comments
Labels: deespeed chat, new-config, modeling
#456 - enable_hybrid_engine issue
Issue -
State: open - Opened by llllooong over 1 year ago
- 3 comments
Labels: deespeed chat
#454 - fix step_time
Pull Request -
State: open - Opened by thuzhf over 1 year ago
#453 - [BUG]RuntimeError: CUDA error: unknown error
Issue -
State: open - Opened by SH0AN over 1 year ago
- 3 comments
Labels: deespeed chat
#452 - cant use zero-offload
Issue -
State: open - Opened by yanqiangmiffy over 1 year ago
- 4 comments
Labels: deespeed chat
#451 - Finetuning Bloom model in step 3 failed
Issue -
State: open - Opened by cokuehuang over 1 year ago
- 5 comments
Labels: deespeed chat
#448 - [DeepSpeedExamples/applications/DeepSpeed-Chat/] Error happened when running step3_rlhf_finetuning in enable_hybrid_engine mode with togethercomputer/GPT-NeoXT-Chat-Base-20B
Issue -
State: open - Opened by GxjGit over 1 year ago
- 1 comment
Labels: deespeed chat
#447 - training 12b model seems to require more memory than expected
Issue -
State: open - Opened by ChaoChungWu-Johnson over 1 year ago
- 2 comments
Labels: deespeed chat, new-config
#443 - How to train deepspeed-chat using nccl with multi-nodes?
Issue -
State: open - Opened by SefaZeng over 1 year ago
- 2 comments
Labels: deespeed chat
#442 - Performance gap between actor ema and actor
Issue -
State: closed - Opened by DanqingZ over 1 year ago
- 9 comments
#435 - gpt ppo training error
Issue -
State: open - Opened by lljjgg over 1 year ago
- 1 comment
Labels: deespeed chat
#429 - [ERROR]In Step3,load reward Model failed which trainged with zero-stage 3
Issue -
State: open - Opened by Clitost over 1 year ago
Labels: deespeed chat
#428 - Step 3 1.3b Running process stuck
Issue -
State: open - Opened by awelldone over 1 year ago
- 3 comments
Labels: deespeed chat
#425 - Bug: incorrect metrics evaluating for step two
Issue -
State: open - Opened by s-isaev over 1 year ago
- 3 comments
Labels: deespeed chat
#423 - Bug: Numerically unstable loss at reward model
Issue -
State: closed - Opened by s-isaev over 1 year ago
- 8 comments
Labels: deespeed chat
#419 - step3 failed actor opt_1.3b critic opt_350m Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run
Issue -
State: open - Opened by BaiStone2017 over 1 year ago
- 1 comment
Labels: deespeed chat
#418 - Current loss scale already at minimum - cannot decrease scale anymore. Exiting run
Issue -
State: closed - Opened by HaixHan over 1 year ago
- 18 comments
#417 - Cannot load the previous model weights when using ZeRO 3 optimizer in DeepSpeed Chat
Issue -
State: open - Opened by caoyu-noob over 1 year ago
- 4 comments
Labels: deespeed chat, new-config
#412 - if ref_model is a copy of act_model at begining in stage3 , does it mean the kl_divergence is 0?
Issue -
State: closed - Opened by janelu9 over 1 year ago
- 6 comments
Labels: deespeed chat
#406 - wandb support and evaluation
Issue -
State: open - Opened by DanqingZ over 1 year ago
- 1 comment
Labels: deespeed chat
#405 - How to run multinode script in slurm cluster?
Issue -
State: closed - Opened by wang-zerui over 1 year ago
- 2 comments
#403 - Error after changing the model from opt to gpt
Issue -
State: open - Opened by lljjgg over 1 year ago
Labels: deespeed chat
#392 - Step 2 exited with non-zero status 2
Issue -
State: closed - Opened by awelldone over 1 year ago
- 4 comments
Labels: deespeed chat
#385 - Step 3: RuntimeError: CUDA error: misaligned address
Issue -
State: open - Opened by EikeKohl over 1 year ago
- 5 comments
Labels: deespeed chat
#379 - AttributeError: 'DeepSpeedHybridEngine' object has no attribute 'mp_group' in step 3.
Issue -
State: open - Opened by Arain-sh over 1 year ago
- 9 comments
Labels: deespeed chat, hybrid engine
#377 - how to use zero-offload?
Issue -
State: closed - Opened by xdnjust over 1 year ago
- 7 comments
Labels: enhancement, deespeed chat
#375 - 【BUG】occur error:AttributerError:'DeepSpeedHybridEngine' object has no attribute 'mp_group' whiling run llama7b for step3/rlhf/ppo
Issue -
State: open - Opened by Pattaro over 1 year ago
- 3 comments
Labels: deespeed chat, hybrid engine
#374 - Missing key(s) in state_dict for bias in attention blocks
Issue -
State: open - Opened by EikeKohl over 1 year ago
- 1 comment
Labels: deespeed chat
#373 - When running Stage-3 scripts with enable_hybrid_engine encountered errors
Issue -
State: open - Opened by fakegao over 1 year ago
- 9 comments
Labels: deespeed chat
#367 - Exchange group; 交流群
Issue -
State: open - Opened by yrqUni over 1 year ago
- 8 comments
Labels: question, deespeed chat
#365 - how to save model, i cant load saved llama7b model
Issue -
State: open - Opened by Pattaro over 1 year ago
- 1 comment
Labels: deespeed chat
#361 - Run 1.6 billon demo is much slow than description on A100 GPU?
Issue -
State: open - Opened by tcluoct over 1 year ago
- 4 comments
Labels: deespeed chat
#359 - map[i] = val_or_map.get(i, Std.NONE) AttributeError: 'NoneType' object has no attribute 'get'
Issue -
State: open - Opened by SeekPoint over 1 year ago
- 1 comment
Labels: bug
#358 - Error(s) in loading state_dict for RewardModel:size mismatch
Issue -
State: closed - Opened by MAJIN123 over 1 year ago
- 5 comments
#356 - use bloom-350m to train reward model in step2
Issue -
State: open - Opened by panxb833 over 1 year ago
- 2 comments
Labels: deespeed chat
#355 - it took much time to initial:Initializing TorchBackend in DeepSpeed with backend nccl. Could you please help me to eliminate the consumed time.thx
Issue -
State: open - Opened by Modas-Li over 1 year ago
- 2 comments
#353 - When training GPT-2 with Zero-3, some parameters will be missing when saving the model
Issue -
State: closed - Opened by koking0 over 1 year ago
- 2 comments
Labels: deespeed chat, new-config
#350 - About release date for Llama system support
Issue -
State: open - Opened by rockstone533 over 1 year ago
- 2 comments
Labels: deespeed chat
#349 - Using LLaMA in reward model training
Issue -
State: open - Opened by YingHH1 over 1 year ago
- 6 comments
Labels: deespeed chat, llama
#348 - Add snip_momentum structured pruning example with 80% sparsity ratio
Pull Request -
State: closed - Opened by ftian1 over 1 year ago
- 2 comments
#343 - BELLE has supported DeepSpeed-Chat.
Pull Request -
State: closed - Opened by xianghuisun over 1 year ago
#338 - Error when using BLOOMZ for reward model training
Issue -
State: open - Opened by Luoyang144 over 1 year ago
- 15 comments
Labels: deespeed chat
#329 - Training with Llama 7b
Issue -
State: closed - Opened by alibabadoufu over 1 year ago
- 21 comments
Labels: bug, deespeed chat
#325 - RuntimeError: Connection reset by peer
Issue -
State: closed - Opened by qinqinqaq over 1 year ago
- 1 comment
#313 - run deepspeed_chat example code error
Issue -
State: open - Opened by bestpredicts over 1 year ago
- 4 comments
Labels: bug, deespeed chat