Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/DeepSpeedExamples issues and pull requests
#532 - Hyper-param tuning for PPO
Issue -
State: open - Opened by luzai about 1 year ago
#531 - Error: Current loss scale already at minimum - cannot decrease scale anymore
Issue -
State: open - Opened by GenVr about 1 year ago
#530 - When running step3, the error "CUDA error: misaligned address"?
Issue -
State: open - Opened by EircYangQiXin about 1 year ago
#529 - Much more memory used in step 3 when using multi gpus compared to using single gpu
Issue -
State: open - Opened by cokuehuang about 1 year ago
Labels: deespeed chat, system
#528 - Rewards in ppo seem to be recomputed many times
Issue -
State: open - Opened by dwyzzy about 1 year ago
Labels: deespeed chat, modeling
#527 - Step2 reward model 'chosen_last_scores' is really low, while acc is really high
Issue -
State: open - Opened by LuciusMos about 1 year ago
- 1 comment
Labels: question, deespeed chat, modeling
#526 - Step 3 issue - TypeError for * : float and NoneType for DeepSpeedChat while generating experience.
Issue -
State: open - Opened by Ankush2k about 1 year ago
- 1 comment
Labels: deespeed chat, new-config, modeling
#525 - [bug]AttributeError: 'DeepSpeedHybridEngine' object has no attribute 'mp_group'
Issue -
State: open - Opened by qingchu123 about 1 year ago
- 1 comment
Labels: bug, deespeed chat
#524 - RuntimeError: Error building extension 'transformer_inference'
Issue -
State: closed - Opened by li995495592 about 1 year ago
- 3 comments
Labels: deespeed chat
#523 - Lm workaround
Pull Request -
State: closed - Opened by yaozhewei about 1 year ago
#522 - Training gpt-neo-1.3B on GPU
Issue -
State: open - Opened by GenVr about 1 year ago
- 2 comments
Labels: deespeed chat, system
#521 - OOM problem when fine-tune reward model with LLaMA in step 2
Issue -
State: open - Opened by kiseliu about 1 year ago
- 1 comment
Labels: deespeed chat, llama
#520 - zero stage 3 error :NotImplementedError: Cannot copy out of meta tensor; no data!
Issue -
State: closed - Opened by EthenZhang about 1 year ago
- 1 comment
#519 - fix spelling mistakes & bash script
Pull Request -
State: open - Opened by QixuanAI about 1 year ago
- 2 comments
#518 - RLHF model return '{: {: {:' of every input
Issue -
State: open - Opened by kuangdao about 1 year ago
- 1 comment
Labels: deespeed chat, modeling
#517 - Sequence truncation mistake in step3 training
Issue -
State: open - Opened by puyuanOT about 1 year ago
Labels: deespeed chat, modeling
#516 - no GPU resources available
Issue -
State: closed - Opened by wuchaooooo about 1 year ago
- 1 comment
Labels: deespeed chat
#515 - RuntimeError: Error building extension 'transformer_inference'
Issue -
State: closed - Opened by wwh5441 about 1 year ago
- 1 comment
#514 - Does the framework support ChatGLM now?
Issue -
State: open - Opened by MAJIN123 about 1 year ago
- 2 comments
Labels: deespeed chat, modeling
#513 - ValidationError: 1 validation error for DeepSpeedZeroConfig
Issue -
State: closed - Opened by Chtholly1 about 1 year ago
- 5 comments
#512 - In Step3, RuntimeError:RewardModel:size mismatch for rwtranrsformer.decoder.embed_tokens.weight
Issue -
State: open - Opened by KyrieXu11 about 1 year ago
- 6 comments
Labels: deespeed chat, llama
#511 - DeepSpeed-Chat cannot load models from local file?
Issue -
State: open - Opened by MianWang123 about 1 year ago
Labels: deespeed chat, new-config
#510 - How to save the model after each epoch
Issue -
State: open - Opened by nieallen about 1 year ago
- 1 comment
Labels: deespeed chat
#509 - NotImplementedError: Cannot copy out of meta tensor; no data!
Issue -
State: closed - Opened by yangzhipeng1108 about 1 year ago
- 1 comment
#508 - Is there any Deepspeed Inference PTQ Example?
Issue -
State: open - Opened by tingshua-yts about 1 year ago
Labels: question, deespeed chat
#507 - Error for run_chinese.sh of step1, other_language
Issue -
State: closed - Opened by korlin0110 about 1 year ago
- 3 comments
Labels: deespeed chat
#506 - In Step3: RuntimeError: numel: integer multiplication overflow
Issue -
State: open - Opened by 480284856 about 1 year ago
Labels: deespeed chat, new-config, modeling
#505 - Eextension of the issus #479, chatbot.py cannot load the bloom model
Issue -
State: open - Opened by korlin0110 about 1 year ago
- 2 comments
Labels: deespeed chat, new-config
#504 - merge master
Pull Request -
State: closed - Opened by yaozhewei about 1 year ago
#503 - Might be a bug of hibrid engine : In Step3 wrong generation secquence when hibrid engine is enabled.
Issue -
State: open - Opened by laoda513 about 1 year ago
- 4 comments
Labels: deespeed chat
#502 - Does the 1.3B model support multiple rounds of dialogue?
Issue -
State: open - Opened by tensorflowt about 1 year ago
Labels: deespeed chat
#501 - Fixing the numerical instability when calculating the loss of the cri…
Pull Request -
State: closed - Opened by minjiaz about 1 year ago
#498 - CPU OOM in training of step3
Issue -
State: open - Opened by cokuehuang about 1 year ago
- 2 comments
Labels: deespeed chat, system
#497 - deepspeed hybrid-engine support bloom model with zero3?
Issue -
State: open - Opened by null-test-7 about 1 year ago
- 1 comment
Labels: deespeed chat, new-config
#496 - What is the purpose of end_of_conversation_token="<|endoftext|> and why it is not added as special token?
Issue -
State: open - Opened by REIGN12 about 1 year ago
- 1 comment
Labels: question, deespeed chat
#495 - What's default data_path in single_gpu/run_1.3b.sh
Issue -
State: closed - Opened by zy-sunshine about 1 year ago
- 1 comment
Labels: question, deespeed chat
#493 - Does deepspeed-chat support LLama ?
Issue -
State: open - Opened by janelu9 about 1 year ago
Labels: question, deespeed chat, llama
#492 - OPT-13b model not pass step3_rlhf_finetuning
Issue -
State: open - Opened by korlin0110 about 1 year ago
- 3 comments
Labels: bug, deespeed chat
#491 - need a better metrics other than acc and average score for the reward modeling step
Issue -
State: open - Opened by DanqingZ about 1 year ago
- 5 comments
Labels: enhancement, deespeed chat, modeling
#490 - Why is it a need to set the pad token to be equal to eos token?
Issue -
State: open - Opened by alibabadoufu about 1 year ago
- 1 comment
Labels: question, deespeed chat, modeling
#489 - The reward in step3 seems to be completely random without any noticeable increase.
Issue -
State: open - Opened by laoda513 about 1 year ago
- 7 comments
Labels: deespeed chat, modeling
#488 - Any plans to upgrade triton version in stable-diffusion example?
Issue -
State: open - Opened by alexeigor about 1 year ago
Labels: question
#487 - program stop when i run my run_chinese.sh in GPU A100*4 80G
Issue -
State: open - Opened by liyuyuan6969 about 1 year ago
- 3 comments
Labels: deespeed chat
#483 - Error in step1 finetuning decapoda-research/llama-7b-hf
Issue -
State: open - Opened by yctam about 1 year ago
- 3 comments
Labels: deespeed chat
#482 - A100 40 GB: OOM on step-3 for opt-6.7B
Issue -
State: open - Opened by akashsaravanan-georgian about 1 year ago
- 4 comments
Labels: deespeed chat, system, new-config
#481 - RuntimeError: Error building extension 'transformer_inference' in step3
Issue -
State: closed - Opened by SunQiDong1999 about 1 year ago
- 3 comments
#480 - unable to load 4 7b size model in step3
Issue -
State: open - Opened by Mr-lonely0 about 1 year ago
- 3 comments
Labels: deespeed chat, system, new-config
#479 - Can not use bloom-560m model in the step2_reward_model_finetuning
Issue -
State: open - Opened by korlin0110 about 1 year ago
- 3 comments
Labels: deespeed chat, system, new-config
#478 - inference OOM
Issue -
State: open - Opened by Haoran1234567 about 1 year ago
- 1 comment
Labels: deespeed chat
#475 - DeepspeedExample-Chat training failed with single_node parameter
Issue -
State: closed - Opened by wanbo432503 about 1 year ago
- 11 comments
#474 - PPO training unable to reproduce the training log provided
Issue -
State: open - Opened by REIGN12 about 1 year ago
Labels: deespeed chat, modeling
#472 - [deepspeed-chat] finetuned model can not even overfit a really small dataset with only 11 samples
Issue -
State: open - Opened by valkryhx about 1 year ago
- 1 comment
Labels: deespeed chat, modeling
#470 - Fix to allow hf-generation to generate eos-token
Pull Request -
State: open - Opened by aakutalev about 1 year ago
- 4 comments
#469 - Training time of DeepSpeed-Chat’s RLHF examples
Issue -
State: open - Opened by zasitonbyl about 1 year ago
- 7 comments
Labels: deespeed chat, modeling
#467 - KeyError: "_name_or_path" in locally loading tokenizer config file for DeepSpeed-Chat
Issue -
State: open - Opened by ccclyu about 1 year ago
- 1 comment
Labels: deespeed chat
#463 - May I run any one of the example of deepseed on one 1080ti, P40 and P100
Issue -
State: closed - Opened by SeekPoint about 1 year ago
- 3 comments
Labels: deespeed chat
#458 - Adding two loss from actor will lead to an error " gradient computed twice for this partition"
Issue -
State: open - Opened by piekey1994 about 1 year ago
- 1 comment
Labels: deespeed chat
#456 - enable_hybrid_engine issue
Issue -
State: open - Opened by llllooong about 1 year ago
- 3 comments
Labels: deespeed chat
#454 - fix step_time
Pull Request -
State: open - Opened by thuzhf about 1 year ago
#453 - [BUG]RuntimeError: CUDA error: unknown error
Issue -
State: open - Opened by SH0AN about 1 year ago
- 3 comments
Labels: deespeed chat
#452 - cant use zero-offload
Issue -
State: open - Opened by yanqiangmiffy about 1 year ago
- 4 comments
Labels: deespeed chat
#451 - Finetuning Bloom model in step 3 failed
Issue -
State: open - Opened by cokuehuang about 1 year ago
- 5 comments
Labels: deespeed chat
#448 - [DeepSpeedExamples/applications/DeepSpeed-Chat/] Error happened when running step3_rlhf_finetuning in enable_hybrid_engine mode with togethercomputer/GPT-NeoXT-Chat-Base-20B
Issue -
State: open - Opened by GxjGit about 1 year ago
- 1 comment
Labels: deespeed chat
#447 - training 12b model seems to require more memory than expected
Issue -
State: open - Opened by ChaoChungWu-Johnson about 1 year ago
- 2 comments
Labels: deespeed chat, new-config
#443 - How to train deepspeed-chat using nccl with multi-nodes?
Issue -
State: open - Opened by SefaZeng about 1 year ago
- 2 comments
Labels: deespeed chat
#442 - Performance gap between actor ema and actor
Issue -
State: closed - Opened by DanqingZ about 1 year ago
- 9 comments