Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / microsoft/DeepSpeedExamples issues and pull requests
#868 - Update Inference Benchmarking Scripts - Support AML
Pull Request - State: closed - Opened by lekurile 9 months ago - 1 comment
#867 - [Bug] DeepSpeed Inference Does not Work with LLaMA (Latest version)
Issue - State: open - Opened by allanj 9 months ago - 3 comments
#866 - [BUG in Stable Diffusion inference] There's an error on CUDAGraph when using deepspeed inference. How to fix it?
Issue - State: open - Opened by foin6 9 months ago - 2 comments
#865 - Extend FastGen benchmark to use AML endpoints
Pull Request - State: closed - Opened by mrwyattii 9 months ago
#864 - zero3 and enable hybrid engine are not suitable for llama2, how to solve it?
Issue - State: open - Opened by terence1023 9 months ago - 3 comments
#863 - Modify codes so that different accelerators can be called according to specific device conditions
Pull Request - State: closed - Opened by foin6 9 months ago - 1 comment
#862 - Fix path in human-eval example README
Pull Request - State: closed - Opened by lekurile 9 months ago
#861 - RLHF problems when using Qwen model
Issue - State: open - Opened by 128Ghe980 9 months ago - 1 comment
#860 - Codellama finetune
Issue - State: open - Opened by nani1149 9 months ago
#859 - Different accelerators can be called according to specific device conditions
Pull Request - State: closed - Opened by foin6 10 months ago
#858 - Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?
Issue - State: open - Opened by goelayu 10 months ago
#856 - Add Human Eval Example
Pull Request - State: closed - Opened by lekurile 10 months ago
#853 - Control the kernel injection with new argument. And compare the outputs only on rank 0
Pull Request - State: closed - Opened by foin6 11 months ago - 6 comments
#846 - Enable overlap_comm for better performance
Pull Request - State: closed - Opened by li-plus 11 months ago
#834 - fix: don't add eot token if add_eot_token knob is False
Pull Request - State: closed - Opened by EeyoreLee 11 months ago
#828 - Add DPO support for DeepSpeed-Chat
Pull Request - State: open - Opened by stceum 12 months ago - 1 comment
#821 - [BUG] DeepSpeed-Chat Step3 - actor model repeats generating the same token when hybrid engine enabled
Issue - State: open - Opened by GeekDream-x 12 months ago - 9 comments
#819 - Fix labels & eos_token for SFT
Pull Request - State: closed - Opened by li-plus 12 months ago - 4 comments
#796 - Question about loading Dahous dataset from local path.
Issue - State: open - Opened by Zhutianyi7230 about 1 year ago - 9 comments
#795 - Unable to install deepspeed
Issue - State: closed - Opened by cainiaobibi about 1 year ago - 5 comments
#791 - How to save memory during inference
Issue - State: open - Opened by Kangkang625 about 1 year ago - 1 comment
#786 - step 3 "run_6.7b_lora.sh" doesn't work with a100 80GB single gpu.
Issue - State: open - Opened by sophus1004 about 1 year ago - 1 comment
#780 - deepspeed-chat: print mean stage1/2 loss periodically
Pull Request - State: closed - Opened by mosheisland about 1 year ago - 3 comments
#774 - Wrong import in inference quantization example
Issue - State: open - Opened by Epliz about 1 year ago - 1 comment
#672 - Potential Bugs: `ends` in `ppo_trainer.py`
Pull Request - State: closed - Opened by ZHZisZZ over 1 year ago - 2 comments
#652 - changed step 3 scripts
Pull Request - State: closed - Opened by askxiaozhang over 1 year ago
#639 - ModuleNotFoundError: No module named 'utils.data'
Issue - State: open - Opened by xtu-xiaoc over 1 year ago - 3 comments
#637 - Step3 Is padding side right or not?
Issue - State: open - Opened by AaronKemon over 1 year ago - 1 comment
#636 - DS Chat Step 3 - Fix Zero Stage 3
Pull Request - State: closed - Opened by lekurile over 1 year ago
#615 - NCCL backend in DeepSpeed not yet implemented
Issue - State: closed - Opened by David-Lee-1990 over 1 year ago - 5 comments
#587 - RuntimeError: The size of tensor a (6144) must match the size of tensor b (8192) at non-singleton dimension 0
Issue - State: closed - Opened by gouchangjiang over 1 year ago - 6 comments
Labels: bug, deespeed chat
#532 - Hyper-param tuning for PPO
Issue - State: open - Opened by luzai over 1 year ago
#531 - Error: Current loss scale already at minimum - cannot decrease scale anymore
Issue - State: open - Opened by GenVr over 1 year ago
#530 - When running step3, the error "CUDA error: misaligned address"?
Issue - State: open - Opened by EircYangQiXin over 1 year ago
#529 - Much more memory used in step 3 when using multi gpus compared to using single gpu
Issue - State: open - Opened by cokuehuang over 1 year ago - 5 comments
Labels: deespeed chat, system, llama
#528 - Rewards in ppo seem to be recomputed many times
Issue - State: open - Opened by dwyzzy over 1 year ago
Labels: deespeed chat, modeling
#527 - Step2 reward model 'chosen_last_scores' is really low, while acc is really high
Issue - State: open - Opened by LuciusMos over 1 year ago - 1 comment
Labels: question, deespeed chat, modeling
#526 - Step 3 issue - TypeError for * : float and NoneType for DeepSpeedChat while generating experience.
Issue - State: open - Opened by Ankush2k over 1 year ago - 1 comment
Labels: deespeed chat, new-config, modeling
#525 - [bug] AttributeError: 'DeepSpeedHybridEngine' object has no attribute 'mp_group'
Issue - State: open - Opened by qingchu123 over 1 year ago - 4 comments
Labels: bug, deespeed chat, hybrid engine
#524 - RuntimeError: Error building extension 'transformer_inference'
Issue - State: closed - Opened by li995495592 over 1 year ago - 4 comments
Labels: deespeed chat
#523 - Lm workaround
Pull Request - State: closed - Opened by yaozhewei over 1 year ago
#522 - Training gpt-neo-1.3B on GPU
Issue - State: open - Opened by GenVr over 1 year ago - 2 comments
Labels: deespeed chat, system
#521 - OOM problem when fine-tuning reward model with LLaMA in step 2
Issue - State: open - Opened by kiseliu over 1 year ago - 1 comment
Labels: deespeed chat, llama
#520 - zero stage 3 error: NotImplementedError: Cannot copy out of meta tensor; no data!
Issue - State: closed - Opened by EthenZhang over 1 year ago - 1 comment
#519 - fix spelling mistakes & bash script
Pull Request - State: open - Opened by QixuanAI over 1 year ago - 2 comments
#518 - RLHF model return '{: {: {:' of every input
Issue - State: open - Opened by kuangdao over 1 year ago - 1 comment
Labels: deespeed chat, modeling
#517 - Sequence truncation mistake in step3 training
Issue - State: open - Opened by puyuanOT over 1 year ago
Labels: deespeed chat, modeling
#516 - no GPU resources available
Issue - State: closed - Opened by wuchaooooo over 1 year ago - 1 comment
Labels: deespeed chat
#515 - RuntimeError: Error building extension 'transformer_inference'
Issue - State: closed - Opened by wwh5441 over 1 year ago - 1 comment
#514 - Does the framework support ChatGLM now?
Issue - State: open - Opened by MAJIN123 over 1 year ago - 2 comments
Labels: deespeed chat, modeling
#513 - ValidationError: 1 validation error for DeepSpeedZeroConfig
Issue - State: closed - Opened by Chtholly1 over 1 year ago - 5 comments
#512 - In Step3, RuntimeError: RewardModel: size mismatch for rwtranrsformer.decoder.embed_tokens.weight
Issue - State: open - Opened by KyrieXu11 over 1 year ago - 6 comments
Labels: deespeed chat, llama
#511 - DeepSpeed-Chat cannot load models from local file?
Issue - State: open - Opened by MianWang123 over 1 year ago
Labels: deespeed chat, new-config
#510 - How to save the model after each epoch
Issue - State: open - Opened by nieallen over 1 year ago - 1 comment
Labels: deespeed chat
#509 - NotImplementedError: Cannot copy out of meta tensor; no data!
Issue - State: closed - Opened by yangzhipeng1108 over 1 year ago - 1 comment
#508 - Is there any Deepspeed Inference PTQ Example?
Issue - State: open - Opened by tingshua-yts over 1 year ago
Labels: question, deespeed chat
#507 - Error for run_chinese.sh of step1, other_language
Issue - State: closed - Opened by korlin0110 over 1 year ago - 3 comments
Labels: deespeed chat
#506 - In Step3: RuntimeError: numel: integer multiplication overflow
Issue - State: open - Opened by 480284856 over 1 year ago
Labels: deespeed chat, new-config, modeling
#505 - Extension of the issue #479, chatbot.py cannot load the bloom model
Issue - State: open - Opened by korlin0110 over 1 year ago - 2 comments
Labels: deespeed chat, new-config
#504 - merge master
Pull Request - State: closed - Opened by yaozhewei over 1 year ago
#503 - Might be a bug of hybrid engine: In Step3 wrong generation sequence when hybrid engine is enabled.
Issue - State: open - Opened by laoda513 over 1 year ago - 7 comments
Labels: deespeed chat, hybrid engine
#502 - Does the 1.3B model support multiple rounds of dialogue?
Issue - State: open - Opened by tensorflowt over 1 year ago
Labels: deespeed chat
#501 - Fixing the numerical instability when calculating the loss of the cri…
Pull Request - State: closed - Opened by minjiaz over 1 year ago
#498 - CPU OOM in training of step3
Issue - State: open - Opened by cokuehuang over 1 year ago - 2 comments
Labels: deespeed chat, system
#497 - deepspeed hybrid-engine support bloom model with zero3?
Issue - State: open - Opened by null-test-7 over 1 year ago - 1 comment
Labels: deespeed chat, new-config
#496 - What is the purpose of end_of_conversation_token="<|endoftext|>" and why is it not added as a special token?
Issue - State: open - Opened by REIGN12 over 1 year ago - 1 comment
Labels: question, deespeed chat