microsoft/DeepSpeedExamples issues and pull requests

#868 - Update Inference Benchmarking Scripts - Support AML

Pull Request - State: closed - Opened by lekurile 9 months ago - 1 comment

#867 - [Bug] DeepSpeed Inference Does not Work with LLaMA (Latest verison)

Issue - State: open - Opened by allanj 9 months ago - 3 comments

#866 - [BUG in Stable Diffusion inference] There's an error on CUDAGraph when using deepspeed inference. How to fix it?

Issue - State: open - Opened by foin6 9 months ago - 2 comments

#865 - Extend FastGen benchmark to use AML endpoints

Pull Request - State: closed - Opened by mrwyattii 9 months ago

#864 - zero3 and enable hybrid engine are not suitable for llama2, how to solve it?

Issue - State: open - Opened by terence1023 9 months ago - 3 comments

#863 - <fill-mask>Modify codes so that different accelerators can be called according to specific device conditions

Pull Request - State: closed - Opened by foin6 9 months ago - 1 comment

#862 - Fix path in human-eval example README

Pull Request - State: closed - Opened by lekurile 9 months ago

#861 - RLHF problems when using Qwen model

Issue - State: open - Opened by 128Ghe980 9 months ago - 1 comment

#860 - Codellama finetune

Issue - State: open - Opened by nani1149 9 months ago

#859 - Different accelerators can be called according to specific device conditions

Pull Request - State: closed - Opened by foin6 10 months ago

#858 - Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?

Issue - State: open - Opened by goelayu 10 months ago

#856 - Add Human Eval Example

Pull Request - State: closed - Opened by lekurile 10 months ago

#853 - Control the kernel injection with new argument. And compare the outputs only on rank 0

Pull Request - State: closed - Opened by foin6 11 months ago - 6 comments

#846 - Enable overlap_comm for better performance

Pull Request - State: closed - Opened by li-plus 11 months ago

#834 - fix: don't add eot token if add_eot_token knob is False

Pull Request - State: closed - Opened by EeyoreLee 11 months ago

#828 - Add DPO support for DeepSpeed-Chat

Pull Request - State: open - Opened by stceum 12 months ago - 1 comment

#821 - [BUG] DeepSpeed-Chat Step3 - actor model repeats generating the same token when hybrid engine enabled

Issue - State: open - Opened by GeekDream-x 12 months ago - 9 comments

#819 - Fix labels & eos_token for SFT

Pull Request - State: closed - Opened by li-plus 12 months ago - 4 comments

#796 - Question about loading Dahous dataset from local path.

Issue - State: open - Opened by Zhutianyi7230 about 1 year ago - 9 comments

#795 - Unable to install deepspeed

Issue - State: closed - Opened by cainiaobibi about 1 year ago - 5 comments

#791 - How to save memory during inference

Issue - State: open - Opened by Kangkang625 about 1 year ago - 1 comment

#786 - step 3 "run_6.7b_lora.sh" doesn't work with a100 80GB single gpu.

Issue - State: open - Opened by sophus1004 about 1 year ago - 1 comment

#780 - deepspeed-chat: print mean stage1/2 loss periodically

Pull Request - State: closed - Opened by mosheisland about 1 year ago - 3 comments

#774 - Wrong import in inference quantization example

Issue - State: open - Opened by Epliz about 1 year ago - 1 comment

#672 - Potential Bugs: `ends` in `ppo_trainer.py`

Pull Request - State: closed - Opened by ZHZisZZ over 1 year ago - 2 comments

#652 - changed step 3 scripts

Pull Request - State: closed - Opened by askxiaozhang over 1 year ago

#639 - ModuleNotFoundError: No module named 'utils.data'

Issue - State: open - Opened by xtu-xiaoc over 1 year ago - 3 comments

#637 - Step3 Is padding side right or not?

Issue - State: open - Opened by AaronKemon over 1 year ago - 1 comment

#636 - DS Chat Step 3 - Fix Zero Stage 3

Pull Request - State: closed - Opened by lekurile over 1 year ago

#615 - NCCL backend in DeepSpeed not yet implemented

Issue - State: closed - Opened by David-Lee-1990 over 1 year ago - 5 comments

#587 - RuntimeError: The size of tensor a (6144) must match the size of tensor b (8192) at non-singleton dimension 0

Issue - State: closed - Opened by gouchangjiang over 1 year ago - 6 comments
Labels: bug, deespeed chat

#532 - Hyper-param tuning for PPO

Issue - State: open - Opened by luzai over 1 year ago

#531 - Error: Current loss scale already at minimum - cannot decrease scale anymore

Issue - State: open - Opened by GenVr over 1 year ago

#530 - When running step3, the error "CUDA error: misaligned address"?

Issue - State: open - Opened by EircYangQiXin over 1 year ago

#529 - Much more memory used in step 3 when using multi gpus compared to using single gpu

Issue - State: open - Opened by cokuehuang over 1 year ago - 5 comments
Labels: deespeed chat, system, llama

#528 - Rewards in ppo seem to be recomputed many times

Issue - State: open - Opened by dwyzzy over 1 year ago
Labels: deespeed chat, modeling

#527 - Step2 reward model 'chosen_last_scores' is really low, while acc is really high

Issue - State: open - Opened by LuciusMos over 1 year ago - 1 comment
Labels: question, deespeed chat, modeling

#526 - Step 3 issue - TypeError for * : float and NoneType for DeepSpeedChat while generating experience.

Issue - State: open - Opened by Ankush2k over 1 year ago - 1 comment
Labels: deespeed chat, new-config, modeling

#525 - [bug]AttributeError: 'DeepSpeedHybridEngine' object has no attribute 'mp_group'

Issue - State: open - Opened by qingchu123 over 1 year ago - 4 comments
Labels: bug, deespeed chat, hybrid engine

#524 - RuntimeError: Error building extension 'transformer_inference'

Issue - State: closed - Opened by li995495592 over 1 year ago - 4 comments
Labels: deespeed chat

#523 - Lm workaround

Pull Request - State: closed - Opened by yaozhewei over 1 year ago

#522 - Training gpt-neo-1.3B on GPU

Issue - State: open - Opened by GenVr over 1 year ago - 2 comments
Labels: deespeed chat, system

#521 - OOM problem when fine-tune reward model with LLaMA in step 2

Issue - State: open - Opened by kiseliu over 1 year ago - 1 comment
Labels: deespeed chat, llama

#520 - zero stage 3 error :NotImplementedError: Cannot copy out of meta tensor; no data!

Issue - State: closed - Opened by EthenZhang over 1 year ago - 1 comment

#519 - fix spelling mistakes & bash script

Pull Request - State: open - Opened by QixuanAI over 1 year ago - 2 comments

#518 - RLHF model return '{: {: {:' of every input

Issue - State: open - Opened by kuangdao over 1 year ago - 1 comment
Labels: deespeed chat, modeling

#517 - Sequence truncation mistake in step3 training

Issue - State: open - Opened by puyuanOT over 1 year ago
Labels: deespeed chat, modeling

#516 - no GPU resources available

Issue - State: closed - Opened by wuchaooooo over 1 year ago - 1 comment
Labels: deespeed chat

#515 - RuntimeError: Error building extension 'transformer_inference'

Issue - State: closed - Opened by wwh5441 over 1 year ago - 1 comment

#514 - Does the framework support ChatGLM now?

Issue - State: open - Opened by MAJIN123 over 1 year ago - 2 comments
Labels: deespeed chat, modeling

#513 - ValidationError: 1 validation error for DeepSpeedZeroConfig

Issue - State: closed - Opened by Chtholly1 over 1 year ago - 5 comments

#512 - In Step3, RuntimeError:RewardModel:size mismatch for rwtranrsformer.decoder.embed_tokens.weight

Issue - State: open - Opened by KyrieXu11 over 1 year ago - 6 comments
Labels: deespeed chat, llama

#511 - DeepSpeed-Chat cannot load models from local file?

Issue - State: open - Opened by MianWang123 over 1 year ago
Labels: deespeed chat, new-config

#510 - How to save the model after each epoch

Issue - State: open - Opened by nieallen over 1 year ago - 1 comment
Labels: deespeed chat

#509 - NotImplementedError: Cannot copy out of meta tensor; no data!

Issue - State: closed - Opened by yangzhipeng1108 over 1 year ago - 1 comment

#508 - Is there any Deepspeed Inference PTQ Example?

Issue - State: open - Opened by tingshua-yts over 1 year ago
Labels: question, deespeed chat

#507 - Error for run_chinese.sh of step1, other_language

Issue - State: closed - Opened by korlin0110 over 1 year ago - 3 comments
Labels: deespeed chat

#506 - In Step3: RuntimeError: numel: integer multiplication overflow

Issue - State: open - Opened by 480284856 over 1 year ago
Labels: deespeed chat, new-config, modeling

#505 - Eextension of the issus #479, chatbot.py cannot load the bloom model

Issue - State: open - Opened by korlin0110 over 1 year ago - 2 comments
Labels: deespeed chat, new-config

#504 - merge master

Pull Request - State: closed - Opened by yaozhewei over 1 year ago

#503 - Might be a bug of hibrid engine : In Step3 wrong generation secquence when hibrid engine is enabled.

Issue - State: open - Opened by laoda513 over 1 year ago - 7 comments
Labels: deespeed chat, hybrid engine

#502 - Does the 1.3B model support multiple rounds of dialogue?

Issue - State: open - Opened by tensorflowt over 1 year ago
Labels: deespeed chat

#501 - Fixing the numerical instability when calculating the loss of the cri…

Pull Request - State: closed - Opened by minjiaz over 1 year ago

#498 - CPU OOM in training of step3

Issue - State: open - Opened by cokuehuang over 1 year ago - 2 comments
Labels: deespeed chat, system

#497 - deepspeed hybrid-engine support bloom model with zero3?

Issue - State: open - Opened by null-test-7 over 1 year ago - 1 comment
Labels: deespeed chat, new-config

#496 - What is the purpose of end_of_conversation_token="<|endoftext|> and why it is not added as special token?

Issue - State: open - Opened by REIGN12 over 1 year ago - 1 comment
Labels: question, deespeed chat