microsoft/DeepSpeedExamples issues and pull requests

#934 - No module named 'transformers.deepspeed'

Issue - State: open - Opened by TianyuJIAA 27 days ago - 1 comment

#933 - Fixed mistake in readme

Pull Request - State: open - Opened by SCheekati about 1 month ago

#932 - Does DeepSpeed's Pipeline-Parallelism optimizer support skip connections?

Issue - State: open - Opened by RoyMahlab about 1 month ago

#931 - [cifar ds training]: Set cuda device during initialization of distributed backend.

Pull Request - State: open - Opened by jagadish-amd about 1 month ago - 2 comments

#930 - Enable reward model offloading option

Pull Request - State: open - Opened by kfertakis about 2 months ago - 2 comments

#929 - Deepspeed-Domino

Pull Request - State: open - Opened by zhangsmallshark 2 months ago - 1 comment

#928 - After using steps 1, 2, and 3, the test reply content only replies Assistant: </s>.

Issue - State: closed - Opened by jianmomo 2 months ago

#927 - Remove the fixed `eot_token` mechanism for SFT

Pull Request - State: open - Opened by Xingfu-Yi 2 months ago - 1 comment

#925 - Update requirements for opencv-python CVE

Pull Request - State: closed - Opened by loadams 3 months ago

#924 - AttributeError: 'DeepSpeedEngine' object has no attribute 'model'

Issue - State: open - Opened by lovychen 3 months ago - 1 comment

#923 - How to calculate training efficiency, i.e., tokens/sec of step 1 fine-tuning of llama2 model?

Issue - State: open - Opened by sowmya04101998 3 months ago

#922 - Actor loss nan and Resizing model embedding

Issue - State: open - Opened by ouyanmei 3 months ago - 1 comment

#921 - DeepNVMe ZeRO-inf Tutorial

Pull Request - State: closed - Opened by jomayeri 3 months ago

#920 - FileNotFoundError: [Errno 2] No such file or directory: 'numactl'

Issue - State: open - Opened by zhiwentian 3 months ago - 4 comments

#919 - DeepNVMe README.md add xref

Pull Request - State: closed - Opened by stas00 3 months ago

#916 - Update README.md

Pull Request - State: closed - Opened by keshavkowshik 3 months ago

#915 - step2 without any response for a long time

Issue - State: open - Opened by asfadfaf 3 months ago

#914 - DeepNVMe example scripts

Pull Request - State: closed - Opened by tjruwase 3 months ago

#913 - Add openai client to deepspeedometer

Pull Request - State: closed - Opened by delock 4 months ago - 2 comments

#912 - Different zero stage the training memory compute

Issue - State: open - Opened by Arcmoon-Hu 4 months ago

#911 - nvcc fatal : Unsupported gpu architecture 'compute_86' and nvcc fatal : Value 'c++17' is not defined for option 'std'

Issue - State: closed - Opened by Xccanxin 4 months ago - 1 comment

#910 - How to start deepspeed automatically?

Issue - State: closed - Opened by qwerfdsadad 5 months ago - 2 comments

#909 - Consult the first phase.

Issue - State: closed - Opened by csxrzhang 5 months ago - 2 comments

#908 - an error with gradient checkpointing in DeepspeedChat step 3

Issue - State: open - Opened by wangyuwen1999 5 months ago

#907 - Error when using a Qwen model as the Actor Model in step 3 of RLHF on a single machine with multiple GPUs

Issue - State: open - Opened by Dakai798 5 months ago - 1 comment

#906 - DeepSpeed-Chat step-1 hanging for a long time

Issue - State: open - Opened by lemon-little 5 months ago

#905 - Enable cpu/xpu support for the benchmarking suite

Pull Request - State: closed - Opened by louie-tsai 6 months ago - 8 comments

#904 - CPU OOM when inferencing Llama3-70B-Chinese-Chat

Issue - State: open - Opened by GORGEOUSLCX 6 months ago

#903 - cannot pickle 'Stream' object

Issue - State: open - Opened by teis-e 6 months ago

#902 - can not run the test-gpt.sh because of assertionError

Issue - State: open - Opened by leachee99 6 months ago

#901 - Does fastgen support long-text and sequence-parallel inference?

Issue - State: open - Opened by AceCoder0 6 months ago

#900 - Add --client-only arg to mii benchmark

Pull Request - State: closed - Opened by delock 7 months ago

#899 - Refactored LLM benchmark code

Pull Request - State: closed - Opened by mrwyattii 7 months ago

#898 - fix bug with queue.empty not being reliable

Pull Request - State: closed - Opened by mrwyattii 7 months ago

#897 - Update tokens_per_sec calculation to work w/ stream and non-stream cases

Pull Request - State: closed - Opened by lekurile 7 months ago

#896 - run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely

Issue - State: closed - Opened by awan-10 7 months ago - 11 comments

#895 - updating tokens per second to include the token count of generated tokens.

Pull Request - State: closed - Opened by guptha23 7 months ago

#894 - [Error] AutoTune: `connect to host localhost port 22: Connection refused`

Issue - State: open - Opened by wqw547243068 7 months ago

#893 - How to use deepspeed for multi-node and multi-card task in slurm cluster

Issue - State: open - Opened by dshwei 7 months ago

#892 - Does Zero-Inference support TP?

Issue - State: open - Opened by preminstrel 7 months ago - 11 comments

#891 - extend max_prompt_length and input text for 128k evaluation

Pull Request - State: closed - Opened by HeyangQin 7 months ago

#890 - Deepspeed support finetune extra model with lora ?

Issue - State: open - Opened by wanghongqu 7 months ago - 1 comment

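#889 - The Python environment path differs across machines; after DeepSpeed starts, it cannot find the Python environment on the other machines. How can this be fixed?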

Issue - State: closed - Opened by liqwertyu 7 months ago

#888 - when calculating actor loss, why the mask is "action_mask[:, start: ] "

Issue - State: closed - Opened by fancghit 8 months ago

#887 - The actor constantly generates ['</s>'] or ['<|endoftext|></s>'] after 200 steps in RLHF with hybrid engine disabled

Issue - State: open - Opened by mousewu 8 months ago - 1 comment

#886 - About multiple-thread attention computation on CPU using zero-inference example.

Issue - State: open - Opened by luckyq 8 months ago

#885 - Suggested GPU to run the demo code of step2_reward_model_finetuning (DeepSpeed-Chat)

Issue - State: open - Opened by wenbozhangjs 8 months ago

#884 - [REQUEST] More fine-grained distributed strategies for RLHF training

Issue - State: open - Opened by youshaox 8 months ago

#883 - The reward value did not increase.

Issue - State: open - Opened by Sun-Shiqi 8 months ago - 1 comment

#882 - Fix response check in call_aml function

Pull Request - State: closed - Opened by HeyangQin 8 months ago

#881 - Update throughput-latency plot script

Pull Request - State: closed - Opened by lekurile 8 months ago

#880 - [Inference Benchmark] set `num_requests` based on `num_clients`

Pull Request - State: closed - Opened by mrwyattii 8 months ago

#879 - Confusion about Deepspeed Inference

Issue - State: open - Opened by ZekaiGalaxy 8 months ago - 1 comment

#878 - `AttributeError: readonly attribute` while trying to run training/HelloDeepSpeed

Issue - State: open - Opened by htjain 8 months ago

#876 - [inference benchmark] update AML kwargs to match vLLM kwargs

Pull Request - State: closed - Opened by mrwyattii 8 months ago

#875 - Improve robustness of inference AML benchmark

Pull Request - State: closed - Opened by HeyangQin 8 months ago

#874 - Fix AML benchmark E2E measurement

Pull Request - State: closed - Opened by mrwyattii 8 months ago

#873 - Add LoRA optimization to the SD training example

Pull Request - State: open - Opened by PareesaMS 9 months ago

#872 - Replace deprecated transformers.deepspeed module

Pull Request - State: open - Opened by HollowMan6 9 months ago

#871 - Xiaoxia/fp v1

Pull Request - State: closed - Opened by xiaoxiawu-microsoft 9 months ago

#870 - Remove AML key from args dict when saving results

Pull Request - State: closed - Opened by lekurile 9 months ago

#869 - Inference Benchmark: Catch AML error response

Pull Request - State: closed - Opened by mrwyattii 9 months ago

#868 - Update Inference Benchmarking Scripts - Support AML

Pull Request - State: closed - Opened by lekurile 9 months ago - 1 comment

#867 - [Bug] DeepSpeed Inference Does not Work with LLaMA (Latest version)

Issue - State: open - Opened by allanj 9 months ago - 3 comments
