Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/DeepSpeedExamples issues and pull requests
#954 - fix: the json format of the training imagenet configuration file
Pull Request -
State: open - Opened by muskonu 24 days ago
#953 - Cleanup CODEOWNERS
Pull Request -
State: closed - Opened by loadams 25 days ago
#952 - Mosm/torch profile
Pull Request -
State: closed - Opened by Dharshan-SK about 1 month ago
#951 - Is there any example about DeepSpeed Zero with Ulysses/Ulysses-offload
Issue -
State: open - Opened by LSC527 about 1 month ago
#950 - Domino + PP
Issue -
State: open - Opened by XZQshiyu about 1 month ago
#949 - Update references to torchvision
Pull Request -
State: closed - Opened by loadams about 1 month ago
#948 - Error when running training example DeepSpeed-Domino/pretrain_gpt3_2.7b.sh
Issue -
State: closed - Opened by ZhiyiHu1999 about 2 months ago
- 1 comment
#947 - remove-redundant-code
Pull Request -
State: closed - Opened by simonJJJ 2 months ago
#946 - Assertion `srcIndex < srcSelectDimSize` failed
Issue -
State: open - Opened by boqiny 2 months ago
- 1 comment
#945 - add checkpoint
Pull Request -
State: open - Opened by zhangsmallshark 2 months ago
- 1 comment
#944 - Question to attention computation
Issue -
State: open - Opened by yuzhenmao 2 months ago
#943 - KV_cache offload
Issue -
State: open - Opened by yuzhenmao 2 months ago
#942 - Example and benchmark of APIs to offload states
Pull Request -
State: closed - Opened by tohtana 2 months ago
#941 - A bug in argument parser.
Issue -
State: open - Opened by ChenDaiwei-99 2 months ago
#940 - Failed to run Domino example
Issue -
State: closed - Opened by lucifer1004 3 months ago
- 2 comments
#939 - Update DeepSpeed version requirement to >=0.16.0 for Domino
Pull Request -
State: closed - Opened by shenzheyu 3 months ago
#938 - Bump the pip group across 9 directories with 15 updates #3
Pull Request -
State: open - Opened by akaday 3 months ago
#937 - Bump the pip group across 2 directories with 1 update #2
Pull Request -
State: closed - Opened by akaday 3 months ago
- 1 comment
#936 - How can I change the master_port when using deepspeed for multi-GPU on single node, i.e. localhost
Issue -
State: open - Opened by lovedoubledan 3 months ago
- 4 comments
#935 - RuntimeError: CUDA error: no kernel image is available for execution on the device
Issue -
State: closed - Opened by mrpeerat 3 months ago
- 1 comment
#934 - No module named 'transformers.deepspeed'
Issue -
State: closed - Opened by TianyuJIAA 4 months ago
- 2 comments
#933 - Fixed mistake in readme
Pull Request -
State: closed - Opened by SCheekati 4 months ago
#932 - Does DeepSpeed's Pipeline-Parallelism optimizer supports skip connections?
Issue -
State: open - Opened by RoyMahlab 4 months ago
#931 - [cifar ds training]: Set cuda device during initialization of distributed backend.
Pull Request -
State: closed - Opened by jagadish-amd 4 months ago
- 3 comments
#930 - Enable reward model offloading option
Pull Request -
State: closed - Opened by kfertakis 5 months ago
- 2 comments
#929 - Deepspeed-Domino
Pull Request -
State: closed - Opened by zhangsmallshark 5 months ago
- 3 comments
#928 - After using steps 1, 2, and 3, the test reply content only replies Assistant: </s>.
Issue -
State: closed - Opened by jianmomo 5 months ago
#927 - Remove the fixed `eot_token` mechanism for SFT
Pull Request -
State: closed - Opened by Xingfu-Yi 5 months ago
- 2 comments
#925 - Update requirements for opencv-python CVE
Pull Request -
State: closed - Opened by loadams 6 months ago
#924 - AttributeError: 'DeepSpeedEngine' object has no attribute 'model',
Issue -
State: closed - Opened by lovychen 6 months ago
- 1 comment
#923 - How to calculate training efficiency ,i.e tokens/sec of step 1 fine tuning of llama2 model ?
Issue -
State: open - Opened by sowmya04101998 6 months ago
#922 - Actor loss nan and Resizing model embedding
Issue -
State: open - Opened by ouyanmei 6 months ago
- 1 comment
#921 - DeepNVMe ZeRO-inf Tutorial
Pull Request -
State: closed - Opened by jomayeri 6 months ago
#920 - FileNotFoundError: [Errno 2] No such file or directory: 'numactl'
Issue -
State: closed - Opened by zhiwentian 6 months ago
- 6 comments
#919 - DeepNVMe README.md add xref
Pull Request -
State: closed - Opened by stas00 6 months ago
#916 - Update README.md
Pull Request -
State: closed - Opened by keshavkowshik 6 months ago
#915 - step2 without any response for a long time
Issue -
State: open - Opened by asfadfaf 6 months ago
#914 - DeepNVMe example scripts
Pull Request -
State: closed - Opened by tjruwase 6 months ago
#913 - Add openai client to deepspeedometer
Pull Request -
State: closed - Opened by delock 6 months ago
- 2 comments
#912 - Different zero stage the training memory compute
Issue -
State: open - Opened by Arcmoon-Hu 7 months ago
#911 - nvcc fatal : Unsupported gpu architecture 'compute_86' and nvcc fatal : Value 'c++17' is not defined for option 'std'
Issue -
State: closed - Opened by Xccanxin 7 months ago
- 1 comment
#910 - How to start deepspeed automatically?
Issue -
State: closed - Opened by qwerfdsadad 8 months ago
- 2 comments
#909 - Consult the first phase.
Issue -
State: closed - Opened by csxrzhang 8 months ago
- 2 comments
#908 - an error with gradient checkpointing in DeepspeedChat step 3
Issue -
State: open - Opened by wangyuwen1999 8 months ago
#907 - Error when using a Qwen model as the Actor Model in step 3 of RLHF on a single node with multiple GPUs
Issue -
State: open - Opened by Dakai798 8 months ago
- 1 comment
#906 - DeepSpeed-Chat step-1 hanging for a long time
Issue -
State: open - Opened by lemon-little 8 months ago
#905 - Enable cpu/xpu support for the benchmarking suite
Pull Request -
State: closed - Opened by louie-tsai 9 months ago
- 8 comments
#904 - CPU OOM when inferencing Llama3-70B-Chinese-Chat
Issue -
State: open - Opened by GORGEOUSLCX 9 months ago
#903 - cannot pickle 'Stream' object
Issue -
State: open - Opened by teis-e 9 months ago
#902 - can not run the test-gpt.sh because of assertionError
Issue -
State: open - Opened by leachee99 9 months ago
#901 - Does FastGen support long-text and sequence-parallel inference?
Issue -
State: open - Opened by AceCoder0 9 months ago
#900 - Add --client-only arg to mii benchmark
Pull Request -
State: closed - Opened by delock 10 months ago
#899 - Refactored LLM benchmark code
Pull Request -
State: closed - Opened by mrwyattii 10 months ago
#898 - fix bug with queue.empty not being reliable
Pull Request -
State: closed - Opened by mrwyattii 10 months ago
#897 - Update tokens_per_sec calculation to work w/ stream and non-stream cases
Pull Request -
State: closed - Opened by lekurile 10 months ago
#896 - run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely
Issue -
State: closed - Opened by awan-10 10 months ago
- 11 comments
#895 - updating tokens per second to include the token count of generated tokens.
Pull Request -
State: closed - Opened by guptha23 10 months ago
#894 - [Error] AutoTune: `connect to host localhost port 22: Connection refused`
Issue -
State: open - Opened by wqw547243068 10 months ago
#893 - How to use deepspeed for multi-node and multi-card task in slurm cluster
Issue -
State: open - Opened by dshwei 10 months ago
#892 - Does Zero-Inference support TP?
Issue -
State: open - Opened by preminstrel 10 months ago
- 11 comments
#891 - extend max_prompt_length and input text for 128k evaluation
Pull Request -
State: closed - Opened by HeyangQin 10 months ago
#890 - Deepspeed support finetune extra model with lora ?
Issue -
State: open - Opened by wanghongqu 10 months ago
- 1 comment
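#889 - Python environment paths differ across machines; after launching, deepspeed cannot find the Python environment on the other machines. How can this be resolved?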
Issue -
State: closed - Opened by liqwertyu 10 months ago
#888 - when calculating actor loss, why the mask is "action_mask[:, start: ] "
Issue -
State: closed - Opened by fancghit 11 months ago
#887 - The actor constantly generates ['</s>'] or ['<|endoftext|></s>'] after 200 steps in RLHF with hybrid engine disabled
Issue -
State: open - Opened by mousewu 11 months ago
- 1 comment
#886 - About multiple-thread attention computation on CPU using zero-inference example.
Issue -
State: open - Opened by luckyq 11 months ago
#885 - Suggested GPU to run the demo code of step2_reward_model_finetuning (DeepSpeed-Chat)
Issue -
State: open - Opened by wenbozhangjs 11 months ago
#884 - [REQUEST] More fine-grained distributed strategies for RLHF training
Issue -
State: open - Opened by youshaox 11 months ago
#883 - The reward value did not increase.
Issue -
State: open - Opened by Sun-Shiqi 11 months ago
- 1 comment
#882 - Fix response check in call_aml function
Pull Request -
State: closed - Opened by HeyangQin 11 months ago
#881 - Update throughput-latency plot script
Pull Request -
State: closed - Opened by lekurile 11 months ago
#880 - [Inference Benchmark] set `num_requests` based on `num_clients`
Pull Request -
State: closed - Opened by mrwyattii 11 months ago
#879 - Confusion about Deepspeed Inference
Issue -
State: open - Opened by ZekaiGalaxy 11 months ago
- 1 comment
#878 - `AttributeError: readonly attribute` while trying to run training/HelloDeepSpeed
Issue -
State: open - Opened by htjain 11 months ago