Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / laekov/fastmoe issues and pull requests
#214 - Detailed documentation about model parallelism
Issue -
State: open - Opened by ZSL98 3 months ago
#213 - R operations do not overlap with C operations in Smart Schedule
Issue -
State: open - Opened by WhatBrain 4 months ago
- 5 comments
#212 - bash run_enwik8_base.sh train train --work_dir /dir/
Issue -
State: closed - Opened by WYCAS 4 months ago
#211 - how to run transformer-xl with parallel experts with single gpu?
Issue -
State: open - Opened by HudashiNeo 4 months ago
- 6 comments
#210 - Do We support DeepSpeed training? Thanks.
Issue -
State: open - Opened by lzl-mt 5 months ago
- 1 comment
#209 - Forward pass return value is missing bal_loss
Issue -
State: open - Opened by tisgotos 5 months ago
- 2 comments
#208 - Hello, where can I get v2.2 of Megatron-LM?
Issue -
State: closed - Opened by tisgotos 5 months ago
- 7 comments
#207 - Error when running examples/transformer-xl/scripts/run_enwik8_base_moe.sh with Smart schedule enabled
Issue -
State: open - Opened by WhatBrain 5 months ago
- 6 comments
#206 - No hiding output when using `pytest -s`
Pull Request -
State: closed - Opened by roastduck 8 months ago
#205 - Make the code neutral to device by removing `.cuda()`
Pull Request -
State: closed - Opened by roastduck 8 months ago
#204 - FasterMoE Shadow Policy: Detailed Inquiry
Issue -
State: closed - Opened by Guodanding 9 months ago
- 7 comments
#203 - Update readme-cn.md
Pull Request -
State: closed - Opened by HelloWorldLTY 9 months ago
#202 - DDP error
Issue -
State: closed - Opened by Peg-Wu 10 months ago
#201 - CUDA memory increases after each loss.backward()
Issue -
State: open - Opened by sreetamasarkar 10 months ago
- 6 comments
#200 - Update switch_gate.py
Pull Request -
State: closed - Opened by Heihaierr 11 months ago
#199 - A bug in switch_gate
Issue -
State: open - Opened by Heihaierr 11 months ago
- 6 comments
#198 - About switch_gate
Issue -
State: open - Opened by Heihaierr 11 months ago
- 1 comment
#197 - multi-node problem
Issue -
State: open - Opened by Qianshaowei 11 months ago
- 1 comment
#196 - Example to run Megatron
Issue -
State: open - Opened by Juanhui28 11 months ago
- 3 comments
#195 - [BUG] AttributeError: module 'fmoe_cuda' has no attribute 'assign_pos_'
Issue -
State: open - Opened by pangsg 11 months ago
- 3 comments
#194 - cudaErrorInvalidDevice reported when running FMOE
Issue -
State: closed - Opened by pangsg 12 months ago
- 6 comments
#193 - Does fastmoe support fine-tuning?
Issue -
State: closed - Opened by PowerDispatch 12 months ago
#192 - Does fastmoe support fine-tuning, page-attention, flashattention, kvcache, mixed precision, etc.?
Issue -
State: open - Opened by PowerDispatch 12 months ago
- 4 comments
#191 - Can fastmoe be integrated into VLLM?
Issue -
State: open - Opened by pangsg 12 months ago
- 4 comments
#190 - The prep_text8.py script does not exist
Issue -
State: closed - Opened by PowerDispatch 12 months ago
- 1 comment
#189 - Do we have an online chat group?
Issue -
State: open - Opened by PowerDispatch 12 months ago
- 1 comment
#188 - Hello, how do I define MoE under dp+mp in fastmoe?
Issue -
State: closed - Opened by daixiangzi 12 months ago
- 6 comments
#187 - This PR resolves issue #186
Pull Request -
State: closed - Opened by Cobalt-27 12 months ago
#186 - num_experts argument error for Megatron-LM
Issue -
State: closed - Opened by Cobalt-27 12 months ago
#185 - [Feature] Make bias of gate optional for naive_gate and its subclasses.
Pull Request -
State: closed - Opened by Zhang-RQ about 1 year ago
#184 - Segmentation fault when enabling Smart schedule
Issue -
State: open - Opened by Xingzhi107 about 1 year ago
- 8 comments
Labels: bug
#183 - pytest error
Issue -
State: open - Opened by R-QinQ about 1 year ago
- 3 comments
#182 - setup.py error!
Issue -
State: closed - Opened by R-QinQ about 1 year ago
- 4 comments
#181 - ImportError: cannot import name 'get_args' from 'megatron'
Issue -
State: open - Opened by peter-fei about 1 year ago
- 5 comments
#180 - During inference, the output of noisy gate is nan.
Issue -
State: open - Opened by zqhang about 1 year ago
- 5 comments
#179 - Inconsistent evaluation result when clone expert parameters from original FFN
Issue -
State: closed - Opened by Heihaierr about 1 year ago
- 1 comment
#178 - MOELinear is much slower than torch.nn.Linear
Issue -
State: closed - Opened by kamanphoebe about 1 year ago
- 7 comments
#177 - ModuleNotFoundError: No module named 'fmoe_cuda'
Issue -
State: open - Opened by Taskii-Lei about 1 year ago
- 1 comment
#176 - how to use balance loss?
Issue -
State: open - Opened by Heihaierr about 1 year ago
- 1 comment
#175 - update clip-grad-v2.2.patch for grads_in_moe is empty
Pull Request -
State: closed - Opened by Fragile-azalea over 1 year ago
#174 - Fix tests
Pull Request -
State: closed - Opened by laekov over 1 year ago
#173 - Fit old code with new smgr
Pull Request -
State: closed - Opened by laekov over 1 year ago
#172 - [BUG FIX] Fix bugs in stream manager.
Pull Request -
State: closed - Opened by zms1999 over 1 year ago
- 1 comment
#171 - fix cublas gemm call for bf16 input
Pull Request -
State: closed - Opened by xptree over 1 year ago
- 1 comment
#170 - MOELinear always returns a zero tensor for bf16 input
Issue -
State: closed - Opened by xptree over 1 year ago
- 1 comment
#169 - MoE L2 norm reduce in Megatron
Issue -
State: closed - Opened by blankde over 1 year ago
- 3 comments
#168 - No overlapping observed when enabling Smart Scheduling
Issue -
State: open - Opened by chenyu-jiang over 1 year ago
- 8 comments
#167 - Update outdated README
Pull Request -
State: closed - Opened by zms1999 over 1 year ago
#166 - Outdated doc for smart schedule with num_expert > 1?
Issue -
State: closed - Opened by chenyu-jiang over 1 year ago
- 1 comment
#165 - Document for process groups
Pull Request -
State: closed - Opened by laekov over 1 year ago
#164 - Doc-string / Documentation clarification for parallel groups
Issue -
State: closed - Opened by XMaster96 over 1 year ago
- 2 comments
#163 - Only 204 unique tokens (vocabulary size) in enwik8 (transformer-XL example)
Issue -
State: open - Opened by chenwydj over 1 year ago
- 3 comments
#162 - fmoe with deepspeed
Pull Request -
State: open - Opened by KimmiShi over 1 year ago
#161 - Mixture of Experts in Vision Task (Segmentation)
Issue -
State: open - Opened by deep-matter over 1 year ago
- 2 comments
#160 - bf16 support
Pull Request -
State: closed - Opened by laekov over 1 year ago
#159 - [WIP] Megatron v3.0.2 with known issues
Pull Request -
State: closed - Opened by xptree over 1 year ago
- 1 comment
#158 - Is there any plan to adapt to newer version of Megatron-LM?
Issue -
State: closed - Opened by lvcc2018 over 1 year ago
- 1 comment
#157 - Fix ProcessGroupNCCL mismatch in pytorch2
Pull Request -
State: closed - Opened by laekov over 1 year ago
#156 - Distributed Training is failing
Issue -
State: closed - Opened by santurini over 1 year ago
- 9 comments
#155 - Added link to installation guide
Pull Request -
State: closed - Opened by santurini over 1 year ago
#154 - Create installation-guide.md
Pull Request -
State: closed - Opened by santurini over 1 year ago
- 1 comment
#153 - Added GitHub Gist link to installation tutorial
Pull Request -
State: closed - Opened by santurini over 1 year ago
- 4 comments
#152 - Cast input to weights type for AMP support
Pull Request -
State: closed - Opened by santurini over 1 year ago
#151 - Revert "convert input to same type as weight for mixed precision training"
Pull Request -
State: closed - Opened by laekov over 1 year ago