Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / laekov/fastmoe issues and pull requests
#151 - Revert "convert input to same type as weight for mixed precision training"
Pull Request -
State: closed - Opened by laekov over 1 year ago
#150 - convert input to same type as weight for mixed precision training
Pull Request -
State: closed - Opened by santurini over 1 year ago
- 3 comments
#149 - Training on Single GPU gives NCCL error
Issue -
State: closed - Opened by santurini over 1 year ago
- 1 comment
#148 - MoE DDP + Expert Parallelism
Issue -
State: closed - Opened by santurini over 1 year ago
- 6 comments
#147 - TypeError: linear_forward(): incompatible function arguments
Issue -
State: closed - Opened by kamanphoebe over 1 year ago
- 4 comments
#146 - make FasterMoE more general
Pull Request -
State: closed - Opened by zms1999 almost 2 years ago
#145 - FastMoE with Megatron-LM v2.5
Pull Request -
State: closed - Opened by zms1999 almost 2 years ago
#144 - remove synchronize
Pull Request -
State: closed - Opened by Fragile-azalea almost 2 years ago
#143 - Is it necessary to use the synchronize operation after the allreduce operation here?
Issue -
State: closed - Opened by Fragile-azalea almost 2 years ago
- 1 comment
#142 - Compatibility to older cuda and torch 1.13
Pull Request -
State: closed - Opened by laekov about 2 years ago
#141 - Diverge gshard gate
Pull Request -
State: closed - Opened by laekov about 2 years ago
#140 - smart_schedule.h bug fixed
Pull Request -
State: closed - Opened by lawrence-cj about 2 years ago
- 1 comment
#139 - Does FastMoe have a plan to support pipeline parallel with Megatron?
Issue -
State: closed - Opened by LitLeo about 2 years ago
- 2 comments
#138 - fix bug: add proper comm group
Pull Request -
State: closed - Opened by zms1999 about 2 years ago
#137 - More GPU number than expert number
Issue -
State: closed - Opened by hanxuel about 2 years ago
- 5 comments
#136 - Update version requirement in the documents
Pull Request -
State: closed - Opened by laekov over 2 years ago
#135 - Document for examples
Pull Request -
State: closed - Opened by laekov over 2 years ago
#134 - CUBLAS_STATUS_ARCH_MISMATCH
Issue -
State: closed - Opened by Irenehere over 2 years ago
- 2 comments
#133 - 'Namespace' object has no attribute 'balance_strategy'
Issue -
State: closed - Opened by Irenehere over 2 years ago
- 2 comments
#132 - fastmoe-master/build/temp.linux-x86_64-3.8/cuda/global_exchange.o: No such file or directory
Issue -
State: closed - Opened by Irenehere over 2 years ago
- 9 comments
#131 - During inference, I need to run forward on CPU, so FMOE does not support CPU inference now?
Issue -
State: closed - Opened by snsun over 2 years ago
- 2 comments
#130 - Fix GshardGate top1_idx
Pull Request -
State: closed - Opened by Fragile-azalea over 2 years ago
#129 - The top_k in Gshard seems to be one.
Issue -
State: closed - Opened by Fragile-azalea over 2 years ago
- 3 comments
#128 - About balance loss
Issue -
State: closed - Opened by LoganLiu66 over 2 years ago
- 3 comments
#127 - update readme: enable NCCL by default
Pull Request -
State: closed - Opened by heheda12345 over 2 years ago
#126 - NCCL Error at /home/xxx/fastmoe/cuda/global_exchange.cpp:127 value 5
Issue -
State: closed - Opened by Fragile-azalea over 2 years ago
- 2 comments
#125 - Performance difference when replacing FFN with FMoETransformerMLP in transformer
Issue -
State: closed - Opened by LoganLiu66 over 2 years ago
- 1 comment
#124 - module 'fmoe_cuda' has no attribute 'ensure_nccl'
Issue -
State: closed - Opened by Fangbo0506 over 2 years ago
- 4 comments
#123 - Fix nccl uid bcast for torch v1.12.0
Pull Request -
State: closed - Opened by laekov over 2 years ago
#122 - Ninja Build Stopped Subcommand Failed
Issue -
State: closed - Opened by QiyaoWei over 2 years ago
- 2 comments
#121 - How to use Convolution operator as the expert?
Issue -
State: closed - Opened by hobbitlzy over 2 years ago
- 12 comments
#119 - nccl.h is not found or ncclUnhandledCudaError: Call to CUDA function failed
Issue -
State: closed - Opened by Fragile-azalea over 2 years ago
- 9 comments
#116 - Question about how to use DistributedGroupedDataParallel
Issue -
State: closed - Opened by Fragile-azalea over 2 years ago
- 7 comments
#111 - python setup.py install error with ["ninja", "-v"]
Issue -
State: closed - Opened by louislau1129 over 2 years ago
- 11 comments
#105 - How to support data parallel and model parallel for megatron at the same time.
Issue -
State: closed - Opened by superqing001 almost 3 years ago
- 3 comments
#96 - setup.py install fails with an error
Issue -
State: closed - Opened by zxw866 almost 3 years ago
- 4 comments
#82 - When running fastmoe with model parallel, the training process hanged
Issue -
State: closed - Opened by sandyhouse over 3 years ago
- 6 comments
#61 - Adaptation guidelines for Megatron v2.4
Issue -
State: closed - Opened by ymjiang over 3 years ago
- 6 comments
Labels: good first issue
#53 - How to use fastmoe on fairseq?
Issue -
State: closed - Opened by Hanlard over 3 years ago
- 5 comments
#16 - Can't find ProcessGroupNCCL.hpp
Issue -
State: closed - Opened by zjujh1995 almost 4 years ago
- 9 comments