Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / laekov/fastmoe issues and pull requests

#150 - convert input to same type as weight for mixed precision training

Pull Request - State: closed - Opened by santurini over 1 year ago - 3 comments

#149 - Training on Single GPU gives NCCL error

Issue - State: closed - Opened by santurini over 1 year ago - 1 comment

#148 - MoE DDP + Expert Parallelism

Issue - State: closed - Opened by santurini over 1 year ago - 6 comments

#147 - TypeError: linear_forward(): incompatible function arguments

Issue - State: closed - Opened by kamanphoebe over 1 year ago - 4 comments

#146 - make FasterMoE more general

Pull Request - State: closed - Opened by zms1999 almost 2 years ago

#145 - FastMoE with Megatron-LM v2.5

Pull Request - State: closed - Opened by zms1999 almost 2 years ago

#144 - remove synchronize

Pull Request - State: closed - Opened by Fragile-azalea almost 2 years ago

#142 - Compatibility to older cuda and torch 1.13

Pull Request - State: closed - Opened by laekov about 2 years ago

#141 - Diverge gshard gate

Pull Request - State: closed - Opened by laekov about 2 years ago

#140 - smart_schedule.h bug fixed

Pull Request - State: closed - Opened by lawrence-cj about 2 years ago - 1 comment

#139 - Does FastMoe have a plan to support pipeline parallel with Megatron?

Issue - State: closed - Opened by LitLeo about 2 years ago - 2 comments

#138 - fix bug: add proper comm group

Pull Request - State: closed - Opened by zms1999 about 2 years ago

#137 - More GPU number than expert number

Issue - State: closed - Opened by hanxuel about 2 years ago - 5 comments

#136 - Update version requirement in the documents

Pull Request - State: closed - Opened by laekov over 2 years ago

#135 - Document for examples

Pull Request - State: closed - Opened by laekov over 2 years ago

#134 - CUBLAS_STATUS_ARCH_MISMATCH

Issue - State: closed - Opened by Irenehere over 2 years ago - 2 comments

#133 - 'Namespace' object has no attribute 'balance_strategy'

Issue - State: closed - Opened by Irenehere over 2 years ago - 2 comments

#130 - Fix GshardGate top1_idx

Pull Request - State: closed - Opened by Fragile-azalea over 2 years ago

#129 - The top_k in Gshard seems to be one.

Issue - State: closed - Opened by Fragile-azalea over 2 years ago - 3 comments

#128 - About balance loss

Issue - State: closed - Opened by LoganLiu66 over 2 years ago - 3 comments

#127 - update readme: enable NCCL by default

Pull Request - State: closed - Opened by heheda12345 over 2 years ago

#126 - NCCL Error at /home/xxx/fastmoe/cuda/global_exchange.cpp:127 value 5

Issue - State: closed - Opened by Fragile-azalea over 2 years ago - 2 comments

#124 - module 'fmoe_cuda' has no attribute 'ensure_nccl'

Issue - State: closed - Opened by Fangbo0506 over 2 years ago - 4 comments

#123 - Fix nccl uid bcast for torch v1.12.0

Pull Request - State: closed - Opened by laekov over 2 years ago

#122 - Ninja Build Stopped Subcommand Failed

Issue - State: closed - Opened by QiyaoWei over 2 years ago - 2 comments

#121 - How to use Convolution operator as the expert?

Issue - State: closed - Opened by hobbitlzy over 2 years ago - 12 comments

#116 - Question about how to use DistributedGroupedDataParallel

Issue - State: closed - Opened by Fragile-azalea over 2 years ago - 7 comments

#111 - python setup.py install error with ["ninja", "-v"]

Issue - State: closed - Opened by louislau1129 over 2 years ago - 11 comments

#105 - How to support data parallel and model parallel for megatron at the same time.

Issue - State: closed - Opened by superqing001 almost 3 years ago - 3 comments

#96 - setup.py install fails with an installation error

Issue - State: closed - Opened by zxw866 almost 3 years ago - 4 comments

#82 - When running fastmoe with model parallel, the training process hanged

Issue - State: closed - Opened by sandyhouse over 3 years ago - 6 comments

#61 - Adaptation guidelines for Megatron v2.4

Issue - State: closed - Opened by ymjiang over 3 years ago - 6 comments
Labels: good first issue

#53 - How to use fastmoe on fairseq?

Issue - State: closed - Opened by Hanlard over 3 years ago - 5 comments

#16 - Can't find ProcessGroupNCCL.hpp

Issue - State: closed - Opened by zjujh1995 almost 4 years ago - 9 comments