Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

Issues and pull requests for GitHub / stanford-futuredata/megablocks
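The records below can be retrieved programmatically from the ecosyste.ms Issues API and filtered locally. A minimal Python sketch follows; the endpoint path and the JSON field names (`number`, `state`, `pull_request`, `labels`, `comments_count`) are assumptions modeled on the listing format on this page, so check the live API documentation before depending on them.

```python
import json

# Assumed endpoint shape for this repository (not verified against the live API):
# https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanford-futuredata%2Fmegablocks/issues

# Sample records shaped like the entries on this page (field names assumed).
sample_response = json.loads("""
[
  {"number": 97, "title": "support amd/rocm", "state": "open",
   "pull_request": false, "user": "ehartford", "comments_count": 2,
   "labels": ["enhancement", "help wanted"]},
  {"number": 96, "title": "Remove turbo", "state": "closed",
   "pull_request": true, "user": "dblalock", "comments_count": 0,
   "labels": []}
]
""")

def open_issues_with_label(records, label):
    """Return open issues (excluding pull requests) carrying the given label."""
    return [r for r in records
            if r["state"] == "open"
            and not r["pull_request"]
            and label in r["labels"]]

hits = open_issues_with_label(sample_response, "enhancement")
print([r["number"] for r in hits])  # [97]
```

In practice the `sample_response` would come from an HTTP GET against the API rather than a hardcoded string; the filter itself is independent of how the records are fetched.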

#97 - support amd/rocm

Issue - State: open - Opened by ehartford 8 months ago - 2 comments
Labels: enhancement, help wanted

#96 - Remove turbo

Pull Request - State: closed - Opened by dblalock 9 months ago

#95 - AMP + BF16 failing

Issue - State: open - Opened by jramapuram 10 months ago - 2 comments

#94 - Unsharding scripts for megablocks models

Issue - State: open - Opened by mayank31398 10 months ago

#93 - the wrong loss func was chosen at evaluation

Issue - State: open - Opened by peterjc123 10 months ago - 2 comments

#92 - Seeking a good multi-node training config

Issue - State: open - Opened by rpand002 11 months ago - 3 comments

#91 - selective router precision

Issue - State: open - Opened by 152334H 11 months ago - 1 comment
Labels: question

#90 - Does this framework support SFT?

Issue - State: open - Opened by banksy23 11 months ago - 1 comment
Labels: question

#89 - Updt triton pin

Pull Request - State: closed - Opened by vchiley 11 months ago - 1 comment

#88 - RuntimeError: Triton Error [CUDA]: invalid argument

Issue - State: open - Opened by noob-ctrl 11 months ago - 12 comments
Labels: question

#87 - Fix `moe_normalize_expert_weights` when `top_k=1`

Pull Request - State: closed - Opened by 152334H 11 months ago - 3 comments

#86 - Gradient scale size for expert gradient

Issue - State: closed - Opened by fanshiqing 11 months ago - 4 comments

#85 - different load_balancing_loss with different pipeline_parallel_size

Issue - State: open - Opened by bozheng-hit 11 months ago - 8 comments
Labels: question

#84 - How to integrate to transformers-based mixtral

Issue - State: open - Opened by nxphi47 11 months ago - 1 comment
Labels: question

#83 - ParallelDroplessMLP initialises self.mlp twice

Issue - State: open - Opened by 152334H 11 months ago - 6 comments
Labels: enhancement, help wanted

#82 - save loading_balancing_loss properly

Issue - State: closed - Opened by gouchangjiang 11 months ago - 2 comments
Labels: question

#81 - Why the second matrix of the mlp layer has the same shape of the first one?

Issue - State: open - Opened by gouchangjiang 11 months ago - 1 comment
Labels: question

#80 - [BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights

Issue - State: open - Opened by RookieHong 11 months ago - 1 comment
Labels: bug

#79 - fix the abnormal `CAPACITY_FACTOR` value

Pull Request - State: open - Opened by jordgedu 11 months ago - 3 comments

#78 - Error from pip about missing torch module

Issue - State: closed - Opened by michaelwhitford 11 months ago - 4 comments
Labels: help wanted

#77 - Efficiency of torch mlp

Issue - State: closed - Opened by imoneoi 11 months ago - 2 comments

#76 - Fix default to be sparse

Pull Request - State: closed - Opened by mvpatel2000 11 months ago

#75 - Add dmlp registry args

Pull Request - State: closed - Opened by j316chuck 11 months ago

#74 - Refactor dtensor

Pull Request - State: closed - Opened by mvpatel2000 11 months ago

#73 - Dtensor to all paths

Pull Request - State: closed - Opened by mvpatel2000 11 months ago

#72 - Mem opt glu bkwd

Pull Request - State: closed - Opened by mvpatel2000 11 months ago

#71 - Add cast to tensor for DTensor inputs for groupedmlp

Pull Request - State: closed - Opened by eracah 11 months ago

#70 - Change router weight norm from in-place

Pull Request - State: closed - Opened by sashaDoubov 11 months ago

#69 - Skip updating load balancing loss on eval

Pull Request - State: closed - Opened by sedrick-keh-tri 11 months ago - 2 comments

#68 - Script for Full Fine-Tuning of Mixtral

Issue - State: open - Opened by alpayariyak 11 months ago - 1 comment
Labels: question

#67 - Docker issues with PyPI installation

Issue - State: open - Opened by sedrick-keh-tri 11 months ago - 3 comments

#66 - add mem optimized grouped glu

Pull Request - State: closed - Opened by vchiley 11 months ago

#65 - enable custom activation functions

Pull Request - State: closed - Opened by vchiley 11 months ago - 4 comments

#64 - How do you use routing balancing loss under pipeline parallelism

Issue - State: closed - Opened by szhengac 12 months ago - 5 comments

#63 - Update README.md

Pull Request - State: closed - Opened by eltociear 12 months ago - 1 comment

#62 - Has anyone encountered this CUDA error?

Issue - State: closed - Opened by bozheng-hit 12 months ago - 15 comments

#61 - Question on offsets in figure 5

Issue - State: closed - Opened by DaehanKim 12 months ago - 1 comment

#60 - More customizable norm for expert weights

Pull Request - State: closed - Opened by snarayan21 12 months ago

#59 - About the Multi-node Script

Issue - State: closed - Opened by XingyuXie 12 months ago - 4 comments

#58 - enable arg enabled normalization of routing weights

Pull Request - State: closed - Opened by vchiley 12 months ago

#57 - [integrating megablocks with open_lm] Question about megablocks + FSDP

Issue - State: closed - Opened by kernelmachine 12 months ago - 8 comments

#56 - Update setup.py to support multiple device capabilities

Pull Request - State: closed - Opened by simon-mo 12 months ago - 6 comments

#54 - Remove errant "*" in README

Pull Request - State: closed - Opened by tgale96 12 months ago

#53 - Fix * in README

Pull Request - State: closed - Opened by tgale96 12 months ago

#52 - Update dependencies and package organization.

Pull Request - State: closed - Opened by tgale96 12 months ago

#51 - Installation fails due to missing mosaicml-turbo

Issue - State: closed - Opened by AlpinDale 12 months ago - 2 comments

#50 - Latest GitHub release version higher than main branch setup.py

Issue - State: closed - Opened by nateraw 12 months ago - 4 comments

#49 - Comparison against top-2 routing?

Issue - State: open - Opened by sunnyszy 12 months ago - 4 comments
Labels: question

#48 - Inference code

Issue - State: closed - Opened by AlpinDale 12 months ago - 5 comments

#47 - Fix bug in topology kernel for ffn_hidden_size>4096.

Pull Request - State: closed - Opened by tgale96 12 months ago - 2 comments

#46 - Wrong outputs for hidden dim 14336

Issue - State: closed - Opened by pierrestock 12 months ago - 3 comments

#45 - Support new model

Pull Request - State: closed - Opened by pierrestock 12 months ago - 4 comments

#44 - Add expert dropout

Pull Request - State: closed - Opened by samhavens 12 months ago

#43 - Removing an extra size call

Pull Request - State: closed - Opened by bcui19 12 months ago

#42 - Torch Moe

Pull Request - State: closed - Opened by j316chuck 12 months ago - 2 comments

#41 - Enable generic dimensionality for input

Pull Request - State: closed - Opened by vchiley 12 months ago

#40 - Why not support tensor model parallel?

Issue - State: closed - Opened by Richie-yan about 1 year ago - 7 comments

#39 - Have megablocks rely on torch default precision

Pull Request - State: closed - Opened by mvpatel2000 about 1 year ago

#38 - Add GLU support

Pull Request - State: closed - Opened by sashaDoubov about 1 year ago - 4 comments

#37 - Avoid duplicate `.cpu()` call

Pull Request - State: closed - Opened by mvpatel2000 about 1 year ago - 3 comments

#36 - Update version

Pull Request - State: closed - Opened by mvpatel2000 about 1 year ago

#35 - How to add support for swiglu in Megablocks?

Issue - State: closed - Opened by fanshiqing about 1 year ago - 14 comments

#34 - Refactoring class hierarchy for FSDP wrapping

Pull Request - State: closed - Opened by tgale96 about 1 year ago - 2 comments

#32 - How to pip install the latest megablocks?

Issue - State: closed - Opened by fanshiqing about 1 year ago - 2 comments

#31 - Enable running MegaBlocks MoE without bias

Pull Request - State: closed - Opened by vchiley about 1 year ago

#30 - Fix activation quantization

Pull Request - State: closed - Opened by dblalock about 1 year ago - 4 comments

#29 - Remove unused import

Pull Request - State: closed - Opened by mvpatel2000 about 1 year ago - 1 comment

#28 - Fix grouped GEMM API

Pull Request - State: closed - Opened by tgale96 about 1 year ago

#27 - Small optimizations for EP/TP

Pull Request - State: closed - Opened by tgale96 about 1 year ago

#26 - Support memory_optimized_mlp with grouped_mlp.

Pull Request - State: closed - Opened by tgale96 about 1 year ago

#25 - Gate grouped gemm install

Pull Request - State: closed - Opened by mvpatel2000 about 1 year ago - 2 comments

#24 - Make MegaBlocks go vroom on Hopper.

Pull Request - State: closed - Opened by tgale96 about 1 year ago - 1 comment

#23 - Add optional activation quantization

Pull Request - State: closed - Opened by dblalock about 1 year ago - 7 comments

#22 - update Megatron-LM submodule and update a test script

Pull Request - State: closed - Opened by feifeibear about 1 year ago

#21 - Does megablocks support the true expert parallelism?

Issue - State: closed - Opened by feifeibear about 1 year ago - 2 comments

#20 - Fix weight gradients with expert model parallelism.

Pull Request - State: closed - Opened by tgale96 about 1 year ago

#19 - Enable FSDP sharding for bias

Pull Request - State: closed - Opened by b-chu about 1 year ago - 1 comment

#18 - multi-node problem

Issue - State: closed - Opened by sudahui about 1 year ago - 5 comments

#17 - Activation memory optimization

Pull Request - State: closed - Opened by tgale96 over 1 year ago - 2 comments

#16 - Update citation in README to MLSys

Pull Request - State: closed - Opened by deepakn94 over 1 year ago

#14 - Use builtin decorators for AMP.

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#13 - Update out-of-date README.

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#12 - Minor cleanup

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#11 - Add support for fully-sharded data parallelism.

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#10 - Add flag to force uniform assignment to experts for load balancing.

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#9 - updt setup.py; fix tokens_per_expert casting

Pull Request - State: closed - Opened by vchiley over 1 year ago

#8 - add guangnian webtext2 training scripts

Pull Request - State: closed - Opened by feifeibear over 1 year ago

#7 - add guangnian webtext2 training scripts

Pull Request - State: closed - Opened by feifeibear over 1 year ago

#6 - Optimizations for top_k > 1

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#5 - Switch dMoE models to use bfloat16

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#4 - Add support for bfloat16 and AdaFactor

Pull Request - State: closed - Opened by tgale96 over 1 year ago

#3 - Current installation instructions don't quite work

Issue - State: closed - Opened by deepakn94 almost 2 years ago - 1 comment

#2 - Re-factoring for Composer integration.

Pull Request - State: closed - Opened by tgale96 almost 2 years ago

#1 - Remove Megatron dependency from core layers and tests.

Pull Request - State: closed - Opened by tgale96 almost 2 years ago