Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / microsoft/mup issues and pull requests
#81 - Add support for dataclasses
Pull Request -
State: open - Opened by francois-rozet about 1 month ago
- 1 comment
#80 - More options of input/output types in coord_check
Issue -
State: open - Opened by francois-rozet about 1 month ago
#79 - CNN utility
Pull Request -
State: closed - Opened by JeremyCCHsu 3 months ago
#78 - How to use with SSL methods like DINOv2?
Issue -
State: open - Opened by josephcappadona 4 months ago
#77 - MuP for RNNs
Issue -
State: open - Opened by norikazu99 4 months ago
#76 - Not getting perf improvements from muP at ~1.5B scale
Issue -
State: open - Opened by gordicaleksa 4 months ago
#75 - fix: adopt mup/Transformers API for torch2.3
Pull Request -
State: open - Opened by emergenz 5 months ago
#74 - MuP for Mamba
Issue -
State: open - Opened by norikazu99 6 months ago
#73 - Refactor: Addressing Sources of User Error
Pull Request -
State: open - Opened by thomasfortin1 7 months ago
- 1 comment
#72 - Support FSDP usage
Pull Request -
State: open - Opened by janEbert 7 months ago
- 1 comment
#71 - Increasing coord check for the network output
Issue -
State: open - Opened by AkshitaB 8 months ago
- 2 comments
#70 - mu parametrization for gated-mlp and group-query attention
Issue -
State: open - Opened by ftgreat 8 months ago
#69 - Reproducing Figure 1 using 'examples/Transformer/main.py'
Issue -
State: open - Opened by jndean 11 months ago
#68 - coord_check for model that returns loss function directly
Issue -
State: open - Opened by ad8e 11 months ago
#67 - Reproducing the validation accuracy vs learning rates curve on ResNet
Issue -
State: open - Opened by liulei277 11 months ago
- 1 comment
#66 - Questions for training gpt-2 using mup
Issue -
State: closed - Opened by jiangjiadi about 1 year ago
- 6 comments
#65 - add width_mult to optimizer dict
Pull Request -
State: open - Opened by marcobellagente93 about 1 year ago
#64 - About Learning rate decay
Issue -
State: open - Opened by afcruzs about 1 year ago
- 2 comments
#63 - Demo notebook
Pull Request -
State: closed - Opened by edwardjhu about 1 year ago
#62 - Unclear `assert_hidden_size_inf` triggers
Issue -
State: closed - Opened by dreavjr about 1 year ago
- 1 comment
#61 - dim_feedforward
Issue -
State: closed - Opened by dreavjr about 1 year ago
#60 - Usage with torch.compile in Pytorch 2?
Issue -
State: open - Opened by dreavjr about 1 year ago
- 2 comments
#59 - FSDP support?
Issue -
State: open - Opened by platers about 1 year ago
- 3 comments
#58 - Interpreting jitter in coordcheck
Issue -
State: closed - Opened by leenachennuru about 1 year ago
- 2 comments
#57 - Some questions about the implementation of muP.
Issue -
State: open - Opened by lepodl about 1 year ago
#56 - µTransfer across batch size && weight decay setting
Issue -
State: open - Opened by PanYue2023 over 1 year ago
#55 - _rescale_parameters() inconsistent with the paper for the tied embedding scenario?
Issue -
State: open - Opened by ofivite over 1 year ago
- 2 comments
#54 - Is it possible to also scale the depth of the model?
Issue -
State: open - Opened by ricomnl over 1 year ago
- 5 comments
#53 - Once the best HPs have been found, does the final model have to be trained with `mup` or can one just use the found HPs and train the model in a standard way?
Issue -
State: closed - Opened by ricomnl over 1 year ago
#52 - Reproducing the training loss vs learning rates curve on MLP
Issue -
State: closed - Opened by jhj0411jhj over 1 year ago
- 5 comments
#51 - Warmup schedule when changing the number of tokens/steps (GPT-3 experiment detail)
Issue -
State: open - Opened by sashaDoubov over 1 year ago
#48 - Positional Embeddings should be MuReadout parameters ?
Issue -
State: open - Opened by codedecde over 1 year ago
- 2 comments
#47 - Question about the difference between init code and paper
Issue -
State: closed - Opened by midori1 over 1 year ago
- 2 comments
#46 - Does mup support fine tuning pretrained models
Issue -
State: closed - Opened by jhj0411jhj over 1 year ago
- 2 comments
#45 - Embedding Multiplier for Transformer - Clarification
Issue -
State: closed - Opened by sashaDoubov over 1 year ago
- 2 comments
#43 - Are Sequentials with list comprehension handled incorrectly?
Issue -
State: open - Opened by RobertBaruch over 1 year ago
- 2 comments
#42 - interpreting coord checks
Issue -
State: closed - Opened by llucid-97 over 1 year ago
- 2 comments
#41 - in mlp example: 2 problems
Issue -
State: open - Opened by yjjinjie over 1 year ago
- 1 comment
#40 - Questions on learning schedule and binary classification
Issue -
State: closed - Opened by FlamingHorizon over 1 year ago
- 12 comments
#39 - Can base model be larger than target model?
Issue -
State: closed - Opened by jhj0411jhj over 1 year ago
- 3 comments
#38 - coord check plot improvements
Pull Request -
State: closed - Opened by TevenLeScao almost 2 years ago
- 1 comment
#37 - Allowing users to create their own shapes
Pull Request -
State: closed - Opened by TevenLeScao almost 2 years ago
#36 - Should query layers in self-attention be initialized to 0 in practice?
Issue -
State: closed - Opened by xinwuyun almost 2 years ago
- 2 comments
#35 - Plot bugfix
Pull Request -
State: closed - Opened by TevenLeScao almost 2 years ago
#33 - fix: dtype for newer torch versions
Pull Request -
State: closed - Opened by zanussbaum almost 2 years ago
- 1 comment
#32 - Proper error return in coord_check.py
Pull Request -
State: closed - Opened by TevenLeScao almost 2 years ago
- 1 comment
#31 - Finetuning a Pretrained Model Using MuP
Issue -
State: closed - Opened by zanussbaum almost 2 years ago
- 3 comments
#30 - Issue in reproducing the training loss vs learning rates curve
Issue -
State: closed - Opened by NicolasWinckler almost 2 years ago
- 5 comments
#29 - Are parameters with no "infinite" dimensions allowed?
Issue -
State: closed - Opened by callumm-graphcore about 2 years ago
- 5 comments
#28 - LayerNorm Gain and Bias Multipliers
Issue -
State: closed - Opened by AWildridge about 2 years ago
- 2 comments
#27 - MuP Coord Check not Working with Electra Style Model
Issue -
State: closed - Opened by zanussbaum about 2 years ago
- 8 comments
#26 - Has MuP been tested on segmentation models?
Issue -
State: open - Opened by isdj about 2 years ago
- 4 comments
#25 - Should `base=None` be used in `set_base_shapes` for model used for tuning?
Issue -
State: open - Opened by callumm-graphcore about 2 years ago
- 2 comments
#24 - Batch size, Seq len, Step Transfering
Issue -
State: closed - Opened by timothyxp about 2 years ago
- 2 comments
#23 - Conv1D Coord check looks good (I think), but μTransfer does not seem to work?
Issue -
State: closed - Opened by zanussbaum over 2 years ago
- 20 comments
#22 - Coord check looks good, but μTransfer is not working as expected
Issue -
State: closed - Opened by shjwudp over 2 years ago
- 6 comments
#21 - Does mup support Swin Transformer v2 model?
Issue -
State: open - Opened by shiyf129 over 2 years ago
- 2 comments
#20 - muP for contrastive losses
Issue -
State: closed - Opened by xwjabc over 2 years ago
- 2 comments
#19 - missing os import in mup/examples/MLP/main.py ?
Issue -
State: closed - Opened by james-simon over 2 years ago
- 1 comment
#18 - mu parametrization for channel attention
Issue -
State: closed - Opened by xwjabc over 2 years ago
- 5 comments
#17 - mu parametrization for multi-head attention / grouped convolution
Issue -
State: closed - Opened by xwjabc over 2 years ago
- 3 comments
#16 - Optimizers for coord check
Issue -
State: closed - Opened by xwjabc over 2 years ago
- 2 comments
#15 - Torchdistx
Pull Request -
State: closed - Opened by edwardjhu over 2 years ago
- 2 comments
#14 - Coord-check for conv1d
Issue -
State: closed - Opened by bob80333 over 2 years ago
- 17 comments
#13 - ResNet readout_zero_init=True?
Issue -
State: closed - Opened by D-X-Y over 2 years ago
- 2 comments
#12 - Hyperparameter search on base models
Issue -
State: closed - Opened by davisyoshida over 2 years ago
- 2 comments
#11 - integration with Flax?
Issue -
State: open - Opened by nestordemeure over 2 years ago
- 4 comments
#10 - Examples with ConvNets
Issue -
State: closed - Opened by Aboussejra over 2 years ago
- 2 comments
#9 - Does MuReadout apply to all outputs on which loss is computed?
Issue -
State: closed - Opened by jaivardhankapoor over 2 years ago
- 2 comments
#8 - How to use 'attn_mult' config
Issue -
State: closed - Opened by JiayiFeng over 2 years ago
- 2 comments
#7 - MuAdam not adjusting lr for output weights
Issue -
State: closed - Opened by zhuzilin over 2 years ago
- 4 comments
#6 - Is this compatible with DeepSpeed / ZeRO?
Issue -
State: closed - Opened by StellaAthena over 2 years ago
- 6 comments
#4 - Multiple nn.Linear layers
Issue -
State: closed - Opened by windspirit95 over 2 years ago
- 4 comments
#3 - Does mup work with model with Conv2D as output?
Issue -
State: closed - Opened by BurguerJohn over 2 years ago
- 8 comments
#2 - PyTorch Lightning example
Issue -
State: open - Opened by tchaton over 2 years ago
- 1 comment
#1 - Consider decoupled weight decay optimizers?
Issue -
State: closed - Opened by abhi-mosaic over 2 years ago
- 6 comments