Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/nanotron issues and pull requests
#278 - Add MLA
Pull Request -
State: open - Opened by zzhhjjj 2 days ago
#277 - add support for nn.Linear layers in StandardParametrizator
Pull Request -
State: open - Opened by sapiosaturn 3 days ago
#276 - Make sure nanotron works with lighteval
Issue -
State: open - Opened by xrsrke 11 days ago
#275 - [Question] I'd like to get help to load dataset.
Issue -
State: closed - Opened by barneylogo 13 days ago
- 2 comments
#274 - Add nanotron performance
Pull Request -
State: open - Opened by xrsrke 15 days ago
#270 - [CICD] Add a timeout for unit tests and measure their execution time
Pull Request -
State: open - Opened by xrsrke 21 days ago
#266 - fp8
Pull Request -
State: open - Opened by xrsrke about 2 months ago
#265 - Invalid Usage of NCCL Library
Issue -
State: open - Opened by awsankur about 2 months ago
- 4 comments
#264 - Support custom logging API for wandb alternatives (ex. ClearML)
Issue -
State: open - Opened by arcyleung about 2 months ago
#263 - [Feature] Support resume ZeRO1 in a new data parallelism size
Pull Request -
State: open - Opened by xrsrke about 2 months ago
#262 - AllReduce taking extra long for ShardedCrossEntropy
Issue -
State: open - Opened by NouamaneTazi about 2 months ago
Labels: help wanted
#261 - Fix DDP with Zero1
Issue -
State: open - Opened by NouamaneTazi about 2 months ago
Labels: help wanted
#260 - Updating dependency versions in pyproject.toml and README.
Pull Request -
State: closed - Opened by sapiosaturn 2 months ago
#259 - Optimize memory in loss consumption
Issue -
State: open - Opened by NouamaneTazi 2 months ago
Labels: help wanted
#258 - quick fix
Pull Request -
State: closed - Opened by eliebak 2 months ago
#257 - FileNotFoundError: No files matching "datasets/fineweb-edu-dedup/*.ds" found in /home/nanotron/datasets/fineweb-edu-dedup
Issue -
State: open - Opened by sankexin 2 months ago
- 2 comments
#256 - Fix wrong initialization of lr scheduler
Pull Request -
State: open - Opened by kylematoba 2 months ago
- 1 comment
#255 - [NEW] Llama3.2 weight converters 🦙
Pull Request -
State: open - Opened by TJ-Solergibert 2 months ago
- 1 comment
#254 - the generations not right
Issue -
State: open - Opened by sankexin 2 months ago
- 2 comments
#253 - resuming checkpoint without lr schedule or optimizer state
Pull Request -
State: closed - Opened by eliebak 2 months ago
#252 - Error due to missing `is_zero` arg when saving LR scheduler
Issue -
State: closed - Opened by Lauler 2 months ago
- 3 comments
#251 - Cannot run the Model generated from the example script
Issue -
State: closed - Opened by hz-nm 2 months ago
- 1 comment
#250 - ValueError: ('The number of process requires to run all replicas (8)', 'must be equal to the world size (4).')
Issue -
State: open - Opened by sankexin 2 months ago
- 2 comments
#249 - Support per-domain loss
Pull Request -
State: closed - Opened by MaxiBoether 2 months ago
- 1 comment
#248 - Quick fix last PR
Pull Request -
State: closed - Opened by NouamaneTazi 2 months ago
#247 - Add shuffling in Nanotron for subsequent epochs when data is repeated
Pull Request -
State: open - Opened by Lauler 3 months ago
#246 - Optimize memory when loading checkpoint
Pull Request -
State: closed - Opened by NouamaneTazi 3 months ago
#245 - Small fixes when resuming training
Pull Request -
State: closed - Opened by NouamaneTazi 3 months ago
#244 - Add optional validation (to see how to do, see ). Take out some check…
Pull Request -
State: closed - Opened by kylematoba 3 months ago
#243 - Fix initial_lr when resuming training
Pull Request -
State: open - Opened by Lauler 3 months ago
- 6 comments
#242 - Sync weights across dp dimension during consistency validation
Issue -
State: open - Opened by gritukan 3 months ago
#241 - convert_nt_to_hf is broken for non-interleaved RoPE
Issue -
State: open - Opened by gritukan 3 months ago
- 1 comment
#240 - Support for AMD GPUs
Issue -
State: open - Opened by YuTian8328 3 months ago
#239 - Pretraining dataset too large to fit into memory
Issue -
State: closed - Opened by alexchen4ai 3 months ago
- 1 comment
#238 - Load random states from checkpoint
Pull Request -
State: open - Opened by gritukan 3 months ago
- 1 comment
#237 - Nanoset reuses same shuffling for repeated data (epochs)
Issue -
State: open - Opened by Lauler 3 months ago
- 1 comment
#236 - CUDA_DEVICE_MAX_CONNECTIONS
Issue -
State: open - Opened by jeromeku 3 months ago
- 1 comment
#235 - ImportError: cannot import name 'LoggingArgs' from partially initialized module 'nanotron.config.config error (circular imports)
Issue -
State: open - Opened by Akhilvallala2023 4 months ago
#234 - Ci move
Pull Request -
State: closed - Opened by glegendre01 4 months ago
#233 - Learning rate restart broken with Nanoset?
Issue -
State: open - Opened by Pclanglais 5 months ago
- 13 comments
Labels: bug
#230 - Fix loading scheduler when having more than one param_group
Pull Request -
State: closed - Opened by TJ-Solergibert 5 months ago
#222 - lighteval support after checkpoint, UX refactor
Pull Request -
State: open - Opened by eliebak 6 months ago
- 5 comments
#221 - error resuming from checkpoint if PP > 1
Issue -
State: closed - Opened by moussaKam 6 months ago
- 7 comments
#215 - Hello, team
Issue -
State: open - Opened by barneylogo 6 months ago
- 2 comments
#198 - [Bug] Missing `_is_using_mup` when resume checkpoint
Issue -
State: open - Opened by xrsrke 8 months ago
- 1 comment
Labels: bug, good first issue, help wanted
#190 - Add utility to preview samples used for training. See https://github.com/huggingface/nanotron/issues/184.
Pull Request -
State: closed - Opened by kylematoba 8 months ago
#180 - FEAT: Adding 1.58bit LLMs training architecture in nanotron
Pull Request -
State: open - Opened by MekkCyber 9 months ago
- 6 comments
#177 - PyTorch profiler is unable to serialize numpy datatypes sometimes inserted as process group ranks
Issue -
State: open - Opened by hatanp 9 months ago
- 1 comment
#174 - Llama3 conversion scripts 🦙
Pull Request -
State: open - Opened by TJ-Solergibert 9 months ago
- 9 comments
#160 - Enable masking when tp=1
Pull Request -
State: closed - Opened by YongjunHe 9 months ago
#143 - Use CUDA Events for measuring elapsed time
Pull Request -
State: closed - Opened by staghado 10 months ago
- 2 comments
#100 - Fix some bugs
Pull Request -
State: closed - Opened by jordane95 11 months ago
- 1 comment
#99 - [Features] support gradient checkpointing for memory saving
Issue -
State: open - Opened by zguo0525 11 months ago
- 1 comment
#98 - [Refactor] Refactoring Expert Parallelism
Pull Request -
State: closed - Opened by NouamaneTazi 11 months ago
- 1 comment
#97 - [Quick fix] fix circular import in logging
Pull Request -
State: closed - Opened by NouamaneTazi 11 months ago
#96 - Bump v0.4 + Quick refactos
Pull Request -
State: closed - Opened by NouamaneTazi 11 months ago
#95 - [DoReMi] Small refactors
Pull Request -
State: closed - Opened by xrsrke 11 months ago
#94 - [Feature] Refactor ParallelContext.world_rank_matrix
Pull Request -
State: closed - Opened by 0xkerem 11 months ago
- 7 comments
#93 - [Docs] Add unit tests as a requirement
Pull Request -
State: closed - Opened by xrsrke 11 months ago
- 1 comment
#92 - [Bug] Fix clipping gradients's test
Issue -
State: closed - Opened by xrsrke 11 months ago
Labels: bug, good first issue, help wanted, High Priority
#91 - [Feature] All GPUs within the same TP group load training data from shared memory
Issue -
State: open - Opened by xrsrke 11 months ago
Labels: enhancement, help wanted, Low Priority
#90 - [Unit Test] Add unit tests for DistributedTrainer
Issue -
State: open - Opened by xrsrke 11 months ago
- 5 comments
Labels: good first issue, help wanted, High Priority
#89 - [Unit Test] Add unit test for DoReMi's trainer
Issue -
State: open - Opened by xrsrke 11 months ago
Labels: Medium Priority
#88 - [Feature] Use CUDA event for measuring elasped time
Issue -
State: open - Opened by xrsrke 11 months ago
Labels: enhancement, good first issue, help wanted
#87 - [Feature] Asyncronous Serialization
Issue -
State: open - Opened by xrsrke 11 months ago
Labels: enhancement, good first issue, help wanted
#86 - [Feature] Kernel Fusion of Layer Norm and GeLU
Issue -
State: open - Opened by xrsrke 11 months ago
Labels: enhancement, help wanted
#85 - [Feature] LAMB optimizer
Issue -
State: open - Opened by xrsrke 11 months ago
- 1 comment
Labels: enhancement, help wanted
#84 - [Feature] Parallel transformer block
Issue -
State: open - Opened by xrsrke 11 months ago
Labels: enhancement, help wanted, High Priority
#83 - Add Mamba PR
Pull Request -
State: closed - Opened by 3outeille 12 months ago
- 1 comment
#82 - [Bug] Not saving `lm_head` in checkpoint
Issue -
State: closed - Opened by xrsrke 12 months ago
Labels: bug
#81 - [Fix] Assert the wrong tolerance of FA2's Layer Norm kernel
Pull Request -
State: closed - Opened by xrsrke 12 months ago
#80 - [Feature] Add loading different datasets based on training stages
Pull Request -
State: closed - Opened by xrsrke 12 months ago
#79 - Continued Pretraining on Llama 7b.
Issue -
State: open - Opened by wiseyy 12 months ago
- 8 comments
#78 - Continued Pretraining on Llama7b.
Issue -
State: closed - Opened by wiseyy 12 months ago
- 1 comment
#77 - [Feature] Refactor `ParallelContext.world_rank_matrix`
Issue -
State: open - Opened by NouamaneTazi 12 months ago
Labels: enhancement, good first issue, help wanted