Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/nanotron issues and pull requests
#245 - Small fixes when resuming training
Pull Request -
State: open - Opened by NouamaneTazi 4 days ago
#244 - Add optional validation (to see how to do, see ). Take out some check…
Pull Request -
State: closed - Opened by kylematoba 5 days ago
#243 - Fix initial_lr when resuming training
Pull Request -
State: open - Opened by Lauler 5 days ago
- 1 comment
#242 - Sync weights across dp dimension during consistency validation
Issue -
State: open - Opened by gritukan 8 days ago
#241 - convert_nt_to_hf is broken for non-interleaved RoPE
Issue -
State: open - Opened by gritukan 8 days ago
#240 - Support for AMD GPUs
Issue -
State: open - Opened by YuTian8328 14 days ago
#239 - Pretraining dataset too large to fit into memory
Issue -
State: closed - Opened by alexchen4ai 18 days ago
#238 - Load random states from checkpoint
Pull Request -
State: open - Opened by gritukan 21 days ago
#237 - Nanoset reuses same shuffling for repeated data (epochs)
Issue -
State: open - Opened by Lauler 23 days ago
- 1 comment
#236 - CUDA_DEVICE_MAX_CONNECTIONS
Issue -
State: open - Opened by jeromeku 24 days ago
#235 - ImportError: cannot import name 'LoggingArgs' from partially initialized module 'nanotron.config.config error (circular imports)
Issue -
State: open - Opened by Akhilvallala2023 about 1 month ago
#234 - Ci move
Pull Request -
State: closed - Opened by glegendre01 about 2 months ago
#233 - Learning rate restart broken with Nanoset?
Issue -
State: open - Opened by Pclanglais about 2 months ago
- 10 comments
#222 - lighteval support after checkpoint, UX refactor
Pull Request -
State: open - Opened by eliebak 3 months ago
- 5 comments
#198 - [Bug] Missing `_is_using_mup` when resume checkpoint
Issue -
State: open - Opened by xrsrke 5 months ago
- 1 comment
Labels: bug, good first issue, help wanted
#180 - FEAT: Adding 1.58bit LLMs training architecture in nanotron
Pull Request -
State: open - Opened by MekkCyber 6 months ago
- 5 comments
#174 - Llama3 conversion scripts 🦙
Pull Request -
State: open - Opened by TJ-Solergibert 6 months ago
- 9 comments
#160 - Enable masking when tp=1
Pull Request -
State: closed - Opened by YongjunHe 7 months ago
#100 - Fix some bugs
Pull Request -
State: closed - Opened by jordane95 9 months ago
- 1 comment
#99 - [Features] support gradient checkpointing for memory saving
Issue -
State: open - Opened by zguo0525 9 months ago
- 1 comment
#98 - [Refactor] Refactoring Expert Parallelism
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
- 1 comment
#97 - [Quick fix] fix circular import in logging
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#96 - Bump v0.4 + Quick refactos
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#95 - [DoReMi] Small refactors
Pull Request -
State: closed - Opened by xrsrke 9 months ago
#94 - [Feature] Refactor ParallelContext.world_rank_matrix
Pull Request -
State: closed - Opened by 0xkerem 9 months ago
- 7 comments
#93 - [Docs] Add unit tests as a requirement
Pull Request -
State: closed - Opened by xrsrke 9 months ago
- 1 comment
#92 - [Bug] Fix clipping gradients's test
Issue -
State: closed - Opened by xrsrke 9 months ago
Labels: bug, good first issue, help wanted, High Priority
#91 - [Feature] All GPUs within the same TP group load training data from shared memory
Issue -
State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted, Low Priority
#90 - [Unit Test] Add unit tests for DistributedTrainer
Issue -
State: open - Opened by xrsrke 9 months ago
- 5 comments
Labels: good first issue, help wanted, High Priority
#89 - [Unit Test] Add unit test for DoReMi's trainer
Issue -
State: open - Opened by xrsrke 9 months ago
Labels: Medium Priority
#88 - [Feature] Use CUDA event for measuring elasped time
Issue -
State: open - Opened by xrsrke 9 months ago
Labels: enhancement, good first issue, help wanted
#87 - [Feature] Asyncronous Serialization
Issue -
State: open - Opened by xrsrke 9 months ago
Labels: enhancement, good first issue, help wanted
#86 - [Feature] Kernel Fusion of Layer Norm and GeLU
Issue -
State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted
#85 - [Feature] LAMB optimizer
Issue -
State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted
#84 - [Feature] Parallel transformer block
Issue -
State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted, High Priority
#83 - Add Mamba PR
Pull Request -
State: closed - Opened by 3outeille 9 months ago
- 1 comment
#82 - [Bug] Not saving `lm_head` in checkpoint
Issue -
State: closed - Opened by xrsrke 9 months ago
Labels: bug
#81 - [Fix] Assert the wrong tolerance of FA2's Layer Norm kernel
Pull Request -
State: closed - Opened by xrsrke 9 months ago
#80 - [Feature] Add loading different datasets based on training stages
Pull Request -
State: closed - Opened by xrsrke 9 months ago
#79 - Continued Pretraining on Llama 7b.
Issue -
State: open - Opened by wiseyy 9 months ago
- 8 comments
#78 - Continued Pretraining on Llama7b.
Issue -
State: closed - Opened by wiseyy 9 months ago
- 1 comment
#77 - [Feature] Refactor `ParallelContext.world_rank_matrix`
Issue -
State: open - Opened by NouamaneTazi 9 months ago
Labels: enhancement, good first issue, help wanted
#76 - Deprecate `recompute_granularity` in config
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#75 - Refactor dMoE
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#74 - [Feature] Fix support for sequence parallelism with MoEs
Issue -
State: open - Opened by NouamaneTazi 9 months ago
Labels: enhancement, help wanted
#73 - Add MoEs support
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#72 - Support Expert Parallelism
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#71 - Implement pipeline parallel size-agnostic optimizer state loading
Pull Request -
State: closed - Opened by nopperl 9 months ago
#70 - [FP8 Training] End-to-end FP8 Training
Pull Request -
State: open - Opened by xrsrke 9 months ago
- 3 comments
#69 - Refactor `ParallelContext` and some process groups creation
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#68 - Fix topology agnostic loading
Pull Request -
State: closed - Opened by nopperl 9 months ago
- 1 comment
#67 - fix configs
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#66 - quick fix train steps assertion
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
- 1 comment
#65 - quick fix train steps assertion
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#64 - Update bench script
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#63 - [`Docs`] Fix typos
Pull Request -
State: closed - Opened by tolgacangoz 9 months ago
#62 - Refactoring tying mechanism + small fixes
Pull Request -
State: closed - Opened by NouamaneTazi 9 months ago
#61 - [Feature request] Performance and accuracy benchmarks
Issue -
State: open - Opened by brianyu-nexusflowai 10 months ago
- 2 comments
#60 - Lighteval naming
Pull Request -
State: closed - Opened by thomwolf 10 months ago