Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/nanotron issues and pull requests

#278 - Add MLA

Pull Request - State: open - Opened by zzhhjjj 2 days ago

#276 - Make sure nanotron works with lighteval

Issue - State: open - Opened by xrsrke 11 days ago

#275 - [Question] I'd like to get help to load dataset.

Issue - State: closed - Opened by barneylogo 13 days ago - 2 comments

#274 - Add nanotron performance

Pull Request - State: open - Opened by xrsrke 15 days ago

#266 - fp8

Pull Request - State: open - Opened by xrsrke about 2 months ago

#265 - Invalid Usage of NCCL Library

Issue - State: open - Opened by awsankur about 2 months ago - 4 comments

#263 - [Feature] Support resume ZeRO1 in a new data parallelism size

Pull Request - State: open - Opened by xrsrke about 2 months ago

#262 - AllReduce taking extra long for ShardedCrossEntropy

Issue - State: open - Opened by NouamaneTazi about 2 months ago
Labels: help wanted

#261 - Fix DDP with Zero1

Issue - State: open - Opened by NouamaneTazi about 2 months ago
Labels: help wanted

#260 - Updating dependency versions in pyproject.toml and README.

Pull Request - State: closed - Opened by sapiosaturn 2 months ago

#259 - Optimize memory in loss consumption

Issue - State: open - Opened by NouamaneTazi 2 months ago
Labels: help wanted

#258 - quick fix

Pull Request - State: closed - Opened by eliebak 2 months ago

#256 - Fix wrong initialization of lr scheduler

Pull Request - State: open - Opened by kylematoba 2 months ago - 1 comment

#255 - [NEW] Llama3.2 weight converters 🦙

Pull Request - State: open - Opened by TJ-Solergibert 2 months ago - 1 comment

#254 - the generations not right

Issue - State: open - Opened by sankexin 2 months ago - 2 comments

#253 - resuming checkpoint without lr schedule or optimizer state

Pull Request - State: closed - Opened by eliebak 2 months ago

#252 - Error due to missing `is_zero` arg when saving LR scheduler

Issue - State: closed - Opened by Lauler 2 months ago - 3 comments

#251 - Cannot run the Model generated from the example script

Issue - State: closed - Opened by hz-nm 2 months ago - 1 comment

#249 - Support per-domain loss

Pull Request - State: closed - Opened by MaxiBoether 2 months ago - 1 comment

#248 - Quick fix last PR

Pull Request - State: closed - Opened by NouamaneTazi 2 months ago

#246 - Optimize memory when loading checkpoint

Pull Request - State: closed - Opened by NouamaneTazi 3 months ago

#245 - Small fixes when resuming training

Pull Request - State: closed - Opened by NouamaneTazi 3 months ago

#243 - Fix initial_lr when resuming training

Pull Request - State: open - Opened by Lauler 3 months ago - 6 comments

#241 - convert_nt_to_hf is broken for non-interleaved RoPE

Issue - State: open - Opened by gritukan 3 months ago - 1 comment

#240 - Support for AMD GPUs

Issue - State: open - Opened by YuTian8328 3 months ago

#239 - Pretraining dataset too large to fit into memory

Issue - State: closed - Opened by alexchen4ai 3 months ago - 1 comment

#238 - Load random states from checkpoint

Pull Request - State: open - Opened by gritukan 3 months ago - 1 comment

#237 - Nanoset reuses same shuffling for repeated data (epochs)

Issue - State: open - Opened by Lauler 3 months ago - 1 comment

#236 - CUDA_DEVICE_MAX_CONNECTIONS

Issue - State: open - Opened by jeromeku 3 months ago - 1 comment

#234 - Ci move

Pull Request - State: closed - Opened by glegendre01 4 months ago

#233 - Learning rate restart broken with Nanoset?

Issue - State: open - Opened by Pclanglais 5 months ago - 13 comments
Labels: bug

#230 - Fix loading scheduler when having more than one param_group

Pull Request - State: closed - Opened by TJ-Solergibert 5 months ago

#222 - lighteval support after checkpoint, UX refactor

Pull Request - State: open - Opened by eliebak 6 months ago - 5 comments

#221 - error resuming from checkpoint if PP > 1

Issue - State: closed - Opened by moussaKam 6 months ago - 7 comments

#215 - Hello, team

Issue - State: open - Opened by barneylogo 6 months ago - 2 comments

#198 - [Bug] Missing `_is_using_mup` when resume checkpoint

Issue - State: open - Opened by xrsrke 8 months ago - 1 comment
Labels: bug, good first issue, help wanted

#180 - FEAT: Adding 1.58bit LLMs training architecture in nanotron

Pull Request - State: open - Opened by MekkCyber 9 months ago - 6 comments

#174 - Llama3 conversion scripts 🦙

Pull Request - State: open - Opened by TJ-Solergibert 9 months ago - 9 comments

#160 - Enable masking when tp=1

Pull Request - State: closed - Opened by YongjunHe 9 months ago

#143 - Use CUDA Events for measuring elapsed time

Pull Request - State: closed - Opened by staghado 10 months ago - 2 comments

#100 - Fix some bugs

Pull Request - State: closed - Opened by jordane95 11 months ago - 1 comment

#99 - [Features] support gradient checkpointing for memory saving

Issue - State: open - Opened by zguo0525 11 months ago - 1 comment

#98 - [Refactor] Refactoring Expert Parallelism

Pull Request - State: closed - Opened by NouamaneTazi 11 months ago - 1 comment

#97 - [Quick fix] fix circular import in logging

Pull Request - State: closed - Opened by NouamaneTazi 11 months ago

#96 - Bump v0.4 + Quick refactos

Pull Request - State: closed - Opened by NouamaneTazi 11 months ago

#95 - [DoReMi] Small refactors

Pull Request - State: closed - Opened by xrsrke 11 months ago

#94 - [Feature] Refactor ParallelContext.world_rank_matrix

Pull Request - State: closed - Opened by 0xkerem 11 months ago - 7 comments

#93 - [Docs] Add unit tests as a requirement

Pull Request - State: closed - Opened by xrsrke 11 months ago - 1 comment

#92 - [Bug] Fix clipping gradients's test

Issue - State: closed - Opened by xrsrke 11 months ago
Labels: bug, good first issue, help wanted, High Priority

#91 - [Feature] All GPUs within the same TP group load training data from shared memory

Issue - State: open - Opened by xrsrke 11 months ago
Labels: enhancement, help wanted, Low Priority

#90 - [Unit Test] Add unit tests for DistributedTrainer

Issue - State: open - Opened by xrsrke 11 months ago - 5 comments
Labels: good first issue, help wanted, High Priority

#89 - [Unit Test] Add unit test for DoReMi's trainer

Issue - State: open - Opened by xrsrke 11 months ago
Labels: Medium Priority

#88 - [Feature] Use CUDA event for measuring elasped time

Issue - State: open - Opened by xrsrke 11 months ago
Labels: enhancement, good first issue, help wanted

#87 - [Feature] Asyncronous Serialization

Issue - State: open - Opened by xrsrke 11 months ago
Labels: enhancement, good first issue, help wanted

#86 - [Feature] Kernel Fusion of Layer Norm and GeLU

Issue - State: open - Opened by xrsrke 11 months ago
Labels: enhancement, help wanted

#85 - [Feature] LAMB optimizer

Issue - State: open - Opened by xrsrke 11 months ago - 1 comment
Labels: enhancement, help wanted

#84 - [Feature] Parallel transformer block

Issue - State: open - Opened by xrsrke 11 months ago
Labels: enhancement, help wanted, High Priority

#83 - Add Mamba PR

Pull Request - State: closed - Opened by 3outeille 12 months ago - 1 comment

#82 - [Bug] Not saving `lm_head` in checkpoint

Issue - State: closed - Opened by xrsrke 12 months ago
Labels: bug

#81 - [Fix] Assert the wrong tolerance of FA2's Layer Norm kernel

Pull Request - State: closed - Opened by xrsrke 12 months ago

#80 - [Feature] Add loading different datasets based on training stages

Pull Request - State: closed - Opened by xrsrke 12 months ago

#79 - Continued Pretraining on Llama 7b.

Issue - State: open - Opened by wiseyy 12 months ago - 8 comments

#78 - Continued Pretraining on Llama7b.

Issue - State: closed - Opened by wiseyy 12 months ago - 1 comment

#77 - [Feature] Refactor `ParallelContext.world_rank_matrix`

Issue - State: open - Opened by NouamaneTazi 12 months ago
Labels: enhancement, good first issue, help wanted
