Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/nanotron issues and pull requests

#245 - Small fixes when resuming training

Pull Request - State: open - Opened by NouamaneTazi 4 days ago

#243 - Fix initial_lr when resuming training

Pull Request - State: open - Opened by Lauler 5 days ago - 1 comment

#240 - Support for AMD GPUs

Issue - State: open - Opened by YuTian8328 14 days ago

#239 - Pretraining dataset too large to fit into memory

Issue - State: closed - Opened by alexchen4ai 18 days ago

#238 - Load random states from checkpoint

Pull Request - State: open - Opened by gritukan 21 days ago

#237 - Nanoset reuses same shuffling for repeated data (epochs)

Issue - State: open - Opened by Lauler 23 days ago - 1 comment

#236 - CUDA_DEVICE_MAX_CONNECTIONS

Issue - State: open - Opened by jeromeku 24 days ago

#234 - Ci move

Pull Request - State: closed - Opened by glegendre01 about 2 months ago

#233 - Learning rate restart broken with Nanoset?

Issue - State: open - Opened by Pclanglais about 2 months ago - 10 comments

#222 - lighteval support after checkpoint, UX refactor

Pull Request - State: open - Opened by eliebak 3 months ago - 5 comments

#198 - [Bug] Missing `_is_using_mup` when resume checkpoint

Issue - State: open - Opened by xrsrke 5 months ago - 1 comment
Labels: bug, good first issue, help wanted

#180 - FEAT: Adding 1.58bit LLMs training architecture in nanotron

Pull Request - State: open - Opened by MekkCyber 6 months ago - 5 comments

#174 - Llama3 conversion scripts 🦙

Pull Request - State: open - Opened by TJ-Solergibert 6 months ago - 9 comments

#160 - Enable masking when tp=1

Pull Request - State: closed - Opened by YongjunHe 7 months ago

#100 - Fix some bugs

Pull Request - State: closed - Opened by jordane95 9 months ago - 1 comment

#99 - [Features] support gradient checkpointing for memory saving

Issue - State: open - Opened by zguo0525 9 months ago - 1 comment

#98 - [Refactor] Refactoring Expert Parallelism

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago - 1 comment

#97 - [Quick fix] fix circular import in logging

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#96 - Bump v0.4 + Quick refactos

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#95 - [DoReMi] Small refactors

Pull Request - State: closed - Opened by xrsrke 9 months ago

#94 - [Feature] Refactor ParallelContext.world_rank_matrix

Pull Request - State: closed - Opened by 0xkerem 9 months ago - 7 comments

#93 - [Docs] Add unit tests as a requirement

Pull Request - State: closed - Opened by xrsrke 9 months ago - 1 comment

#92 - [Bug] Fix clipping gradients's test

Issue - State: closed - Opened by xrsrke 9 months ago
Labels: bug, good first issue, help wanted, High Priority

#91 - [Feature] All GPUs within the same TP group load training data from shared memory

Issue - State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted, Low Priority

#90 - [Unit Test] Add unit tests for DistributedTrainer

Issue - State: open - Opened by xrsrke 9 months ago - 5 comments
Labels: good first issue, help wanted, High Priority

#89 - [Unit Test] Add unit test for DoReMi's trainer

Issue - State: open - Opened by xrsrke 9 months ago
Labels: Medium Priority

#88 - [Feature] Use CUDA event for measuring elasped time

Issue - State: open - Opened by xrsrke 9 months ago
Labels: enhancement, good first issue, help wanted

#87 - [Feature] Asyncronous Serialization

Issue - State: open - Opened by xrsrke 9 months ago
Labels: enhancement, good first issue, help wanted

#86 - [Feature] Kernel Fusion of Layer Norm and GeLU

Issue - State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted

#85 - [Feature] LAMB optimizer

Issue - State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted

#84 - [Feature] Parallel transformer block

Issue - State: open - Opened by xrsrke 9 months ago
Labels: enhancement, help wanted, High Priority

#83 - Add Mamba PR

Pull Request - State: closed - Opened by 3outeille 9 months ago - 1 comment

#82 - [Bug] Not saving `lm_head` in checkpoint

Issue - State: closed - Opened by xrsrke 9 months ago
Labels: bug

#81 - [Fix] Assert the wrong tolerance of FA2's Layer Norm kernel

Pull Request - State: closed - Opened by xrsrke 9 months ago

#80 - [Feature] Add loading different datasets based on training stages

Pull Request - State: closed - Opened by xrsrke 9 months ago

#79 - Continued Pretraining on Llama 7b.

Issue - State: open - Opened by wiseyy 9 months ago - 8 comments

#78 - Continued Pretraining on Llama7b.

Issue - State: closed - Opened by wiseyy 9 months ago - 1 comment

#77 - [Feature] Refactor `ParallelContext.world_rank_matrix`

Issue - State: open - Opened by NouamaneTazi 9 months ago
Labels: enhancement, good first issue, help wanted

#76 - Deprecate `recompute_granularity` in config

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#75 - Refactor dMoE

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#74 - [Feature] Fix support for sequence parallelism with MoEs

Issue - State: open - Opened by NouamaneTazi 9 months ago
Labels: enhancement, help wanted

#73 - Add MoEs support

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#72 - Support Expert Parallelism

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#71 - Implement pipeline parallel size-agnostic optimizer state loading

Pull Request - State: closed - Opened by nopperl 9 months ago

#70 - [FP8 Training] End-to-end FP8 Training

Pull Request - State: open - Opened by xrsrke 9 months ago - 3 comments

#69 - Refactor `ParallelContext` and some process groups creation

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#68 - Fix topology agnostic loading

Pull Request - State: closed - Opened by nopperl 9 months ago - 1 comment

#67 - fix configs

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#66 - quick fix train steps assertion

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago - 1 comment

#65 - quick fix train steps assertion

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#64 - Update bench script

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#63 - [`Docs`] Fix typos

Pull Request - State: closed - Opened by tolgacangoz 9 months ago

#62 - Refactoring tying mechanism + small fixes

Pull Request - State: closed - Opened by NouamaneTazi 9 months ago

#61 - [Feature request] Performance and accuracy benchmarks

Issue - State: open - Opened by brianyu-nexusflowai 10 months ago - 2 comments

#60 - Lighteval naming

Pull Request - State: closed - Opened by thomwolf 10 months ago
