huggingface/nanotron issues and pull requests

#382 - SmolLM3 nanotron->hf converter

Pull Request - State: open - Opened by anton-l 25 days ago

#381 - Removed assertion for s3 datasets and handled string and object cases

Pull Request - State: open - Opened by SulRash 30 days ago

#380 - Fixed nanoset data stage handling during pretraining

Pull Request - State: open - Opened by SulRash 30 days ago

#379 - Fix issue while running tiny llama script on ADA 4000 gpu

Pull Request - State: open - Opened by chetandhembre about 1 month ago - 3 comments

#378 - Extra name argument to select configuration of hf dataset

Pull Request - State: open - Opened by SulRash about 1 month ago

#377 - Fixed llama parameterization config use

Pull Request - State: open - Opened by SulRash about 1 month ago

#376 - SmolLM3 training 🚀

Pull Request - State: closed - Opened by NouamaneTazi about 1 month ago

#375 - SmolLM3 training 🚀

Pull Request - State: closed - Opened by NouamaneTazi about 1 month ago

#374 - lighteval fixes

Pull Request - State: open - Opened by NouamaneTazi about 1 month ago

#373 - Expert Parallelism

Pull Request - State: open - Opened by xrsrke about 2 months ago

#372 - datatrove needs numpy>=2.0.0 but nanotron 0.4 requires numpy<2, how to fix?

Issue - State: open - Opened by lxyyang about 2 months ago

#371 - Getting UnionMatchError error when trying to run official examples in SmolLM

Issue - State: open - Opened by rekcu 2 months ago - 1 comment

#370 - [WIP] Fix Llama inference

Pull Request - State: open - Opened by duynht 2 months ago

#369 - [BUG] Bug in Llama inference

Issue - State: open - Opened by duynht 2 months ago

#368 - [feature] Add debug_dataloader_samples utility to preview decoded dataloader samples (#184)

Pull Request - State: open - Opened by garongkim 2 months ago

#367 - Hynky/lighteval fix

Pull Request - State: open - Opened by hynky1999 3 months ago

#366 - cp

Pull Request - State: closed - Opened by NouamaneTazi 3 months ago

#365 - logmixin

Pull Request - State: closed - Opened by NouamaneTazi 3 months ago

#364 - [feature] Add debug_dataloader_samples utility to preview decoded dataloader samples (#184)

Pull Request - State: closed - Opened by garongkim 3 months ago - 1 comment

#363 - fix makefile, sync with datatrove, update lighteval config

Pull Request - State: closed - Opened by hynky1999 3 months ago

#362 - Issues running tiny Llama quick start example

Issue - State: open - Opened by marluxiaboss 3 months ago

#361 - Expert Parallelism

Pull Request - State: open - Opened by xrsrke 3 months ago

#360 - Nouamane/lighteval fix

Pull Request - State: closed - Opened by NouamaneTazi 3 months ago

#359 - Update README.md

Pull Request - State: closed - Opened by eliebak 3 months ago

#358 - deepwiki

Pull Request - State: closed - Opened by eliebak 3 months ago

#357 - quick typo fix in readme

Pull Request - State: closed - Opened by eliebak 3 months ago

#356 - Nouamane/lighteval

Pull Request - State: closed - Opened by NouamaneTazi 3 months ago

#355 - MoE without token dropping

Pull Request - State: closed - Opened by xrsrke 3 months ago

#354 - amend previous pr

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#353 - fix init and init scaling factor and run evals in background

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago - 1 comment

#352 - run evals in background

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#351 - Request for Clarification on Validation in Nanotron

Issue - State: open - Opened by tpavankalyan 4 months ago

#350 - DRAFT: Add per domain logging and improve validation mechanism

Pull Request - State: open - Opened by paultltc 4 months ago

#349 - fix init and init scaling factor and run evals in background

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#348 - can only merge to main from dev

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#347 - Confusing variable name in PP configuration assert fail

Issue - State: open - Opened by wirthFelix 4 months ago

#346 - [Feature] Implement CUDA event-based timing for improved GPU performa…

Pull Request - State: closed - Opened by grewalsk 4 months ago - 1 comment

#345 - remove gc when ckpt

Pull Request - State: closed - Opened by eliebak 4 months ago

#344 - Nouamane/wandb

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#343 - Fix cuseqlen

Pull Request - State: closed - Opened by eliebak 4 months ago - 1 comment

#342 - Nouamane/timers

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#341 - new dataloader

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#340 - temp wo rope

Pull Request - State: closed - Opened by eliebak 4 months ago - 1 comment

#339 - Fix UnboundLocalError in `clm_collator.py`

Pull Request - State: closed - Opened by c8ef 4 months ago - 2 comments

#338 - quicks

Pull Request - State: open - Opened by NouamaneTazi 4 months ago

#337 - calculate mean token accuracy metric while training

Pull Request - State: open - Opened by kashif 4 months ago

#336 - [WIP] Add multilingual evals

Pull Request - State: open - Opened by anton-l 4 months ago

#335 - Nanoset clean up old index

Pull Request - State: closed - Opened by eliebak 4 months ago

#334 - weka script

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#333 - supporting Qwen-2.5 series?

Issue - State: open - Opened by JoshonSmith 4 months ago - 1 comment

#332 - Logging outlier batch

Pull Request - State: open - Opened by eliebak 4 months ago

#331 - add sanity check to see the batch diversity

Pull Request - State: open - Opened by eliebak 4 months ago

#330 - Fix trufflehog false positives

Pull Request - State: closed - Opened by lewtun 4 months ago - 1 comment

#329 - Use `uv` for installation and fix tiny Llama config

Pull Request - State: closed - Opened by lewtun 4 months ago

#328 - Nope

Pull Request - State: closed - Opened by eliebak 4 months ago

#327 - reshape rotary_part

Pull Request - State: closed - Opened by loubnabnl 4 months ago - 1 comment

#326 - docmasking

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago - 1 comment

#325 - Nouamane/optis3

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#324 - Intra document attention with FA2

Pull Request - State: closed - Opened by eliebak 4 months ago - 1 comment

#323 - Update nanoset build index + Flexing the attention

Pull Request - State: closed - Opened by eliebak 4 months ago - 1 comment

#322 - [WIP] Able to parse np.int64(1440) + allow passing .yaml to evaluator

Pull Request - State: closed - Opened by Stillerman 4 months ago

#321 - change to true

Pull Request - State: closed - Opened by eliebak 4 months ago - 1 comment

#320 - typo

Pull Request - State: closed - Opened by eliebak 4 months ago

#319 - position_ids

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#318 - some kernels

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#317 - fix flash_attn

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#316 - quick fix revert nanoset position ids for now

Pull Request - State: closed - Opened by eliebak 4 months ago

#315 - sdpa fixes

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#314 - fix attn

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#313 - correct vocab in nanoset

Pull Request - State: closed - Opened by eliebak 4 months ago

#312 - quick fix main

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#311 - Add weight decay per layer

Pull Request - State: closed - Opened by eliebak 4 months ago

#310 - add zloss

Pull Request - State: closed - Opened by eliebak 4 months ago

#309 - non-blocking dataloading + qols

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#308 - Zloss

Pull Request - State: closed - Opened by eliebak 4 months ago

#307 - fix: allow conversion from checkpoints trained with llama.py or qwen.py

Pull Request - State: closed - Opened by Stillerman 4 months ago - 1 comment

#306 - Abnormal grad norm during pre-training

Issue - State: closed - Opened by SinclairCoder 4 months ago - 1 comment

#305 - Improve wandb logging

Pull Request - State: closed - Opened by eliebak 4 months ago - 2 comments

#304 - add custom wd

Pull Request - State: closed - Opened by eliebak 4 months ago

#303 - add is_causal attribute for FA2

Pull Request - State: closed - Opened by loubnabnl 4 months ago

#302 - Error when initializing DistributedTrainer

Issue - State: open - Opened by manuelbrack 4 months ago

#301 - docs: fix path section links

Pull Request - State: closed - Opened by guspan-tanadi 4 months ago

#300 - Ademamix

Pull Request - State: open - Opened by eliebak 4 months ago

#299 - flex-attention

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#298 - Muon

Pull Request - State: open - Opened by eliebak 4 months ago

#297 - Nouamane/sft2

Pull Request - State: closed - Opened by NouamaneTazi 4 months ago

#296 - Any plans for supporting Qwen-2.5 series models?

Issue - State: open - Opened by SinclairCoder 5 months ago

#295 - SFT and bunch of features

Pull Request - State: closed - Opened by NouamaneTazi 5 months ago

#294 - prepare for v0.5

Pull Request - State: closed - Opened by NouamaneTazi 5 months ago

#293 - why my nanoset is empty

Issue - State: open - Opened by ziyanxzy 5 months ago

#292 - [Feature] Hide 75% of the communication in tensor parallelism using DoMiNo

Pull Request - State: open - Opened by xrsrke 5 months ago

#291 - does Nanotron support AMSP (a new DP shard strategy)

Issue - State: closed - Opened by ChenQiaoling00 5 months ago

#290 - [WIP] Distillation

Pull Request - State: open - Opened by Stillerman 5 months ago

#289 - Fix unpacking issue caused by newer Flash Attention

Pull Request - State: open - Opened by Stillerman 5 months ago - 1 comment

#288 - Update PR template

Pull Request - State: closed - Opened by NouamaneTazi 5 months ago

#287 - Add pr template

Pull Request - State: closed - Opened by NouamaneTazi 5 months ago

#286 - [Feature] Over 99% communication overlap in Tensor Parallelism using Domino

Pull Request - State: open - Opened by hwchen2017 5 months ago - 4 comments