GitHub / huggingface/nanotron issues and pull requests
#382 - SmolLM3 nanotron->hf converter
Pull Request -
State: open - Opened by anton-l 25 days ago
#381 - Removed assertion for s3 datasets and handled string and object cases
Pull Request -
State: open - Opened by SulRash 30 days ago
#380 - Fixed nanoset data stage handling during pretraining
Pull Request -
State: open - Opened by SulRash 30 days ago
#379 - Fix issue while running tiny llama script on ADA 4000 gpu
Pull Request -
State: open - Opened by chetandhembre about 1 month ago
- 3 comments
#378 - Extra name argument to select configuration of hf dataset
Pull Request -
State: open - Opened by SulRash about 1 month ago
#377 - Fixed llama parameterization config use
Pull Request -
State: open - Opened by SulRash about 1 month ago
#376 - SmolLM3 training 🚀
Pull Request -
State: closed - Opened by NouamaneTazi about 1 month ago
#375 - SmoLM3 training 🚀
Pull Request -
State: closed - Opened by NouamaneTazi about 1 month ago
#374 - lighteval fixes
Pull Request -
State: open - Opened by NouamaneTazi about 1 month ago
#373 - Expert Parallelism
Pull Request -
State: open - Opened by xrsrke about 2 months ago
#372 - datatrove need numpy>=2.0.0 bug nanotron 0.4 requires numpy<2, how to fix?
Issue -
State: open - Opened by lxyyang about 2 months ago
#371 - Getting UnionMatchError error when trying to run official examples in SmolLM
Issue -
State: open - Opened by rekcu 2 months ago
- 1 comment
#370 - [WIP] Fix Llama inference
Pull Request -
State: open - Opened by duynht 2 months ago
#369 - [BUG] Bug in Llama inference
Issue -
State: open - Opened by duynht 2 months ago
#368 - [feature] Add debug_dataloader_samples utility to preview decoded dataloader samples (#184)
Pull Request -
State: open - Opened by garongkim 2 months ago
#367 - Hynky/lighteval fix
Pull Request -
State: open - Opened by hynky1999 3 months ago
#366 - cp
Pull Request -
State: closed - Opened by NouamaneTazi 3 months ago
#365 - logmixin
Pull Request -
State: closed - Opened by NouamaneTazi 3 months ago
#364 - [feature] Add debug_dataloader_samples utility to preview decoded dataloader samples (#184)
Pull Request -
State: closed - Opened by garongkim 3 months ago
- 1 comment
#363 - fix makefile, sync with datatrove, update lighteval config
Pull Request -
State: closed - Opened by hynky1999 3 months ago
#362 - Issues running tiny Llama quick start example
Issue -
State: open - Opened by marluxiaboss 3 months ago
#361 - Expert Parallelism
Pull Request -
State: open - Opened by xrsrke 3 months ago
#360 - Nouamane/lighteval fix
Pull Request -
State: closed - Opened by NouamaneTazi 3 months ago
#359 - Update README.md
Pull Request -
State: closed - Opened by eliebak 3 months ago
#358 - deepwiki
Pull Request -
State: closed - Opened by eliebak 3 months ago
#357 - quick typo fix in readme
Pull Request -
State: closed - Opened by eliebak 3 months ago
#356 - Nouamane/lighteval
Pull Request -
State: closed - Opened by NouamaneTazi 3 months ago
#355 - MoE without token dropping
Pull Request -
State: closed - Opened by xrsrke 3 months ago
#354 - amend previous pr
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#353 - fix init and init scaling factor and run evals in background
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
- 1 comment
#352 - run evals in background
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#351 - Request for Clarification on Validation in Nanotron
Issue -
State: open - Opened by tpavankalyan 4 months ago
#350 - DRAFT: Add per domain logging and improve validation mechanism
Pull Request -
State: open - Opened by paultltc 4 months ago
#349 - fix init and init scaling factor and run evals in background
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#348 - can only merge to main from dev
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#347 - Confusing variable name in PP configuration assert fail
Issue -
State: open - Opened by wirthFelix 4 months ago
#346 - [Feature] Implement CUDA event-based timing for improved GPU performa…
Pull Request -
State: closed - Opened by grewalsk 4 months ago
- 1 comment
#345 - remove gc when ckpt
Pull Request -
State: closed - Opened by eliebak 4 months ago
#344 - Nouamane/wandb
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#343 - Fix cuseqlen
Pull Request -
State: closed - Opened by eliebak 4 months ago
- 1 comment
#342 - Nouamane/timers
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#341 - new dataloader
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#340 - temp wo rope
Pull Request -
State: closed - Opened by eliebak 4 months ago
- 1 comment
#339 - Fix UnBoundLocalError in `clm_collator.py`
Pull Request -
State: closed - Opened by c8ef 4 months ago
- 2 comments
#338 - quicks
Pull Request -
State: open - Opened by NouamaneTazi 4 months ago
#337 - calculate mean token accuracy metric while training
Pull Request -
State: open - Opened by kashif 4 months ago
#336 - [WIP] Add multilingual evals
Pull Request -
State: open - Opened by anton-l 4 months ago
#335 - Nanoset clean up old index
Pull Request -
State: closed - Opened by eliebak 4 months ago
#334 - weka script
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#333 - supporting Qwen-2.5 series ?
Issue -
State: open - Opened by JoshonSmith 4 months ago
- 1 comment
#332 - Logging outlier batch
Pull Request -
State: open - Opened by eliebak 4 months ago
#331 - add sanity check to see the batch diversity
Pull Request -
State: open - Opened by eliebak 4 months ago
#330 - Fix trufflehog false positives
Pull Request -
State: closed - Opened by lewtun 4 months ago
- 1 comment
#329 - Use `uv` for installation and fix tiny Llama config
Pull Request -
State: closed - Opened by lewtun 4 months ago
#328 - Nope
Pull Request -
State: closed - Opened by eliebak 4 months ago
#327 - reshape rotary_part
Pull Request -
State: closed - Opened by loubnabnl 4 months ago
- 1 comment
#326 - docmasking
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
- 1 comment
#325 - Nouamane/optis3
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#324 - Intra document attention with FA2
Pull Request -
State: closed - Opened by eliebak 4 months ago
- 1 comment
#323 - Update nanoset build index + Flexing the attention
Pull Request -
State: closed - Opened by eliebak 4 months ago
- 1 comment
#322 - [WIP] Able to parse np.int64(1440) + allow passing .yaml to evaluator
Pull Request -
State: closed - Opened by Stillerman 4 months ago
#321 - change to true
Pull Request -
State: closed - Opened by eliebak 4 months ago
- 1 comment
#320 - typo
Pull Request -
State: closed - Opened by eliebak 4 months ago
#319 - position_ids
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#318 - some kernels
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#317 - fix flash_attn
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#316 - quick fix revert nanoset position ids for now
Pull Request -
State: closed - Opened by eliebak 4 months ago
#315 - sdpa fixes
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#314 - fix attn
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#313 - correct vocab in nanoset
Pull Request -
State: closed - Opened by eliebak 4 months ago
#312 - quick fix main
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#311 - Add weight decay per layer
Pull Request -
State: closed - Opened by eliebak 4 months ago
#310 - add zloss
Pull Request -
State: closed - Opened by eliebak 4 months ago
#309 - non-blocking dataloading + qols
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#308 - Zloss
Pull Request -
State: closed - Opened by eliebak 4 months ago
#307 - fix: allow conversion from checkpoints trained with llama.py or qwen.py
Pull Request -
State: closed - Opened by Stillerman 4 months ago
- 1 comment
#306 - Abnormal grad norm during pre-training
Issue -
State: closed - Opened by SinclairCoder 4 months ago
- 1 comment
#305 - Improve wandb logging
Pull Request -
State: closed - Opened by eliebak 4 months ago
- 2 comments
#304 - add custom wd
Pull Request -
State: closed - Opened by eliebak 4 months ago
#303 - add is_causal attribute for FA2
Pull Request -
State: closed - Opened by loubnabnl 4 months ago
#302 - Error when initializing DistributedTrainer
Issue -
State: open - Opened by manuelbrack 4 months ago
#301 - docs: fix path section links
Pull Request -
State: closed - Opened by guspan-tanadi 4 months ago
#300 - Ademamix
Pull Request -
State: open - Opened by eliebak 4 months ago
#299 - flex-attention
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#298 - Muon
Pull Request -
State: open - Opened by eliebak 4 months ago
#297 - Nouamane/sft2
Pull Request -
State: closed - Opened by NouamaneTazi 4 months ago
#296 - Any plans for supporting Qwen-2.5 series models?
Issue -
State: open - Opened by SinclairCoder 5 months ago
#295 - SFT and bunch of features
Pull Request -
State: closed - Opened by NouamaneTazi 5 months ago
#294 - prepare for v0.5
Pull Request -
State: closed - Opened by NouamaneTazi 5 months ago
#293 - why my nanoset is empty
Issue -
State: open - Opened by ziyanxzy 5 months ago
#292 - [Feature] Hide 75% of the communication in tensor parallelism using DoMiNo
Pull Request -
State: open - Opened by xrsrke 5 months ago
#291 - does Nanotron support AMSP (a new DP shard strategy)
Issue -
State: closed - Opened by ChenQiaoling00 5 months ago
#290 - [WIP] Distillation
Pull Request -
State: open - Opened by Stillerman 5 months ago
#289 - Fix unpacking issue caused by newer Flash Attention
Pull Request -
State: open - Opened by Stillerman 5 months ago
- 1 comment
#288 - Update PR template
Pull Request -
State: closed - Opened by NouamaneTazi 5 months ago
#287 - Add pr template
Pull Request -
State: closed - Opened by NouamaneTazi 5 months ago
#286 - [Feature] Over 99% communication overlap in Tensor Parallelism using Domino
Pull Request -
State: open - Opened by hwchen2017 5 months ago
- 4 comments
#285 - [Feature] Hide 75% of the communication in tensor parallelism using DoMiNo
Pull Request -
State: closed - Opened by xrsrke 5 months ago
- 2 comments
#284 - [Feature] DoMiNO with 62% communication hiding in tensor parallelism
Pull Request -
State: closed - Opened by xrsrke 5 months ago
- 1 comment
#283 - Torchao link in ultrascale playbook is broken
Issue -
State: closed - Opened by danielvegamyhre 5 months ago