Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / pytorch/torchtune issues and pull requests
#2380 - Model weights conversion failed
Issue -
State: open - Opened by xeasonx 7 days ago
#2379 - parallelize_module() parallelism module fqn wildcard doesn't work for llama3.2
Issue -
State: open - Opened by acisseJZhong 7 days ago
#2378 - Add tests and implementation for disabling dropout layers in models
Pull Request -
State: open - Opened by Ankur-singh 7 days ago
- 1 comment
Labels: CLA Signed
#2377 - Fix Qwen config
Pull Request -
State: closed - Opened by acisseJZhong 7 days ago
- 1 comment
Labels: CLA Signed
#2376 - fix: Moved dev deps from optional-dependencies to dependency-groups
Pull Request -
State: open - Opened by bogdansalyp 7 days ago
- 1 comment
Labels: CLA Signed
#2375 - pyproject.toml wrong dev deps organization
Issue -
State: open - Opened by bogdansalyp 7 days ago
- 1 comment
#2374 - update in docs mentions of "ft-" before ckpt name, since we removed it
Issue -
State: open - Opened by felipemello1 7 days ago
Labels: community help wanted
#2373 - expose max-autotune in configs for better perf
Issue -
State: open - Opened by felipemello1 7 days ago
Labels: community help wanted
#2372 - Disable `reshard_after_forward` for last transformer layer FSDP param group
Issue -
State: open - Opened by SalmanMohammadi 8 days ago
Labels: community help wanted
#2371 - Adding new role "Tool" to support Llama 3.3 models
Issue -
State: open - Opened by init27 8 days ago
- 4 comments
#2370 - [WIP]: Get rid of optim_bwd checks via wrapper.
Pull Request -
State: open - Opened by krammnic 9 days ago
- 4 comments
Labels: CLA Signed
#2369 - Misleading import error message for torchao
Issue -
State: open - Opened by bogdansalyp 9 days ago
#2368 - fix: torch and torchvision import check
Pull Request -
State: open - Opened by bogdansalyp 9 days ago
- 2 comments
Labels: CLA Signed
#2367 - feat: Added cfg.cudnn_deterministic_mode flag
Pull Request -
State: open - Opened by bogdansalyp 9 days ago
- 5 comments
Labels: CLA Signed
#2366 - Refactor load_image to return torch.Tensor instead of PIL.Image
Pull Request -
State: open - Opened by Ankur-singh 10 days ago
- 2 comments
Labels: CLA Signed
#2365 - Implements MLFlowLogger
Pull Request -
State: open - Opened by nathan-az 10 days ago
- 11 comments
Labels: CLA Signed
#2364 - Add torchdata Parallel Packer for faster startup
Pull Request -
State: open - Opened by andrewkho 10 days ago
- 1 comment
Labels: CLA Signed
#2363 - readme updates for full DPO distributed recipe
Pull Request -
State: closed - Opened by ebsmothers 10 days ago
- 1 comment
Labels: CLA Signed
#2362 - [Fix Test] Fix failed generation test by pinning pytorch nightlies
Pull Request -
State: closed - Opened by acisseJZhong 10 days ago
- 1 comment
Labels: CLA Signed
#2361 - How to change the datasets in JSON format?
Issue -
State: closed - Opened by kailashg26 10 days ago
- 3 comments
#2360 - Resume from checkpoint broken with distributed optimizer-in-backward
Issue -
State: open - Opened by ebsmothers 10 days ago
#2359 - Resume from checkpoint with distributed optimizer-in-backward repro
Pull Request -
State: open - Opened by ebsmothers 10 days ago
- 1 comment
Labels: CLA Signed
#2358 - Add mistral small
Pull Request -
State: open - Opened by AndrewMead10 11 days ago
- 1 comment
Labels: CLA Signed
#2357 - Add max-autotune try/except if flex attn breaks
Pull Request -
State: closed - Opened by felipemello1 11 days ago
- 3 comments
Labels: CLA Signed
#2356 - Generic classifier builder
Pull Request -
State: open - Opened by SalmanMohammadi 11 days ago
- 1 comment
Labels: CLA Signed
#2355 - [WIP] Support Continual Pretraining Multi Dataset using Streaming
Pull Request -
State: open - Opened by mostafaelhoushi 11 days ago
- 1 comment
Labels: CLA Signed
#2354 - Remove "ft-" prefix from checkpoint shards.
Pull Request -
State: closed - Opened by EugenHotaj 11 days ago
- 2 comments
Labels: CLA Signed
#2353 - Add a `disable_dropout` utility fn
Issue -
State: open - Opened by SalmanMohammadi 12 days ago
- 4 comments
Labels: good first issue, community help wanted, better engineering
#2352 - Incorrect Default Config File Paths for Llama 3.1 8B and Qwen 2.5 7B Models
Issue -
State: open - Opened by MaxHastings 12 days ago
- 1 comment
#2351 - Fix saving adapter weights after disabling DSD
Pull Request -
State: closed - Opened by acisseJZhong 12 days ago
- 2 comments
Labels: CLA Signed
#2350 - HF tokenizers: initial base tokenizer support
Pull Request -
State: open - Opened by ebsmothers 12 days ago
- 2 comments
Labels: CLA Signed
#2349 - Rework recipes section of README and simplify models ref
Pull Request -
State: open - Opened by joecummings 12 days ago
- 2 comments
Labels: CLA Signed
#2348 - Update README for multinode
Pull Request -
State: closed - Opened by joecummings 12 days ago
- 1 comment
Labels: CLA Signed
#2347 - Add multi node training to README
Pull Request -
State: closed - Opened by joecummings 12 days ago
- 1 comment
Labels: CLA Signed
#2346 - [Bug Fix] Disable DSD for saving ckpt
Pull Request -
State: closed - Opened by acisseJZhong 12 days ago
- 2 comments
Labels: CLA Signed
#2345 - "ft-" prefix for finetuned checkpoints
Issue -
State: closed - Opened by EugenHotaj 12 days ago
- 3 comments
#2344 - Discussion: Update dataloader to skip rows that don't require training
Issue -
State: open - Opened by felipemello1 13 days ago
- 4 comments
Labels: discussion, best practice, triage review
#2343 - Traj dpo
Pull Request -
State: open - Opened by Vattikondadheeraj 13 days ago
- 3 comments
#2342 - Update to proper EOS ids for Qwen2 and Qwen2.5
Pull Request -
State: closed - Opened by joecummings 13 days ago
- 3 comments
Labels: CLA Signed
#2341 - CEWithChunkedOutputLoss does not check division by zero
Issue -
State: open - Opened by pocca2048 14 days ago
- 6 comments
Labels: discussion, triaged
#2340 - Feature request: GRPO support
Issue -
State: open - Opened by tikikun 14 days ago
- 5 comments
#2339 - DistributedSampler has the same seed randomization
Issue -
State: closed - Opened by bogdansalyp 14 days ago
- 3 comments
#2338 - Seed: null isn't random
Issue -
State: open - Opened by bogdansalyp 14 days ago
- 1 comment
Labels: bug, triaged
#2337 - Qwen Tokenizer Excludes Last Assistant EOT Token
Issue -
State: closed - Opened by roeetal 14 days ago
- 2 comments
Labels: bug, triaged
#2336 - Update PT pin for modules/_export
Pull Request -
State: closed - Opened by Jack-Khuu 14 days ago
- 5 comments
Labels: CLA Signed, fb-exported
#2335 - Seed is not applied for DPO recipes
Issue -
State: open - Opened by bogdansalyp 14 days ago
- 3 comments
Labels: bug, triaged
#2334 - Apply gradient accumulation fix to DPO/PPO recipes
Issue -
State: open - Opened by bogdansalyp 14 days ago
- 1 comment
#2333 - Distributed DPO loss normalization by amount of tokens
Issue -
State: open - Opened by bogdansalyp 14 days ago
- 2 comments
#2332 - Loss shouldn't be averaged within one grad_acc step
Issue -
State: closed - Opened by bogdansalyp 14 days ago
- 1 comment
#2331 - added `tie_word_embeddings` to llama3_2 models
Pull Request -
State: closed - Opened by jingzhaoou 15 days ago
- 4 comments
Labels: CLA Signed
#2330 - TP + FSDP distributed training (full finetuning)
Pull Request -
State: closed - Opened by acisseJZhong 15 days ago
- 2 comments
Labels: CLA Signed
#2329 - Wandb charts show time (minutes), but I want seconds.
Issue -
State: closed - Opened by kailashg26 15 days ago
- 1 comment
#2328 - Add distributed inference for llama3.2 vision
Pull Request -
State: open - Opened by acisseJZhong 16 days ago
- 2 comments
Labels: CLA Signed
#2327 - try fix a bug for symbolic check
Pull Request -
State: open - Opened by ywq880611 17 days ago
- 3 comments
Labels: CLA Signed
#2326 - [Very WiP] R1-Style distributed GRPO
Pull Request -
State: open - Opened by RedTachyon 17 days ago
- 20 comments
Labels: CLA Signed
#2325 - FIRE Relative Positional Encodings
Issue -
State: open - Opened by kaddu341 17 days ago
- 3 comments
#2324 - Grpo & verifiable rewards dataset
Pull Request -
State: closed - Opened by ianbarber 17 days ago
- 3 comments
Labels: CLA Signed
#2323 - Reading TorchProfiler after run
Issue -
State: open - Opened by fabiogeraci 18 days ago
- 6 comments
#2322 - Use checkout@v4 / upload@v4 for docs build
Pull Request -
State: closed - Opened by joecummings 18 days ago
- 1 comment
Labels: CLA Signed
#2321 - Refactor validate missing for LoRA + deprecate param utility
Pull Request -
State: open - Opened by RdoubleA 18 days ago
- 2 comments
Labels: CLA Signed
#2320 - Classifiers (reward models) in torchtune
Issue -
State: open - Opened by EugenHotaj 18 days ago
- 4 comments
#2319 - How to run torchtune on AMD Instinct MI300X
Issue -
State: open - Opened by kailashg26 19 days ago
- 2 comments
#2318 - [WIP] 2D parallelism for training
Pull Request -
State: closed - Opened by joecummings 19 days ago
- 2 comments
Labels: CLA Signed
#2317 - fix state dict hook for early fusion models
Pull Request -
State: closed - Opened by acisseJZhong 19 days ago
- 1 comment
Labels: CLA Signed
#2316 - Call `get_world_size_and_rank` ONCE
Issue -
State: open - Opened by joecummings 19 days ago
#2315 - Rename and document `cleanup_before_training`
Issue -
State: open - Opened by joecummings 19 days ago
#2314 - Disable DSD and fix bitsandbytes test
Pull Request -
State: closed - Opened by RdoubleA 19 days ago
- 2 comments
Labels: CLA Signed
#2313 - Revert DSD to fix breakages
Pull Request -
State: closed - Opened by ebsmothers 19 days ago
- 1 comment
Labels: CLA Signed
#2312 - Investigate the optimal scenario in which to use ``torch_set_num_thread()``
Issue -
State: open - Opened by joecummings 19 days ago
#2311 - Text-to-Image Dataset and Flux Transform
Pull Request -
State: open - Opened by calvinpelletier 20 days ago
- 1 comment
Labels: CLA Signed
#2310 - Unable to reproduce QAT results from Blog
Issue -
State: open - Opened by AbhinavDutta 20 days ago
- 9 comments
#2309 - [ez] Add output_dir field to a couple configs
Pull Request -
State: closed - Opened by ebsmothers 20 days ago
- 1 comment
Labels: CLA Signed
#2308 - [EZ] Only log deprecation warning on rank zero
Pull Request -
State: closed - Opened by RdoubleA 20 days ago
- 1 comment
Labels: CLA Signed
#2307 - Differing component implementation logic across recipes
Issue -
State: open - Opened by EugenHotaj 20 days ago
- 4 comments
Labels: bug, best practice, better engineering, triaged
#2306 - Support for Janus-Pro series of model
Issue -
State: closed - Opened by Ankur-singh 20 days ago
- 2 comments
#2305 - Update LoRA DPO distributed recipe
Issue -
State: closed - Opened by SalmanMohammadi 20 days ago
#2304 - Fix stop tokens in PPO
Pull Request -
State: closed - Opened by RedTachyon 21 days ago
- 8 comments
Labels: CLA Signed
#2303 - Move from PIL to torchvision.io.decode_image
Issue -
State: open - Opened by ebsmothers 21 days ago
- 8 comments
Labels: best practice, community help wanted
#2302 - Flux Model
Pull Request -
State: open - Opened by calvinpelletier 21 days ago
- 1 comment
Labels: CLA Signed
#2301 - Multinode support in torchtune
Pull Request -
State: closed - Opened by joecummings 21 days ago
- 5 comments
Labels: CLA Signed
#2300 - Missing `<|begin_of_text|>` Token in `Llama3Tokenizer`
Issue -
State: open - Opened by seungjun-green 22 days ago
- 3 comments
#2299 - Step based checkpointing
Issue -
State: closed - Opened by xTRam1 24 days ago
- 1 comment
Labels: triage review
#2298 - [WIP] 'tune cat' command for pretty printing configuration files
Pull Request -
State: closed - Opened by Ankur-singh 24 days ago
- 7 comments
Labels: CLA Signed
#2297 - Training never starts - stuck after Loss is initialized
Issue -
State: closed - Opened by datamancerai 25 days ago
- 12 comments
Labels: discussion, triaged
#2296 - Tokens per second calculation
Issue -
State: open - Opened by EugenHotaj 25 days ago
- 8 comments
Labels: best practice, triage review
#2295 - Tune download command not found
Issue -
State: closed - Opened by shaunakjoshi12 25 days ago
- 3 comments
#2294 - How to checkpoint every N steps?
Issue -
State: closed - Opened by tginart 26 days ago
- 1 comment
#2293 - Remove deprecated components for 0.6.0
Pull Request -
State: closed - Opened by RdoubleA 26 days ago
- 1 comment
Labels: CLA Signed
#2292 - Custom DPO losses support
Pull Request -
State: open - Opened by krammnic 26 days ago
- 8 comments
Labels: CLA Signed
#2291 - Proper prefix handling in EarlyFusion sd hooks
Pull Request -
State: closed - Opened by ebsmothers 26 days ago
- 3 comments
Labels: CLA Signed
#2290 - Removing `SimPOLoss`
Pull Request -
State: closed - Opened by SalmanMohammadi 27 days ago
- 1 comment
Labels: CLA Signed
#2288 - Roadmap for distributed recipes using NPU as a backend
Issue -
State: open - Opened by Nicorgi 27 days ago
#2287 - deepseek r1 support?
Issue -
State: open - Opened by johnnynunez 27 days ago
- 10 comments
Labels: enhancement, triage review
#2286 - Documentation for evaluation on a custom dataset for a custom task
Issue -
State: open - Opened by karrtikiyer 28 days ago
- 16 comments
Labels: bug, documentation, discussion, triage review
#2285 - Saving multiple checkpoints per epoch
Issue -
State: open - Opened by EugenHotaj 28 days ago
- 2 comments
Labels: enhancement, triaged
#2284 - Add masking strategies to message transforms
Pull Request -
State: open - Opened by supreethmanyam 28 days ago
- 3 comments
Labels: CLA Signed
#2283 - Inconsistent initialization of RoPE embedding across component builders
Issue -
State: open - Opened by Ankur-singh 28 days ago
Labels: best practice, better engineering
#2282 - Update model builders
Pull Request -
State: closed - Opened by Ankur-singh 29 days ago
- 11 comments
Labels: CLA Signed
#2281 - [RFC] Proposal for `tune cat` Command
Issue -
State: closed - Opened by Ankur-singh 29 days ago
- 2 comments
Labels: rfc, discussion
#2280 - Roadmap for other parallelisms
Issue -
State: open - Opened by rahul-sarvam 30 days ago
- 6 comments
Labels: discussion, triaged
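Every entry in this listing follows the same line pattern: a `#number - title` line, a kind line (`Issue -` or `Pull Request -`), a `State:` line, then an optional comment count and an optional `Labels:` line. As a minimal sketch, assuming that pattern holds for the whole page, the text could be turned into structured records as below. The names `Entry` and `parse_listing` are hypothetical helpers for illustration and are not part of any Ecosyste.ms API.

```python
import re
from dataclasses import dataclass, field

# Patterns inferred from the line layout of this listing (an assumption,
# not a documented format).
HEADER = re.compile(r"^#(\d+) - (.+)$")
STATE = re.compile(r"^State: (open|closed) - Opened by (\S+) (.+ ago)$")
COMMENTS = re.compile(r"^- (\d+) comments?$")


@dataclass
class Entry:
    number: int
    title: str
    kind: str = ""      # "Issue" or "Pull Request"
    state: str = ""     # "open" or "closed"
    author: str = ""
    opened: str = ""    # relative age as shown, e.g. "7 days ago"
    comments: int = 0   # stays 0 when the listing shows no comment line
    labels: list = field(default_factory=list)


def parse_listing(text):
    """Parse the plain-text listing into a list of Entry records."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := HEADER.match(line):
            # A "#number - title" line starts a new entry.
            entries.append(Entry(int(m.group(1)), m.group(2)))
        elif not entries:
            # Skip page header lines that come before the first entry.
            continue
        elif line in ("Issue -", "Pull Request -"):
            entries[-1].kind = line[:-2]
        elif m := STATE.match(line):
            e = entries[-1]
            e.state, e.author, e.opened = m.groups()
        elif m := COMMENTS.match(line):
            entries[-1].comments = int(m.group(1))
        elif line.startswith("Labels: "):
            entries[-1].labels = line[len("Labels: "):].split(", ")
    return entries


sample = """\
#2353 - Add a `disable_dropout` utility fn
Issue -
State: open - Opened by SalmanMohammadi 12 days ago
- 4 comments
Labels: good first issue, community help wanted, better engineering
#2377 - Fix Qwen config
Pull Request -
State: closed - Opened by acisseJZhong 7 days ago
- 1 comment
Labels: CLA Signed
"""

entries = parse_listing(sample)
open_issues = [e.number for e in entries if e.kind == "Issue" and e.state == "open"]
```

With records in this shape, filtering by state, kind, or label (for example, all open entries labeled `community help wanted`) becomes a one-line list comprehension.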