Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / pytorch/torchtune issues and pull requests
#2306 - Support for Janus-Pro series of model
Issue -
State: closed - Opened by Ankur-singh 23 days ago
- 2 comments
#2305 - Update LoRA DPO distributed recipe
Issue -
State: closed - Opened by SalmanMohammadi 23 days ago
#2304 - Fix stop tokens in PPO
Pull Request -
State: closed - Opened by RedTachyon 23 days ago
- 8 comments
Labels: CLA Signed
#2303 - Move from PIL to torchvision.io.decode_image
Issue -
State: open - Opened by ebsmothers 24 days ago
- 12 comments
Labels: best practice, community help wanted
#2302 - Flux Model
Pull Request -
State: closed - Opened by calvinpelletier 24 days ago
- 1 comment
Labels: CLA Signed
#2301 - Multinode support in torchtune
Pull Request -
State: closed - Opened by joecummings 24 days ago
- 5 comments
Labels: CLA Signed
#2300 - Missing `<|begin_of_text|>` Token in `Llama3Tokenizer`
Issue -
State: open - Opened by seungjun-green 24 days ago
- 3 comments
#2299 - Step based checkpointing
Issue -
State: closed - Opened by xTRam1 26 days ago
- 1 comment
Labels: triage review
#2298 - [WIP] 'tune cat' command for pretty printing configuration files
Pull Request -
State: closed - Opened by Ankur-singh 27 days ago
- 7 comments
Labels: CLA Signed
#2297 - Training never starts - stuck after Loss is initialized
Issue -
State: closed - Opened by datamancerai 27 days ago
- 12 comments
Labels: discussion, triaged
#2296 - Tokens per second calculation
Issue -
State: open - Opened by EugenHotaj 28 days ago
- 8 comments
Labels: best practice, triage review
#2295 - Tune download command not found
Issue -
State: closed - Opened by shaunakjoshi12 28 days ago
- 3 comments
#2294 - How to checkpoint every N steps?
Issue -
State: closed - Opened by tginart 28 days ago
- 1 comment
#2293 - Remove deprecated components for 0.6.0
Pull Request -
State: closed - Opened by RdoubleA 29 days ago
- 1 comment
Labels: CLA Signed
#2292 - Custom DPO losses support
Pull Request -
State: open - Opened by krammnic 29 days ago
- 8 comments
Labels: CLA Signed
#2291 - Proper prefix handling in EarlyFusion sd hooks
Pull Request -
State: closed - Opened by ebsmothers 29 days ago
- 3 comments
Labels: CLA Signed
#2290 - Removing `SimPOLoss`
Pull Request -
State: closed - Opened by SalmanMohammadi 29 days ago
- 1 comment
Labels: CLA Signed
#2288 - Roadmap for distributed recipes using NPU as a backend
Issue -
State: open - Opened by Nicorgi 29 days ago
#2287 - deepseek r1 support?
Issue -
State: open - Opened by johnnynunez 30 days ago
- 14 comments
Labels: enhancement, triage review
#2286 - Documentation for evaluation on a custom dataset for a custom task
Issue -
State: open - Opened by karrtikiyer about 1 month ago
- 16 comments
Labels: bug, documentation, discussion, triage review
#2285 - Saving multiple checkpoints per epoch
Issue -
State: open - Opened by EugenHotaj about 1 month ago
- 2 comments
Labels: enhancement, triaged
#2284 - Add masking strategies to message transforms
Pull Request -
State: open - Opened by supreethmanyam about 1 month ago
- 3 comments
Labels: CLA Signed
#2283 - Inconsistent initialization of RoPE embedding across component builders
Issue -
State: open - Opened by Ankur-singh about 1 month ago
Labels: best practice, better engineering
#2282 - Update model builders
Pull Request -
State: closed - Opened by Ankur-singh about 1 month ago
- 11 comments
Labels: CLA Signed
#2281 - [RFC] Proposal for `tune cat` Command
Issue -
State: closed - Opened by Ankur-singh about 1 month ago
- 2 comments
Labels: rfc, discussion
#2280 - Roadmap for other parallelisms
Issue -
State: open - Opened by rahul-sarvam about 1 month ago
- 6 comments
Labels: discussion, triaged
#2279 - _checkpoint_client not installing
Issue -
State: closed - Opened by maxwellreynolds about 1 month ago
- 4 comments
#2278 - Sample packing for ConcatDataset
Pull Request -
State: closed - Opened by ebsmothers about 1 month ago
- 2 comments
Labels: CLA Signed
#2277 - Llama3.2 vision does not run with distributed state dict
Issue -
State: open - Opened by acisseJZhong about 1 month ago
- 1 comment
Labels: bug, triaged
#2276 - Construct EarlyFusion's encoder_token_ids on correct device
Pull Request -
State: closed - Opened by ebsmothers about 1 month ago
- 1 comment
Labels: CLA Signed
#2275 - Full DPO Distributed
Pull Request -
State: closed - Opened by sam-pi about 1 month ago
- 20 comments
Labels: CLA Signed
#2274 - Logging resolved config
Pull Request -
State: closed - Opened by Ankur-singh about 1 month ago
- 6 comments
Labels: CLA Signed
#2273 - The current instantiation does not trigger the initialization of submodules
Issue -
State: open - Opened by dz1iang about 1 month ago
- 4 comments
Labels: discussion, triaged
#2272 - DPO after / on top of LoRA tuning
Issue -
State: open - Opened by albertbn about 1 month ago
- 2 comments
Labels: discussion, triaged
#2271 - Fix a bug in set float32 precision
Pull Request -
State: closed - Opened by Nicorgi about 1 month ago
- 3 comments
Labels: CLA Signed
#2270 - Don't use ``_get_clones``
Issue -
State: open - Opened by ebsmothers about 1 month ago
- 8 comments
Labels: best practice, community help wanted
#2269 - Fix a bug in set float32 precision
Pull Request -
State: closed - Opened by Nicorgi about 1 month ago
- 1 comment
Labels: CLA Signed
#2268 - About the CLS token for the llama3_2_vision_encoder
Issue -
State: open - Opened by dfloreaa about 1 month ago
- 4 comments
Labels: discussion, triaged
#2267 - Expose FSDP2 MixedPrecisionPolicy params
Issue -
State: open - Opened by EugenHotaj about 1 month ago
- 1 comment
Labels: enhancement, triaged
#2266 - [EZ] Pass seed to data sampler.
Pull Request -
State: open - Opened by EugenHotaj about 1 month ago
- 13 comments
Labels: CLA Signed
#2265 - Add AlpacaToMessages to message transforms doc page
Pull Request -
State: closed - Opened by AndrewMead10 about 1 month ago
- 1 comment
Labels: CLA Signed
#2264 - Training with lora_finetune_distributed is slower than single_device, profile shows that nccl is causing this problem
Issue -
State: closed - Opened by seekerzz about 1 month ago
- 8 comments
Labels: distributed, triaged
#2263 - adding support for LR schedule for full distributed finetune
Issue -
State: open - Opened by tginart about 1 month ago
- 4 comments
Labels: best practice, better engineering, triaged
#2262 - Add AlpacaToMessages to example message transforms
Issue -
State: closed - Opened by RdoubleA about 1 month ago
Labels: good first issue, community help wanted, better engineering
#2261 - [RFC] Additional chat loss masking strategies
Issue -
State: open - Opened by RdoubleA about 1 month ago
- 2 comments
Labels: enhancement, good first issue, rfc, discussion, community help wanted
#2260 - Fix tests due to upgrade to cuda126
Pull Request -
State: closed - Opened by acisseJZhong about 1 month ago
- 1 comment
Labels: CLA Signed
#2259 - Downgrade cuda to 12.4
Pull Request -
State: closed - Opened by acisseJZhong about 1 month ago
- 1 comment
Labels: CLA Signed
#2258 - Request: adding `py.typed` for type checkers
Issue -
State: open - Opened by jamesbraza about 1 month ago
- 2 comments
Labels: better engineering, triaged
#2257 - Update QuantizationRecipe to use checkpointer.save_checkpoint
Pull Request -
State: open - Opened by Ankur-singh about 1 month ago
- 7 comments
Labels: CLA Signed
#2256 - Small formatting fix
Pull Request -
State: closed - Opened by krammnic about 1 month ago
- 3 comments
Labels: CLA Signed
#2255 - Qlora uses more memory than regular lora
Issue -
State: open - Opened by AndrewMead10 about 1 month ago
- 11 comments
Labels: triaged
#2254 - Very slow convergence with bf16
Issue -
State: open - Opened by EugenHotaj about 1 month ago
- 20 comments
Labels: discussion, triaged
#2253 - Pytorch 2.4.0 does not support flex_attention
Issue -
State: closed - Opened by yaozengwei about 1 month ago
- 2 comments
#2252 - Fix issue #2243, update the document to show correct usage
Pull Request -
State: closed - Opened by insop about 1 month ago
- 2 comments
Labels: CLA Signed
#2251 - Update the e2e flow tutorial to fix errors of generate
Pull Request -
State: closed - Opened by iseeyuan about 1 month ago
- 2 comments
Labels: CLA Signed
#2250 - Lora and Dora finetuning produces identical results
Issue -
State: open - Opened by AndrewMead10 about 1 month ago
- 8 comments
Labels: bug, high-priority
#2249 - profiling ops on xpu
Pull Request -
State: closed - Opened by songhappy about 1 month ago
- 7 comments
Labels: CLA Signed
#2248 - Log grad norm aggregated over all ranks, not just rank zero
Pull Request -
State: closed - Opened by ebsmothers about 1 month ago
- 1 comment
Labels: CLA Signed
#2247 - Multi-tile support in vision rope
Pull Request -
State: closed - Opened by RdoubleA about 1 month ago
- 2 comments
Labels: CLA Signed
#2246 - Finetuning Llama 3.1 8B Base Model on ChatML Format Dataset – Loss Reaches NaN After 2000 Steps
Issue -
State: open - Opened by abdul-456 about 1 month ago
- 11 comments
Labels: triaged
#2245 - Added Distributed(Tensor Parallel) Inference Recipe
Pull Request -
State: closed - Opened by acisseJZhong about 1 month ago
- 3 comments
Labels: CLA Signed
#2244 - Remove example inputs from aoti_compile_and_package
Pull Request -
State: closed - Opened by angelayi about 1 month ago
- 2 comments
Labels: CLA Signed, fb-exported
#2243 - Potential issue in prompt handling in `generate()` in `torchtune/recipes/generate.py`
Issue -
State: closed - Opened by insop about 1 month ago
- 6 comments
#2242 - [Small fix] Update CUDA version in README
Pull Request -
State: closed - Opened by acisseJZhong about 1 month ago
- 1 comment
Labels: CLA Signed
#2241 - Overriding kv cache entries in torchtune models
Issue -
State: open - Opened by telgamal-1 about 1 month ago
- 2 comments
Labels: discussion, triaged
#2240 - Grad Norm Differences Across Nodes
Issue -
State: closed - Opened by EugenHotaj about 1 month ago
- 4 comments
Labels: discussion
#2239 - Add a "division by zero" check in chunked loss handling in kd_losses.py
Pull Request -
State: closed - Opened by insop about 1 month ago
- 3 comments
Labels: CLA Signed
#2238 - Adds validation loss to LoRA fine tune single device
Pull Request -
State: open - Opened by MaxFrax about 1 month ago
- 12 comments
Labels: CLA Signed
#2237 - Finetune meta-llama/Llama-Guard-3-1B
Issue -
State: open - Opened by jingzhaoou about 1 month ago
- 32 comments
Labels: bug, triaged
#2236 - [EZ] Fix config bug where interpolation happens too early
Pull Request -
State: closed - Opened by EugenHotaj about 1 month ago
- 6 comments
Labels: CLA Signed
#2235 - not use tune run,how can I run the code.
Issue -
State: closed - Opened by belle9217 about 1 month ago
- 4 comments
#2234 - Add Ascend NPU as a backend for single device recipes
Pull Request -
State: closed - Opened by Nicorgi about 1 month ago
- 9 comments
Labels: CLA Signed
#2233 - fix convert_weights not working for Qwen2.5 HF checkpoints
Pull Request -
State: closed - Opened by zhangtemplar about 1 month ago
- 6 comments
Labels: CLA Signed, fb-exported
#2232 - v0.6.0 tracker
Issue -
State: open - Opened by joecummings about 1 month ago
#2231 - Refactored modules/tokenizers to be a subdir of modules/transforms
Pull Request -
State: closed - Opened by Ankur-singh about 2 months ago
- 7 comments
Labels: CLA Signed
#2230 - Add eval config for QWEN2_5 model using 0.5B variant
Pull Request -
State: closed - Opened by Ankur-singh about 2 months ago
- 1 comment
Labels: CLA Signed
#2229 - quantization recipe should mimic checkpointer.save_checkpoint
Issue -
State: open - Opened by felipemello1 about 2 months ago
- 1 comment
Labels: better engineering
#2228 - Set default value for 'subset' parameter in the_cauldron_dataset
Pull Request -
State: closed - Opened by Ankur-singh about 2 months ago
- 1 comment
Labels: CLA Signed
#2227 - Change alpaca_dataset train_on_input doc to match default value
Pull Request -
State: closed - Opened by mirceamironenco about 2 months ago
- 1 comment
Labels: CLA Signed
#2226 - Improvement: define a protocol to handle base loss and all chunked loss.
Issue -
State: open - Opened by insop about 2 months ago
- 1 comment
Labels: enhancement
#2225 - Improvement: add a "division by zero" check in chunked loss handling in kd_losses.py
Issue -
State: closed - Opened by insop about 2 months ago
- 4 comments
Labels: enhancement