Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / pytorch/torchtune issues and pull requests

#2306 - Support for Janus-Pro series of model

Issue - State: closed - Opened by Ankur-singh 23 days ago - 2 comments

#2305 - Update LoRA DPO distributed recipe

Issue - State: closed - Opened by SalmanMohammadi 23 days ago

#2304 - Fix stop tokens in PPO

Pull Request - State: closed - Opened by RedTachyon 23 days ago - 8 comments
Labels: CLA Signed

#2303 - Move from PIL to torchvision.io.decode_image

Issue - State: open - Opened by ebsmothers 24 days ago - 12 comments
Labels: best practice, community help wanted

#2302 - Flux Model

Pull Request - State: closed - Opened by calvinpelletier 24 days ago - 1 comment
Labels: CLA Signed

#2301 - Multinode support in torchtune

Pull Request - State: closed - Opened by joecummings 24 days ago - 5 comments
Labels: CLA Signed

#2300 - Missing `<|begin_of_text|>` Token in `Llama3Tokenizer`

Issue - State: open - Opened by seungjun-green 24 days ago - 3 comments

#2299 - Step based checkpointing

Issue - State: closed - Opened by xTRam1 26 days ago - 1 comment
Labels: triage review

#2298 - [WIP] 'tune cat' command for pretty printing configuration files

Pull Request - State: closed - Opened by Ankur-singh 27 days ago - 7 comments
Labels: CLA Signed

#2297 - Training never starts - stuck after Loss is intialized

Issue - State: closed - Opened by datamancerai 27 days ago - 12 comments
Labels: discussion, triaged

#2296 - Tokens per second calculation

Issue - State: open - Opened by EugenHotaj 28 days ago - 8 comments
Labels: best practice, triage review

#2295 - Tune download command not found

Issue - State: closed - Opened by shaunakjoshi12 28 days ago - 3 comments

#2294 - How to checkpoint every N steps?

Issue - State: closed - Opened by tginart 28 days ago - 1 comment

#2293 - Remove deprecated components for 0.6.0

Pull Request - State: closed - Opened by RdoubleA 29 days ago - 1 comment
Labels: CLA Signed

#2292 - Custom DPO losses support

Pull Request - State: open - Opened by krammnic 29 days ago - 8 comments
Labels: CLA Signed

#2291 - Proper prefix handling in EarlyFusion sd hooks

Pull Request - State: closed - Opened by ebsmothers 29 days ago - 3 comments
Labels: CLA Signed

#2290 - Removing `SimPOLoss`

Pull Request - State: closed - Opened by SalmanMohammadi 29 days ago - 1 comment
Labels: CLA Signed

#2287 - deepseek r1 support?

Issue - State: open - Opened by johnnynunez 30 days ago - 14 comments
Labels: enhancement, triage review

#2286 - Documentation for evaluation on a custom dataset for a custom task

Issue - State: open - Opened by karrtikiyer about 1 month ago - 16 comments
Labels: bug, documentation, discussion, triage review

#2285 - Saving multiple checkpoints per epoch

Issue - State: open - Opened by EugenHotaj about 1 month ago - 2 comments
Labels: enhancement, triaged

#2284 - Add masking strategies to message transforms

Pull Request - State: open - Opened by supreethmanyam about 1 month ago - 3 comments
Labels: CLA Signed

#2283 - Inconsistent initialization of RoPE embedding across component builders

Issue - State: open - Opened by Ankur-singh about 1 month ago
Labels: best practice, better engineering

#2282 - Update model builders

Pull Request - State: closed - Opened by Ankur-singh about 1 month ago - 11 comments
Labels: CLA Signed

#2281 - [RFC] Proposal for `tune cat` Command

Issue - State: closed - Opened by Ankur-singh about 1 month ago - 2 comments
Labels: rfc, discussion

#2280 - Roadmap for other parallelisms

Issue - State: open - Opened by rahul-sarvam about 1 month ago - 6 comments
Labels: discussion, triaged

#2279 - _checkpoint_client not installing

Issue - State: closed - Opened by maxwellreynolds about 1 month ago - 4 comments

#2278 - Sample packing for ConcatDataset

Pull Request - State: closed - Opened by ebsmothers about 1 month ago - 2 comments
Labels: CLA Signed

#2277 - Llama3.2 vision does not run with distributed state dict

Issue - State: open - Opened by acisseJZhong about 1 month ago - 1 comment
Labels: bug, triaged

#2276 - Construct EarlyFusion's encoder_token_ids on correct device

Pull Request - State: closed - Opened by ebsmothers about 1 month ago - 1 comment
Labels: CLA Signed

#2275 - Full DPO Distributed

Pull Request - State: closed - Opened by sam-pi about 1 month ago - 20 comments
Labels: CLA Signed

#2274 - Logging resolved config

Pull Request - State: closed - Opened by Ankur-singh about 1 month ago - 6 comments
Labels: CLA Signed

#2273 - The current instantiation does not trigger the initialization of submodules

Issue - State: open - Opened by dz1iang about 1 month ago - 4 comments
Labels: discussion, triaged

#2272 - DPO after / on top of LoRA tuning

Issue - State: open - Opened by albertbn about 1 month ago - 2 comments
Labels: discussion, triaged

#2271 - Fix a bug in set float32 precision

Pull Request - State: closed - Opened by Nicorgi about 1 month ago - 3 comments
Labels: CLA Signed

#2270 - Don't use ``_get_clones``

Issue - State: open - Opened by ebsmothers about 1 month ago - 8 comments
Labels: best practice, community help wanted

#2269 - Fix a bug in set float32 precision

Pull Request - State: closed - Opened by Nicorgi about 1 month ago - 1 comment
Labels: CLA Signed

#2268 - About the CLS token for the llama3_2_vision_encoder

Issue - State: open - Opened by dfloreaa about 1 month ago - 4 comments
Labels: discussion, triaged

#2267 - Expose FSDP2 MixedPrecisionPolicy params

Issue - State: open - Opened by EugenHotaj about 1 month ago - 1 comment
Labels: enhancement, triaged

#2266 - [EZ] Pass seed to data sampler.

Pull Request - State: open - Opened by EugenHotaj about 1 month ago - 13 comments
Labels: CLA Signed

#2265 - Add AlpacaToMessages to message transforms doc page

Pull Request - State: closed - Opened by AndrewMead10 about 1 month ago - 1 comment
Labels: CLA Signed

#2264 - Training with lora_finetune_distributed is slower than single_device, profile shows that nccl is causing this problem

Issue - State: closed - Opened by seekerzz about 1 month ago - 8 comments
Labels: distributed, triaged

#2263 - adding support for LR schedule for full distributed finetune

Issue - State: open - Opened by tginart about 1 month ago - 4 comments
Labels: best practice, better engineering, triaged

#2262 - Add AlpacaToMessages to example message transforms

Issue - State: closed - Opened by RdoubleA about 1 month ago
Labels: good first issue, community help wanted, better engineering

#2261 - [RFC] Additional chat loss masking strategies

Issue - State: open - Opened by RdoubleA about 1 month ago - 2 comments
Labels: enhancement, good first issue, rfc, discussion, community help wanted

#2260 - Fix tests due to upgrade to cuda126

Pull Request - State: closed - Opened by acisseJZhong about 1 month ago - 1 comment
Labels: CLA Signed

#2259 - Downgrade cuda to 12.4

Pull Request - State: closed - Opened by acisseJZhong about 1 month ago - 1 comment
Labels: CLA Signed

#2258 - Request: adding `py.typed` for type checkers

Issue - State: open - Opened by jamesbraza about 1 month ago - 2 comments
Labels: better engineering, triaged

#2257 - Update QuantizationRecipe to use checkpointer.save_checkpoint

Pull Request - State: open - Opened by Ankur-singh about 1 month ago - 7 comments
Labels: CLA Signed

#2256 - Small formatting fix

Pull Request - State: closed - Opened by krammnic about 1 month ago - 3 comments
Labels: CLA Signed

#2255 - Qlora uses more memory than regular lora

Issue - State: open - Opened by AndrewMead10 about 1 month ago - 11 comments
Labels: triaged

#2254 - Very slow convergence with bf16

Issue - State: open - Opened by EugenHotaj about 1 month ago - 20 comments
Labels: discussion, triaged

#2253 - Pytorch 2.4.0 does not support flex_attention

Issue - State: closed - Opened by yaozengwei about 1 month ago - 2 comments

#2252 - Fix issue #2243, update the document to show correct usage

Pull Request - State: closed - Opened by insop about 1 month ago - 2 comments
Labels: CLA Signed

#2251 - Update the e2e flow tutorial to fix errors of generate

Pull Request - State: closed - Opened by iseeyuan about 1 month ago - 2 comments
Labels: CLA Signed

#2250 - Lora and Dora finetuning produces identical results

Issue - State: open - Opened by AndrewMead10 about 1 month ago - 8 comments
Labels: bug, high-priority

#2249 - profiling ops on xpu

Pull Request - State: closed - Opened by songhappy about 1 month ago - 7 comments
Labels: CLA Signed

#2248 - Log grad norm aggregated over all ranks, not just rank zero

Pull Request - State: closed - Opened by ebsmothers about 1 month ago - 1 comment
Labels: CLA Signed

#2247 - Multi-tile support in vision rope

Pull Request - State: closed - Opened by RdoubleA about 1 month ago - 2 comments
Labels: CLA Signed

#2246 - Finetuning Llama 3.1 8B Base Model on ChatML Format Dataset – Loss Reaches NaN After 2000 Steps

Issue - State: open - Opened by abdul-456 about 1 month ago - 11 comments
Labels: triaged

#2245 - Added Distributed(Tensor Parallel) Inference Recipe

Pull Request - State: closed - Opened by acisseJZhong about 1 month ago - 3 comments
Labels: CLA Signed

#2244 - Remove example inputs from aoti_compile_and_package

Pull Request - State: closed - Opened by angelayi about 1 month ago - 2 comments
Labels: CLA Signed, fb-exported

#2242 - [Small fix] Update CUDA version in README

Pull Request - State: closed - Opened by acisseJZhong about 1 month ago - 1 comment
Labels: CLA Signed

#2241 - Overriding kv cache entries in torchtune models

Issue - State: open - Opened by telgamal-1 about 1 month ago - 2 comments
Labels: discussion, triaged

#2240 - Grad Norm Differences Across Nodes

Issue - State: closed - Opened by EugenHotaj about 1 month ago - 4 comments
Labels: discussion

#2239 - Add a "division by zero" check in chunked loss handling in kd_losses.py

Pull Request - State: closed - Opened by insop about 1 month ago - 3 comments
Labels: CLA Signed

#2238 - Adds validation loss to LoRA fine tune single device

Pull Request - State: open - Opened by MaxFrax about 1 month ago - 12 comments
Labels: CLA Signed

#2237 - Finetune meta-llama/Llama-Guard-3-1B

Issue - State: open - Opened by jingzhaoou about 1 month ago - 32 comments
Labels: bug, triaged

#2236 - [EZ] Fix config bug where interpolation happens too early

Pull Request - State: closed - Opened by EugenHotaj about 1 month ago - 6 comments
Labels: CLA Signed

#2235 - not use tune run,how can I run the code.

Issue - State: closed - Opened by belle9217 about 1 month ago - 4 comments

#2234 - Add Ascend NPU as a backend for single device recipes

Pull Request - State: closed - Opened by Nicorgi about 1 month ago - 9 comments
Labels: CLA Signed

#2233 - fix convert_weights not working for Qwen2.5 HF checkpoints

Pull Request - State: closed - Opened by zhangtemplar about 1 month ago - 6 comments
Labels: CLA Signed, fb-exported

#2232 - v0.6.0 tracker

Issue - State: open - Opened by joecummings about 1 month ago

#2231 - Refactored modules/tokenizers to be a subdir of modules/transforms

Pull Request - State: closed - Opened by Ankur-singh about 2 months ago - 7 comments
Labels: CLA Signed

#2230 - Add eval config for QWEN2_5 model using 0.5B variant

Pull Request - State: closed - Opened by Ankur-singh about 2 months ago - 1 comment
Labels: CLA Signed

#2229 - quantization recipe should mimic checkpointer.save_checkpoint

Issue - State: open - Opened by felipemello1 about 2 months ago - 1 comment
Labels: better engineering

#2228 - Set default value for 'subset' parameter in the_cauldron_dataset

Pull Request - State: closed - Opened by Ankur-singh about 2 months ago - 1 comment
Labels: CLA Signed

#2227 - Change alpaca_dataset train_on_input doc to match default value

Pull Request - State: closed - Opened by mirceamironenco about 2 months ago - 1 comment
Labels: CLA Signed

#2226 - Improvement: define a protocol to handle base loss and all chunked loss.

Issue - State: open - Opened by insop about 2 months ago - 1 comment
Labels: enhancement

#2225 - Improvement: add a "division by zero" check in chunked loss handling in kd_losses.py

Issue - State: closed - Opened by insop about 2 months ago - 4 comments
Labels: enhancement