Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
Issues and pull requests for GitHub repository philschmid/deep-learning-pytorch-huggingface
#78 - try to run aha-grpo!
Issue - State: closed - Opened by curater 9 days ago - 2 comments

#77 - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!
Issue - State: open - Opened by wotulong 14 days ago - 2 comments

#76 - IndexError: pop from an empty deque in run_r1_grpo.py
Issue - State: open - Opened by JunMa11 17 days ago - 3 comments

#75 - R1 grpo
Pull Request - State: closed - Opened by philschmid 18 days ago

#74 - Dpo 2025
Pull Request - State: closed - Opened by philschmid 26 days ago

#73 - Labels leaking in training data
Issue - State: open - Opened by hamzahass about 1 month ago

#72 - Typo fine-tune-llms-in-2025.ipynb
Pull Request - State: open - Opened by ChristianBernhard about 2 months ago - 3 comments

#71 - Modernberg
Pull Request - State: closed - Opened by philschmid about 2 months ago

#70 - 2025
Pull Request - State: closed - Opened by philschmid about 2 months ago

#69 - Added crucial comment in fine-tune-embedding-model-for-rag.ipynb
Pull Request - State: open - Opened by ChristianBernhard 2 months ago

#68 - Fixed typo in fine-tune-embedding-model-for-rag.ipynb
Pull Request - State: open - Opened by ChristianBernhard 2 months ago

#67 - Update gemma-lora-example.ipynb
Pull Request - State: open - Opened by qgallouedec 3 months ago

#66 - [`embeddings`] Update (positive, anchor) to (anchor, positive)
Pull Request - State: closed - Opened by tomaarsen 3 months ago

#65 - Fine-tune-llm-in-2024-with-trl.ipynb for LLAMA3.2
Issue - State: open - Opened by anas-zafar 4 months ago

#64 - Vlm
Pull Request - State: closed - Opened by philschmid 5 months ago

#63 - Fp inference
Pull Request - State: closed - Opened by philschmid 5 months ago

#62 - Update deepseed-flan-t5-summarization.ipynb
Pull Request - State: open - Opened by sharmax-vandana 5 months ago

#61 - ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float32
Issue - State: open - Opened by daje0601 5 months ago

#60 - OOM error in FSDP QLORA setup
Issue - State: open - Opened by ss8319 6 months ago

#59 - added spec example
Pull Request - State: closed - Opened by philschmid 6 months ago

#58 - Regarding the OOM issues with fine-tuning Flan-T5-xl
Issue - State: open - Opened by Alsodream 7 months ago

#57 - Update instruction-tune-llama-2-int4.ipynb
Pull Request - State: closed - Opened by Rishav-hub 7 months ago - 1 comment

#56 - Quantization question:
Issue - State: open - Opened by aptum11 8 months ago

#55 - Not able to run training/fsdp-qlora-distributed-llama3.ipynb
Issue - State: closed - Opened by aasthavar 8 months ago - 7 comments

#54 - St
Pull Request - State: closed - Opened by philschmid 9 months ago

#53 - Clean up some typos; simplify some code; update some comments
Pull Request - State: closed - Opened by tomaarsen 9 months ago

#52 - Deprecation warnings.
Issue - State: open - Opened by hohoCode 9 months ago

#51 - Fine-tune-llm-in-2024-with-trl.ipynb not producing the outputs
Issue - State: open - Opened by scigeek72 9 months ago

#50 - Fsdp qlora
Pull Request - State: closed - Opened by philschmid 10 months ago

#49 - Out of Memory: Cannot reproduce T5-XXL run on 8xA10G.
Issue - State: open - Opened by slai-natanijel 11 months ago - 3 comments

#48 - What's the use of "messages" in dpo step?
Issue - State: open - Opened by katopz 11 months ago

#47 - question about DeepSpeedPeftCallback
Issue - State: open - Opened by mickeysun0104 12 months ago

#46 - Gemma
Pull Request - State: closed - Opened by philschmid 12 months ago

#45 - Re. fine-tune-llms-in-2024-with-trl.ipynb
Issue - State: open - Opened by andysingal 12 months ago - 1 comment

#44 - Dpo
Pull Request - State: closed - Opened by philschmid 12 months ago

#43 - Target modules all-linear not found in the base model.
Issue - State: closed - Opened by kassemsabeh about 1 year ago - 6 comments

#42 - Commit Version Bug Fix
Pull Request - State: closed - Opened by YanSte about 1 year ago - 1 comment

#41 - Trl
Pull Request - State: closed - Opened by philschmid about 1 year ago

#40 - flash attention error on instruction tune llama-2 tutorial on Sagemaker notebook
Issue - State: open - Opened by matthewchung74 over 1 year ago - 2 comments

#39 - Precision Issue
Issue - State: open - Opened by zihaohe123 over 1 year ago - 4 comments

#38 - Falcon-180B "forward() got an unexpected keyword argument 'position_ids'"
Issue - State: open - Opened by aittalam over 1 year ago

#37 - Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention?
Issue - State: open - Opened by ibicdev over 1 year ago - 11 comments

#36 - Ds lora
Pull Request - State: closed - Opened by philschmid over 1 year ago

#35 - Instruction tuning of LLama2 is significantly slower compared to documented 3 hours fine-tuning time on A10G.
Issue - State: open - Opened by mlscientist2 over 1 year ago - 1 comment

#34 - Compute metrics while using SFT trainer
Issue - State: open - Opened by shubhamagarwal92 over 1 year ago - 1 comment

#33 - Cannot load tokenizer for llama2
Issue - State: closed - Opened by smreddy05 over 1 year ago - 1 comment

#32 - LLama 2 Flash Attention Patch Not Working For 70B
Issue - State: open - Opened by mallorbc over 1 year ago - 6 comments

#31 - Gptq
Pull Request - State: closed - Opened by philschmid over 1 year ago

#30 - Fix Flash Attention forward for Llama-2 70b
Pull Request - State: closed - Opened by davidmrau over 1 year ago - 8 comments

#29 - Is the DataCollator necessary in peft-flan-t5-int8-summarization.ipynb ?
Issue - State: open - Opened by brooksbp over 1 year ago

#28 - question about the llama instruction code
Issue - State: closed - Opened by yeontaek over 1 year ago - 8 comments

#27 - improvements
Pull Request - State: closed - Opened by philschmid over 1 year ago

#26 - Llama patch for FlashAttention support fails with use_cache
Issue - State: open - Opened by qmdnls over 1 year ago - 2 comments

#25 - How to create a json file for create_flan_t5_cnn_dataset.py
Issue - State: open - Opened by andysingal over 1 year ago - 1 comment

#24 - gcc/cuda used for training
Issue - State: open - Opened by danyaljj over 1 year ago - 1 comment

#23 - fix
Pull Request - State: closed - Opened by philschmid over 1 year ago

#22 - Flash attention
Pull Request - State: closed - Opened by philschmid over 1 year ago - 4 comments

#21 - Falcon int4
Pull Request - State: closed - Opened by philschmid over 1 year ago

#20 - added container image for training
Pull Request - State: closed - Opened by philschmid over 1 year ago

#19 - CPU offload when not using offload deepspeed config file
Issue - State: open - Opened by siddharthvaria over 1 year ago - 3 comments

#18 - Error when training peft model example
Issue - State: open - Opened by Tachyon5 over 1 year ago - 6 comments

#17 - Colab notebook fails
Issue - State: closed - Opened by TzurV over 1 year ago - 1 comment

#16 - CUDA OOM error while saving the model
Issue - State: closed - Opened by aasthavar almost 2 years ago - 10 comments

#15 - Does deepspeed partition the model to multi GPUs?
Issue - State: open - Opened by vikki7777 almost 2 years ago - 4 comments

#14 - ValueError
Issue - State: open - Opened by Martok10 almost 2 years ago - 4 comments

#13 - Inference on CNN validation set takes 2+ hours on p4dn.24xlarge machine with 8 A100s, 40GB each
Issue - State: open - Opened by sverneka almost 2 years ago - 5 comments

#12 - FLAN-T5 XXL using DeepSpeed fits well for training but gives OOM error for inference.
Issue - State: open - Opened by irshadbhat almost 2 years ago - 2 comments

#11 - Sample inference script for FLAN-T5 XXL using DeepSpeed & Hugging Face.
Issue - State: closed - Opened by irshadbhat almost 2 years ago - 7 comments

#10 - Error when finetuning Flan-T5-XXL on custom dataset
Issue - State: open - Opened by ngun7 almost 2 years ago - 1 comment

#9 - Peft flan
Pull Request - State: closed - Opened by philschmid almost 2 years ago

#8 - fix small bugs of deepseed-flan-t5-summarization.ipynb
Pull Request - State: closed - Opened by yao-matrix almost 2 years ago - 2 comments

#7 - Error (return code -7) when finetuning FLANT5-xxl on 8* A100
Issue - State: open - Opened by scofield7419 almost 2 years ago - 3 comments

#6 - OOM when finetuning FLANT5-xxl
Issue - State: closed - Opened by AndrewZhe almost 2 years ago - 4 comments

#5 - Deepspeed example
Pull Request - State: closed - Opened by philschmid about 2 years ago

#4 - Chat Inference Code
Issue - State: closed - Opened by samarthsarin about 2 years ago - 2 comments

#3 - compute_metrics() function
Issue - State: open - Opened by ybagoury about 2 years ago - 1 comment

#2 - Problem with preprocess_function() in tutorial
Issue - State: closed - Opened by ybagoury about 2 years ago - 4 comments

#1 - Add tokenizer to Trainer object
Pull Request - State: open - Opened by mrm8488 about 2 years ago - 1 comment