AnswerDotAI/fsdp_qlora issues and pull requests

#72 - Add option for local 'custom.jsonl' dataset with llama3 prompt format

Pull Request - State: open - Opened by chrismrutherford 8 days ago

#71 - Fix: RuntimeError, Error(s) in loading state_dict for PeftModelForCau…

Pull Request - State: open - Opened by chwenjun225 21 days ago

#70 - `Converting the State Dict.ipynb` - Runtime error because Unexpected keys

Issue - State: closed - Opened by chwenjun225 21 days ago - 1 comment

#69 - How to fine-tune a Vision Language Model (VLM)?

Issue - State: open - Opened by asmith26 21 days ago

#68 - DoRA training not taking dropout or alpha into account

Issue - State: open - Opened by BenjaminBossan about 1 month ago

#67 - [FEATURE] Profiling Improvements

Pull Request - State: closed - Opened by jeromeku 4 months ago - 7 comments

#66 - Add profiling to train.py

Pull Request - State: closed - Opened by austinvhuang 5 months ago - 1 comment

#65 - Create benchmarks_03_2024.md

Pull Request - State: closed - Opened by johnowhitaker 5 months ago

#64 - Add n_bits param to pass to hqq

Pull Request - State: closed - Opened by UmerHA 5 months ago - 1 comment

#63 - train.py

Issue - State: open - Opened by mylesgoose 5 months ago - 1 comment

#62 - fix multiprocess issue (RuntimeError An attempt has been made to start a new process before the current process has finished its bootstrapping phase)

Pull Request - State: open - Opened by geronimi73 5 months ago

#61 - ValueError report

Issue - State: open - Opened by mxjmtxrm 5 months ago

#60 - Request for Scripts to Merge QDoRA Adapters with Base Model for vLLM Inference

Issue - State: open - Opened by iseesaw 5 months ago - 4 comments

#59 - Question about GPU memory usage.

Issue - State: open - Opened by mxjmtxrm 5 months ago

#58 - DeepSeek VL support

Issue - State: open - Opened by SinanAkkoyun 5 months ago

#57 - How does one load and do inference on fine-tuned LLama 3 using bnb_dora train script?

Issue - State: open - Opened by pe-hy 5 months ago

#56 - BOFT support?

Issue - State: open - Opened by Xynonners 5 months ago

#55 - Can i use this script to pre-train models?

Issue - State: open - Opened by brunocruzfranchi 5 months ago

#54 - Issues with LLaMA-3-70B

Issue - State: closed - Opened by catid 5 months ago - 1 comment

#53 - llama3?

Issue - State: open - Opened by lianuo 5 months ago

#52 - Llama pro

Pull Request - State: closed - Opened by KeremTurgutlu 5 months ago

#51 - DoRA

Pull Request - State: closed - Opened by KeremTurgutlu 5 months ago

#50 - Results after running

Issue - State: open - Opened by hsb1995 5 months ago

#49 - What if I have three graphics cards?

Issue - State: open - Opened by lianuo 5 months ago - 1 comment

#48 - How to load the saved model?

Issue - State: open - Opened by bilalghanem 5 months ago

#47 - process 0 terminated with signal SIGKILL

Issue - State: open - Opened by hsb1995 6 months ago - 4 comments

#46 - how to inference using 70b? or we need to implement it with the same way to train it by ourself?

Issue - State: open - Opened by yaohwang 6 months ago - 1 comment

#45 - nan when the input length is large

Issue - State: open - Opened by bilalghanem 6 months ago - 5 comments

#44 - Why is o_proj not targetted?

Issue - State: open - Opened by GRcharles 6 months ago

#43 - Question about adding / training Mixtral

Issue - State: open - Opened by chrismrutherford 6 months ago - 1 comment

#42 - Q on comparison with SFTTrainer

Issue - State: open - Opened by RonanKMcGovern 6 months ago

#41 - Dual GPU training instantly powers off my desktop

Issue - State: closed - Opened by rationalism 6 months ago - 6 comments

#40 - /opt/conda/conda-bld/pytorch_1708025847130/work/aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.

Issue - State: open - Opened by yaohwang 6 months ago

#39 - print reserved memory, log allocated & reserved

Pull Request - State: closed - Opened by warner-benjamin 6 months ago

#38 - Bigger context size?

Issue - State: open - Opened by LoganALJones 6 months ago

#37 - train.py script crashes when using HQQ

Issue - State: open - Opened by rationalism 6 months ago - 3 comments

#36 - (minor) add type hints to train.py

Pull Request - State: closed - Opened by Liberatedwinner 6 months ago

#35 - Add dataset_samples argument for alpaca_sample and dummy datasets

Pull Request - State: closed - Opened by warner-benjamin 6 months ago

#34 - Torch Compile?

Issue - State: open - Opened by jzhang38 6 months ago

#33 - Support torch model bin format

Pull Request - State: open - Opened by okdshin 6 months ago

#32 - Example with AMD ROCm/HIP

Issue - State: closed - Opened by ehartford 6 months ago - 4 comments

#31 - Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Issue - State: open - Opened by Iron-Bound 6 months ago

#30 - move peft imports to avoid RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase

Pull Request - State: closed - Opened by geronimi73 6 months ago - 17 comments

#29 - Fine tuning only runs on CPU

Issue - State: open - Opened by diabeticpilot 6 months ago - 4 comments

#28 - RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase

Issue - State: closed - Opened by geronimi73 6 months ago - 3 comments

#27 - Update 00-profile_lora_qlora.ipynb

Pull Request - State: open - Opened by eltociear 6 months ago

#26 - bugs for fine-tune fsdp multinode

Issue - State: open - Opened by batman-do 6 months ago - 1 comment

#25 - Running into CUDA out of memory with hqq_lora

Issue - State: closed - Opened by zabirauf 6 months ago - 3 comments

#24 - ProcessExitedException: process 0 (2x 4090)

Issue - State: open - Opened by Pugio 7 months ago - 39 comments

#23 - Update README.md

Pull Request - State: closed - Opened by Xorlent 7 months ago - 1 comment

#22 - NCCL issue training with two GPUs

Issue - State: open - Opened by deepankarsharma 7 months ago - 2 comments

#21 - Drop trailing slash from last line in command prompts for easier copy…

Pull Request - State: closed - Opened by deepankarsharma 7 months ago - 1 comment

#20 - fix typo in readme

Pull Request - State: closed - Opened by danromuald 7 months ago - 2 comments

#19 - Training from e

Issue - State: closed - Opened by maderix 7 months ago - 1 comment

#18 - [`Docs`] Update bnb installation guidelines for users that want to install bnb from source

Pull Request - State: closed - Opened by younesbelkada 7 months ago - 4 comments

#17 - Add arguments for reentrant_checkpointing & wandb

Pull Request - State: closed - Opened by warner-benjamin 7 months ago

#16 - Add Apache Open-Source License

Pull Request - State: closed - Opened by warner-benjamin 7 months ago

#15 - Release

Pull Request - State: closed - Opened by johnowhitaker 7 months ago

#14 - Add HQQ support and prepare for release

Pull Request - State: closed - Opened by warner-benjamin 7 months ago

#13 - Update README.md

Pull Request - State: closed - Opened by johnowhitaker 7 months ago

#12 - Scaling experiments

Pull Request - State: closed - Opened by KeremTurgutlu 7 months ago - 1 comment

#11 - License

Issue - State: closed - Opened by fakerybakery 7 months ago

#10 - Sfttrainer equivalent

Pull Request - State: closed - Opened by johnowhitaker 8 months ago - 1 comment

#9 - Finetuning benchmarking experiments

Pull Request - State: closed - Opened by KeremTurgutlu 8 months ago - 1 comment

#8 - correct logged lost

Pull Request - State: closed - Opened by johnowhitaker 8 months ago

#7 - Add guanaco dataset

Pull Request - State: closed - Opened by johnowhitaker 8 months ago

#6 - Add `no_sync` support, fix gradient accumulation, and logging and argument improvements

Pull Request - State: closed - Opened by warner-benjamin 8 months ago - 1 comment

#5 - Refactor, add Doc Strings, and add Type Hints for code readability

Pull Request - State: closed - Opened by warner-benjamin 8 months ago - 1 comment

#4 - Low Memory works with QLoRA

Pull Request - State: closed - Opened by warner-benjamin 8 months ago

#3 - Ft enhancements

Pull Request - State: closed - Opened by KeremTurgutlu 8 months ago

#2 - add lr sched

Pull Request - State: closed - Opened by johnowhitaker 8 months ago

#1 - Ft enhancements

Pull Request - State: closed - Opened by johnowhitaker 8 months ago
