mallorbc/Finetune_LLMs issues and pull requests

#23 - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 394.00 MiB

Issue - State: open - Opened by shifu-learner about 1 year ago - 3 comments

#22 - Unable to find image 'gpt:latest' locally

Issue - State: closed - Opened by csaben about 1 year ago - 1 comment

#21 - Update trl_finetune.py

Pull Request - State: closed - Opened by wjfu99 over 1 year ago - 2 comments

#20 - "nvcc fatal : Unsupported gpu architechture 'compute_89'" with docker image

Issue - State: closed - Opened by ZizoAdam over 1 year ago - 3 comments

#19 - gradient overflow when training 13b Llama Model on 7 a100s

Issue - State: open - Opened by awrd2019 almost 2 years ago - 1 comment

#18 - Can't find a valid checkpoint

Issue - State: closed - Opened by judyhappy almost 2 years ago - 1 comment

#17 - cannot import name 'GPTNeoXForCausalLM' from 'transformers'

Issue - State: closed - Opened by judyhappy almost 2 years ago - 1 comment

#16 - Running super slow on 4 a100 gpus

Issue - State: closed - Opened by awrd2019 almost 2 years ago - 2 comments

#15 - Sends Kill to process when trying to resume a finetune on LLaMA 7B

Issue - State: closed - Opened by Pathos14489 almost 2 years ago - 2 comments

#14 - File: Dockerfile Line:32

Issue - State: closed - Opened by iamnmn9 almost 2 years ago - 1 comment

#13 - [QUESTION] single_texts vs group_texts

Issue - State: closed - Opened by agademic almost 2 years ago - 2 comments

#12 - DeepSpeedZeRoOffload initialize [end]

Issue - State: closed - Opened by arain60gb about 2 years ago - 4 comments

#11 - RuntimeError: Error building extension 'cpu_adam'

Issue - State: closed - Opened by arain60gb about 2 years ago - 5 comments

#10 - How to make the inference of GPT-J run on multiple GPU ?

Issue - State: closed - Opened by 22Mukesh22 about 2 years ago - 2 comments

#9 - RuntimeError: The expanded size of the tensor (50257) must match the existing size (0) at non-singleton dimension 0. Target sizes: [50257]. Tensor sizes: [0]

Issue - State: closed - Opened by CrackerHax over 2 years ago - 2 comments

#8 - `RuntimeError: Error building extension 'cpu_adam'AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

Issue - State: closed - Opened by ghost over 2 years ago - 1 comment

#7 - Training data format for generating Scenario based MCQ's

Issue - State: closed - Opened by shrey10926 almost 3 years ago - 2 comments

#6 - Incorrect block size?

Issue - State: closed - Opened by jdwx almost 3 years ago - 3 comments

#5 - fix: repeated linux kernel OOM killer invocations while finetuning

Pull Request - State: closed - Opened by MihaiBalint about 3 years ago - 1 comment

#4 - fix #3: pin to the newest versions of deepspeed, transformers datasets

Pull Request - State: closed - Opened by MihaiBalint about 3 years ago
