jiaweizzhao/GaLore issues and pull requests

#63 - pad_token_id

Issue - State: closed - Opened by xay2001 29 days ago

#62 - the problem of warmup step and num training step

Issue - State: closed - Opened by BIGKnight 2 months ago

#61 - loss figure data

Issue - State: open - Opened by BaohaoLiao 2 months ago

#60 - ValueError: can't optimize a non-leaf Tensor (param.is_leaf=False,param.retains_grad=False)

Issue - State: open - Opened by liveck 3 months ago - 1 comment

#59 - Results vs FP32

Issue - State: open - Opened by tsengalb99 4 months ago

#58 - Zero Loss: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values

Issue - State: open - Opened by akjindal53244 4 months ago - 1 comment

#57 - Figure 1 clarification on batch size and sequence length

Issue - State: open - Opened by psandovalsegura 4 months ago - 1 comment

#56 - Questions about glue task report scores

Issue - State: open - Opened by MYT677 4 months ago

#55 - Support for DDP with multi-gpus

Issue - State: open - Opened by seongjunyun 4 months ago

#54 - Why not reproject the internal Adam states during update_proj_gap?

Issue - State: open - Opened by liuliu 5 months ago - 2 comments

#53 - Does galore save gradient memory?

Issue - State: open - Opened by jinqixiao 5 months ago - 1 comment

#52 - (Question) About glue tasks

Issue - State: open - Opened by ZhichaoWang091732 5 months ago - 3 comments

#51 - Galore finetuning #stopped

Issue - State: open - Opened by j-datta 5 months ago

#50 - Update galore_projector.py

Pull Request - State: closed - Opened by jetaudio 5 months ago

#49 - Memory issue

Issue - State: closed - Opened by fakerybakery 6 months ago - 2 comments

#48 - Extend GaLore Algorithm for General Tensor Decomposition

Pull Request - State: closed - Opened by Robertboy18 6 months ago

#47 - IndexError: tuple index out of range

Issue - State: open - Opened by zyushun 6 months ago - 11 comments

#46 - When I used galore on orpo, the learning rate was set to 8e-6, but the training rate was 0.01

Issue - State: open - Opened by Minami-su 6 months ago - 1 comment

#45 - `torch_run.py` lacking autocast and scaling for Automatic Mixed Precision

Issue - State: open - Opened by bhavnicksm 6 months ago - 1 comment

#44 - Questions about reproducing the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"

Issue - State: open - Opened by JamesSand 6 months ago - 2 comments

#43 - Galore unstable on Llama 7B beyond 20K steps

Issue - State: open - Opened by kyleliang919 6 months ago - 1 comment

#42 - Questions about Figure 3 in the original paper

Issue - State: open - Opened by fy817 7 months ago

#41 - ValueError: some parameters appear in more than one parameter group

Issue - State: open - Opened by jiaohuix 7 months ago

#40 - How many GB memory is required to train the 7b model using DDP mode with galore?

Issue - State: open - Opened by zhangqijun 7 months ago - 1 comment

#39 - can support llava model ?

Issue - State: open - Opened by awzhgw 7 months ago

#38 - Release of Trained Models

Issue - State: open - Opened by JLake310 7 months ago

#37 - Where is LOMO (fused gradient update) implemented?

Issue - State: closed - Opened by gaotianyu1350 7 months ago - 1 comment

#36 - Any plan for the first stable release?

Issue - State: open - Opened by wsp317 7 months ago

#35 - Resume function for optimizer

Issue - State: open - Opened by bokyeong1015 7 months ago

#34 - Support for Jamba (ai21labs/Jamba-v0.1)

Issue - State: open - Opened by creatorrr 8 months ago - 1 comment

#33 - Dataset loading issue, integration with Colossal-AI

Issue - State: open - Opened by Edenzzzz 8 months ago - 3 comments

#32 - Update README.md

Pull Request - State: closed - Opened by eltociear 8 months ago - 1 comment

#31 - changes c4 to allenai/c4

Pull Request - State: closed - Opened by Explorergt92 8 months ago

#30 - Reproducing Perplexity evaluation

Issue - State: open - Opened by NitzanHod 8 months ago - 2 comments

#29 - [WIP] Fused Adam Triton Kernels

Pull Request - State: open - Opened by jeromeku 8 months ago

#28 - A few questions regarding the results and methodology.

Issue - State: open - Opened by roymiles 8 months ago - 1 comment

#27 - How to get optim_target_modules=["attn", "mlp"] for other model?

Issue - State: closed - Opened by imrankh46 8 months ago - 4 comments

#26 - linalg.svd: The algorithm failed to converge

Issue - State: closed - Opened by Blueman2 8 months ago - 3 comments

#25 - Can't reproduce the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"

Issue - State: closed - Opened by CrazyElements 8 months ago - 7 comments

#24 - layerwise optimizer raises TypeError about slice indices

Issue - State: closed - Opened by winglian 8 months ago - 2 comments

#23 - Galore is not supported for Deepseed Zero3

Issue - State: closed - Opened by youganglyu 8 months ago - 1 comment

#22 - update readme and pip package

Pull Request - State: closed - Opened by jiaweizzhao 8 months ago

#21 - How can i do continued pre-training using this?

Issue - State: open - Opened by Aloukik21 8 months ago - 4 comments

#20 - GaLore in HuggingFace

Issue - State: open - Opened by IamExperimenting 8 months ago - 12 comments

#19 - Please add Phi-2 Support

Issue - State: open - Opened by calebmor460 8 months ago - 1 comment

#18 - Remove unused `A` and `B` computation

Pull Request - State: closed - Opened by awgu 8 months ago - 1 comment

#17 - RuntimeError: diag(): Supports 1D or 2D tensors. Got 3D

Issue - State: closed - Opened by drimeF0 8 months ago

#16 - The first optimizer.step() execution cost extremely long time

Issue - State: closed - Opened by xikaluo 8 months ago - 1 comment

#15 - Hyperparameters for SFT?

Issue - State: open - Opened by peterjc123 8 months ago - 4 comments

#14 - Confusion about the paper

Issue - State: closed - Opened by CrazyElements 8 months ago - 2 comments

#13 - Clarifying GLUE Benchmark Accuracy: Validation or Test Set?

Issue - State: closed - Opened by monk1337 8 months ago - 1 comment
