jiaweizzhao/GaLore issues and pull requests

#62 - the problem of warmup step and num training step

Issue - State: closed - Opened by BIGKnight 11 days ago

#61 - loss figure data

Issue - State: open - Opened by BaohaoLiao 19 days ago

#60 - ValueError: can't optimize a non-leaf Tensor (param.is_leaf=False,param.retains_grad=False)

Issue - State: open - Opened by liveck about 1 month ago - 1 comment

#59 - Results vs FP32

Issue - State: open - Opened by tsengalb99 about 2 months ago

#58 - Zero Loss: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values

Issue - State: open - Opened by akjindal53244 about 2 months ago - 1 comment

#57 - Figure 1 clarification on batch size and sequence length

Issue - State: open - Opened by psandovalsegura 2 months ago - 1 comment

#56 - Questions about glue task report scores

Issue - State: open - Opened by MYT677 2 months ago

#55 - Support for DDP with multi-gpus

Issue - State: open - Opened by seongjunyun 2 months ago

#54 - Why not reproject the internal Adam states during update_proj_gap?

Issue - State: open - Opened by liuliu 3 months ago - 2 comments

#53 - Does galore save gradient memory?

Issue - State: open - Opened by jinqixiao 3 months ago - 1 comment

#52 - (Question) About glue tasks

Issue - State: open - Opened by ZhichaoWang091732 3 months ago - 3 comments

#51 - Galore finetuning #stopped

Issue - State: open - Opened by j-datta 4 months ago

#50 - Update galore_projector.py

Pull Request - State: closed - Opened by jetaudio 4 months ago

#49 - Memory issue

Issue - State: closed - Opened by fakerybakery 4 months ago - 2 comments

#48 - Extend GaLore Algorithm for General Tensor Decomposition

Pull Request - State: closed - Opened by Robertboy18 4 months ago

#47 - IndexError: tuple index out of range

Issue - State: open - Opened by zyushun 4 months ago - 11 comments

#46 - When I used galore on orpo, the learning rate was set to 8e-6, but the training rate was 0.01

Issue - State: open - Opened by Minami-su 4 months ago - 1 comment

#45 - `torch_run.py` lacking autocast and scaling for Automatic Mixed Precision

Issue - State: open - Opened by bhavnicksm 4 months ago - 1 comment

#44 - Questions about reproducing the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"

Issue - State: open - Opened by JamesSand 5 months ago - 2 comments

#43 - Galore unstable on Llama 7B beyond 20K steps

Issue - State: open - Opened by kyleliang919 5 months ago - 1 comment

#42 - Questions about Figure 3 in the original paper

Issue - State: open - Opened by fy817 5 months ago

#41 - ValueError: some parameters appear in more than one parameter group

Issue - State: open - Opened by jiaohuix 5 months ago

#40 - How many GB memory is required to train the 7b model using DDP mode with galore?

Issue - State: open - Opened by zhangqijun 5 months ago - 1 comment

#39 - can support llava model ?

Issue - State: open - Opened by awzhgw 5 months ago

#38 - Release of Trained Models

Issue - State: open - Opened by JLake310 5 months ago

#37 - Where is LOMO (fused gradient update) implemented?

Issue - State: closed - Opened by gaotianyu1350 6 months ago - 1 comment

#36 - Any plan for the first stable release?

Issue - State: open - Opened by wsp317 6 months ago

#35 - Resume function for optimizer

Issue - State: open - Opened by bokyeong1015 6 months ago

#34 - Support for Jamba (ai21labs/Jamba-v0.1)

Issue - State: open - Opened by creatorrr 6 months ago - 1 comment

#33 - Dataset loading issue, integration with Colossal-AI

Issue - State: open - Opened by Edenzzzz 6 months ago - 3 comments

#32 - Update README.md

Pull Request - State: closed - Opened by eltociear 6 months ago - 1 comment

#31 - changes c4 to allenai/c4

Pull Request - State: closed - Opened by Explorergt92 6 months ago

#30 - Reproducing Perplexity evaluation

Issue - State: open - Opened by NitzanHod 6 months ago - 2 comments

#29 - [WIP] Fused Adam Triton Kernels

Pull Request - State: open - Opened by jeromeku 6 months ago

#28 - A few questions regarding the results and methodology.

Issue - State: open - Opened by roymiles 6 months ago - 1 comment

#27 - How to get optim_target_modules=["attn", "mlp"] for other model?

Issue - State: closed - Opened by imrankh46 6 months ago - 4 comments

#26 - linalg.svd: The algorithm failed to converge

Issue - State: closed - Opened by Blueman2 6 months ago - 3 comments

#25 - Can't reproduce the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"

Issue - State: closed - Opened by CrazyElements 6 months ago - 7 comments

#24 - layerwise optimizer raises TypeError about slice indices

Issue - State: closed - Opened by winglian 6 months ago - 2 comments

#23 - Galore is not supported for Deepseed Zero3

Issue - State: closed - Opened by youganglyu 6 months ago - 1 comment

#22 - update readme and pip package

Pull Request - State: closed - Opened by jiaweizzhao 6 months ago

#21 - How can i do continued pre-training using this?

Issue - State: open - Opened by Aloukik21 6 months ago - 4 comments

#20 - GaLore in HuggingFace

Issue - State: open - Opened by IamExperimenting 6 months ago - 12 comments

#19 - Please add Phi-2 Support

Issue - State: open - Opened by calebmor460 6 months ago - 1 comment

#18 - Remove unused `A` and `B` computation

Pull Request - State: closed - Opened by awgu 6 months ago - 1 comment

#17 - RuntimeError: diag(): Supports 1D or 2D tensors. Got 3D

Issue - State: closed - Opened by drimeF0 6 months ago

#16 - The first optimizer.step() execution cost extremely long time

Issue - State: closed - Opened by xikaluo 6 months ago - 1 comment

#15 - Hyperparameters for SFT?

Issue - State: open - Opened by peterjc123 6 months ago - 4 comments

#14 - Confusion about the paper

Issue - State: closed - Opened by CrazyElements 6 months ago - 2 comments

#13 - Clarifying GLUE Benchmark Accuracy: Validation or Test Set?

Issue - State: closed - Opened by monk1337 7 months ago - 1 comment
