Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / jiaweizzhao/GaLore issues and pull requests
#63 - pad_token_id
Issue -
State: closed - Opened by xay2001 29 days ago
#62 - the problem of warmup step and num training step
Issue -
State: closed - Opened by BIGKnight 2 months ago
#61 - loss figure data
Issue -
State: open - Opened by BaohaoLiao 2 months ago
#60 - ValueError: can't optimize a non-leaf Tensor (param.is_leaf=False,param.retains_grad=False)
Issue -
State: open - Opened by liveck 3 months ago
- 1 comment
#59 - Results vs FP32
Issue -
State: open - Opened by tsengalb99 4 months ago
#58 - Zero Loss: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values
Issue -
State: open - Opened by akjindal53244 4 months ago
- 1 comment
#57 - Figure 1 clarification on batch size and sequence length
Issue -
State: open - Opened by psandovalsegura 4 months ago
- 1 comment
#56 - Questions about glue task report scores
Issue -
State: open - Opened by MYT677 4 months ago
#55 - Support for DDP with multi-gpus
Issue -
State: open - Opened by seongjunyun 4 months ago
#54 - Why not reproject the internal Adam states during update_proj_gap?
Issue -
State: open - Opened by liuliu 5 months ago
- 2 comments
#53 - Does galore save gradient memory?
Issue -
State: open - Opened by jinqixiao 5 months ago
- 1 comment
#52 - (Question) About glue tasks
Issue -
State: open - Opened by ZhichaoWang091732 5 months ago
- 3 comments
#51 - Galore finetuning #stopped
Issue -
State: open - Opened by j-datta 5 months ago
#50 - Update galore_projector.py
Pull Request -
State: closed - Opened by jetaudio 5 months ago
#49 - Memory issue
Issue -
State: closed - Opened by fakerybakery 6 months ago
- 2 comments
#48 - Extend GaLore Algorithm for General Tensor Decomposition
Pull Request -
State: closed - Opened by Robertboy18 6 months ago
#47 - IndexError: tuple index out of range
Issue -
State: open - Opened by zyushun 6 months ago
- 11 comments
#46 - When I used galore on orpo, the learning rate was set to 8e-6, but the training rate was 0.01
Issue -
State: open - Opened by Minami-su 6 months ago
- 1 comment
#45 - `torch_run.py` lacking autocast and scaling for Automatic Mixed Precision
Issue -
State: open - Opened by bhavnicksm 6 months ago
- 1 comment
#44 - Questions about reproducing the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"
Issue -
State: open - Opened by JamesSand 6 months ago
- 2 comments
#43 - Galore unstable on Llama 7B beyond 20K steps
Issue -
State: open - Opened by kyleliang919 6 months ago
- 1 comment
#42 - Questions about Figure 3 in the original paper
Issue -
State: open - Opened by fy817 7 months ago
#41 - ValueError: some parameters appear in more than one parameter group
Issue -
State: open - Opened by jiaohuix 7 months ago
#40 - How many GB memory is required to train the 7b model using DDP mode with galore?
Issue -
State: open - Opened by zhangqijun 7 months ago
- 1 comment
#39 - can support llava model ?
Issue -
State: open - Opened by awzhgw 7 months ago
#38 - Release of Trained Models
Issue -
State: open - Opened by JLake310 7 months ago
#37 - Where is LOMO (fused gradient update) implemented?
Issue -
State: closed - Opened by gaotianyu1350 7 months ago
- 1 comment
#36 - Any plan for the first stable release?
Issue -
State: open - Opened by wsp317 7 months ago
#35 - Resume function for optimizer
Issue -
State: open - Opened by bokyeong1015 7 months ago
#34 - Support for Jamba (ai21labs/Jamba-v0.1)
Issue -
State: open - Opened by creatorrr 8 months ago
- 1 comment
#33 - Dataset loading issue, integration with Colossal-AI
Issue -
State: open - Opened by Edenzzzz 8 months ago
- 3 comments
#32 - Update README.md
Pull Request -
State: closed - Opened by eltociear 8 months ago
- 1 comment
#31 - changes c4 to allenai/c4
Pull Request -
State: closed - Opened by Explorergt92 8 months ago
#30 - Reproducing Perplexity evaluation
Issue -
State: open - Opened by NitzanHod 8 months ago
- 2 comments
#29 - [WIP] Fused Adam Triton Kernels
Pull Request -
State: open - Opened by jeromeku 8 months ago
#28 - A few questions regarding the results and methodology.
Issue -
State: open - Opened by roymiles 8 months ago
- 1 comment
#27 - How to get optim_target_modules=["attn", "mlp"] for other model?
Issue -
State: closed - Opened by imrankh46 8 months ago
- 4 comments
#26 - linalg.svd: The algorithm failed to converge
Issue -
State: closed - Opened by Blueman2 8 months ago
- 3 comments
#25 - Can't reproduce the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"
Issue -
State: closed - Opened by CrazyElements 8 months ago
- 7 comments
#24 - layerwise optimizer raises TypeError about slice indices
Issue -
State: closed - Opened by winglian 8 months ago
- 2 comments
#23 - Galore is not supported for Deepseed Zero3
Issue -
State: closed - Opened by youganglyu 8 months ago
- 1 comment
#22 - update readme and pip package
Pull Request -
State: closed - Opened by jiaweizzhao 8 months ago
#21 - How can i do continued pre-training using this?
Issue -
State: open - Opened by Aloukik21 8 months ago
- 4 comments
#20 - GaLore in HuggingFace
Issue -
State: open - Opened by IamExperimenting 8 months ago
- 12 comments
#19 - Please add Phi-2 Support
Issue -
State: open - Opened by calebmor460 8 months ago
- 1 comment
#18 - Remove unused `A` and `B` computation
Pull Request -
State: closed - Opened by awgu 8 months ago
- 1 comment
#17 - RuntimeError: diag(): Supports 1D or 2D tensors. Got 3D
Issue -
State: closed - Opened by drimeF0 8 months ago
#16 - The first optimizer.step() execution cost extremely long time
Issue -
State: closed - Opened by xikaluo 8 months ago
- 1 comment
#15 - Hyperparameters for SFT?
Issue -
State: open - Opened by peterjc123 8 months ago
- 4 comments
#14 - Confusion about the paper
Issue -
State: closed - Opened by CrazyElements 8 months ago
- 2 comments
#13 - Clarifying GLUE Benchmark Accuracy: Validation or Test Set?
Issue -
State: closed - Opened by monk1337 8 months ago
- 1 comment