GitHub / princeton-nlp/LLM-Shearing issues and pull requests
#75 - Dear author, I would like to inquire whether ShearingLLM currently supports pruning for the Llama3 series of large language models?
Issue -
State: open - Opened by BUGBOY101 5 months ago
#74 - The pruned model does not match target structured config
Issue -
State: open - Opened by dat-browny 10 months ago
#73 - Request for Fine-tuning Data for Continued Pre-training
Issue -
State: open - Opened by pupumao 11 months ago
#72 - About the NQ EM Score in Table 2
Issue -
State: open - Opened by chuhac 11 months ago
#71 - Default Initialization of Lambda Parameters to Zero
Issue -
State: open - Opened by lpyhdzx about 1 year ago
- 3 comments
#70 - Open source the pruning mask.
Issue -
State: closed - Opened by Achazwl over 1 year ago
- 2 comments
#69 - Support for Llama-3 / GQA?
Issue -
State: closed - Opened by LorrinWWW over 1 year ago
- 1 comment
#68 - Can LLM-Shearing be used on ViT models?
Issue -
State: open - Opened by n9s8a over 1 year ago
- 1 comment
#67 - about shearing params config
Issue -
State: open - Opened by LoverLost over 1 year ago
- 1 comment
#66 - Why the rope params are ignored while converting hf checkpoint to composer checkpoint?
Issue -
State: open - Opened by ZhiYuanZeng over 1 year ago
- 3 comments
#65 - The dtype of tokenized data should be uint32
Issue -
State: closed - Opened by ZhiYuanZeng over 1 year ago
- 1 comment
#64 - composer model trans to pythia problem
Issue -
State: open - Opened by rzr002 over 1 year ago
#63 - LlamaRMSNorm() layer differs from original llama
Issue -
State: closed - Opened by suhmily over 1 year ago
- 1 comment
#62 - The Project is not implemented for 70B llama?
Issue -
State: open - Opened by zhangzhenyu13 over 1 year ago
- 7 comments
#61 - Start training but only output config information
Issue -
State: open - Opened by Beatlesso over 1 year ago
- 3 comments
#60 - None
Issue -
State: closed - Opened by Beatlesso over 1 year ago
#59 - Is there a way to run pruning without Slurm?
Issue -
State: closed - Opened by Beatlesso over 1 year ago
#58 - If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups?
Issue -
State: closed - Opened by rzr002 over 1 year ago
- 5 comments
#57 - Instruction tuning dataset
Issue -
State: closed - Opened by kiucho over 1 year ago
- 2 comments
#56 - save model meet problem
Issue -
State: open - Opened by 18140663659 over 1 year ago
- 1 comment
#55 - Pruning fine-tuned model
Issue -
State: closed - Opened by kiucho over 1 year ago
- 2 comments
#54 - TypeError: buffer is too small for requested array
Issue -
State: open - Opened by 18140663659 over 1 year ago
#53 - Start training but nothing continue
Issue -
State: closed - Opened by logan-zou over 1 year ago
- 6 comments
#52 - mismatch shape
Issue -
State: closed - Opened by coderchem over 1 year ago
#51 - Could you provide tokenized continue-pretraining dataset for reproduction?
Issue -
State: open - Opened by gywlssww over 1 year ago
- 3 comments
#50 - When should we apply hidden_z?
Issue -
State: closed - Opened by sbwww over 1 year ago
- 2 comments
#49 - KeyError: 'state'
Issue -
State: open - Opened by changheecho over 1 year ago
- 2 comments
#48 - Error running CheckpointSaver.close(). Skipping CheckpointSaver.post_close()
Issue -
State: closed - Opened by rzr002 over 1 year ago
- 1 comment
#47 - Avoid OOM using deepspeed zero-stage
Issue -
State: open - Opened by gywlssww over 1 year ago
- 3 comments
#46 - Training hangs at the "Building trainer" step
Issue -
State: open - Opened by coderchem over 1 year ago
- 1 comment
#45 - duplicate mean values during mask initialization
Issue -
State: closed - Opened by czhang99 over 1 year ago
- 2 comments
#44 - Release sheared model without re-training?
Issue -
State: closed - Opened by sbwww over 1 year ago
- 4 comments
#43 - model.prune_params() NotImplementedError: Could not run 'aten::nonzero'
Issue -
State: open - Opened by YanxiZSQ over 1 year ago
- 3 comments
#42 - The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper
Issue -
State: open - Opened by YWMditto over 1 year ago
- 1 comment
#41 - Metric Scores and NQ Evaluation
Issue -
State: closed - Opened by Spico197 over 1 year ago
- 2 comments
#40 - Missing index.json in dataset shared on drive
Issue -
State: closed - Opened by AnonNoNameAccount over 1 year ago
- 1 comment
#39 - Drive dress error
Issue -
State: closed - Opened by YanxiZSQ over 1 year ago
- 2 comments
#38 - cannot reshape array of size 4 into shape (1,newaxis,8)
Issue -
State: closed - Opened by rzr002 over 1 year ago
- 5 comments
#37 - meta-llama/Llama-2-7b-hf Model Preparation failed
Issue -
State: closed - Opened by rzr002 over 1 year ago
- 1 comment
#36 - wiki proportion finally dominates at the end of the pruning stage
Issue -
State: closed - Opened by lippman1125 over 1 year ago
- 6 comments
#35 - ShearedCodeLLama
Issue -
State: closed - Opened by SinanAkkoyun over 1 year ago
- 3 comments
#34 - LanguageCrossEntropy logs nan when bash pruning.sh
Issue -
State: open - Opened by YanxiZSQ over 1 year ago
- 6 comments
#33 - AttributeError: module 'flash_attn.flash_attn_interface' has no attribute 'flash_attn_unpadded_func'
Issue -
State: closed - Opened by YanxiZSQ over 1 year ago
- 1 comment
#32 - Pruning crash at iteration 592.
Issue -
State: open - Opened by lippman1125 over 1 year ago
- 6 comments
#31 - Train metrics/train/github_LanguageCrossEntropy: nan
Issue -
State: closed - Opened by lippman1125 over 1 year ago
- 2 comments
#30 - Create cleanshm.sh
Pull Request -
State: closed - Opened by Longyichen over 1 year ago
#29 - KV head count on princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT ?
Issue -
State: closed - Opened by SinanAkkoyun over 1 year ago
- 2 comments
#28 - Docker Request
Issue -
State: closed - Opened by TonyZhanghm over 1 year ago
- 1 comment
#27 - Flash-attn dependency issues
Issue -
State: closed - Opened by Forival over 1 year ago
- 1 comment
#26 - Please share the alpaca generate and eval code and script to reproduce the results shared in
Issue -
State: closed - Opened by sanyalsunny111 over 1 year ago
- 4 comments
#25 - Finetuning using LoRA
Issue -
State: closed - Opened by Nimisha-Pabbichetty over 1 year ago
- 5 comments
#24 - Path no use in continue_pretrain.sh
Issue -
State: closed - Opened by Longyichen over 1 year ago
- 9 comments
#23 - NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet
Issue -
State: closed - Opened by Longyichen over 1 year ago
- 3 comments
#22 - How much compute will this take?
Issue -
State: closed - Opened by fakerybakery over 1 year ago
- 7 comments
#21 - sample data generate name
Issue -
State: closed - Opened by sunzhe09 over 1 year ago
- 6 comments
#20 - Question about ComposerMosaicLlama.forward
Issue -
State: closed - Opened by hanlinxuy over 1 year ago
- 4 comments
#19 - Composer Model Transform problems encountered when shearing Pythia 1.4b
Issue -
State: closed - Opened by Longyichen over 1 year ago
- 2 comments
#18 - How can I use it on multiple nodes without slurm?
Issue -
State: closed - Opened by wang99711123 over 1 year ago
- 1 comment
#17 - Use without flash-attn?
Issue -
State: closed - Opened by fakerybakery over 1 year ago
- 3 comments
#16 - FileNotFoundError: No such file or directory:"save_hf_to_composer"
Issue -
State: closed - Opened by sunzhe09 over 1 year ago
- 1 comment
#15 - AssertionError: Currently only supports dynamic loading from each domain for once.
Issue -
State: closed - Opened by Longyichen over 1 year ago
- 17 comments
#14 - Sample.py error when sampling stackexchange
Issue -
State: closed - Opened by Longyichen over 1 year ago
- 4 comments
#13 - Scaling Law for predicted loss
Issue -
State: closed - Opened by AlpinDale over 1 year ago
- 4 comments
#12 - Small typo on Table 2
Issue -
State: closed - Opened by lxww302 over 1 year ago
- 1 comment
#11 - Dynamic Batch Loading v.s. Domain Reweighting
Issue -
State: closed - Opened by lxww302 over 1 year ago
- 1 comment
#10 - Can Sheared-LLaMA beat OpenLLaMA v2 significantly with the same amount of compute ?
Issue -
State: closed - Opened by lxww302 over 1 year ago
- 2 comments
#9 - LanguageCrossEntropy logs nan when bash pruning.sh
Issue -
State: closed - Opened by Longyichen over 1 year ago
- 7 comments
#8 - License is missing
Issue -
State: closed - Opened by casper-hansen over 1 year ago
- 1 comment
#7 - TypeError: load_data() missing 1 required positional argument: 'tokenizer_name'
Issue -
State: closed - Opened by hanlinxuy over 1 year ago
- 3 comments
#6 - Can you provide script without using slurm or sbatch?
Issue -
State: closed - Opened by hanlinxuy over 1 year ago
- 2 comments
#5 - Update README.md
Pull Request -
State: closed - Opened by eltociear over 1 year ago
- 1 comment
#4 - Why update model parameters when training pruning parameters?
Issue -
State: closed - Opened by KaihuaTang over 1 year ago
- 1 comment
#3 - Repeated assignment in l0_module.py
Issue -
State: closed - Opened by Longyichen over 1 year ago
- 2 comments
#2 - Any updates on the code?
Issue -
State: closed - Opened by AlpinDale almost 2 years ago
- 5 comments
#1 - Missing Reference
Issue -
State: closed - Opened by wutaiqiang almost 2 years ago
- 5 comments