princeton-nlp/LLM-Shearing issues and pull requests

#75 - Dear author, I would like to inquire whether ShearingLLM currently supports pruning for the Llama3 series of large language models?

Issue - State: open - Opened by BUGBOY101 5 months ago

#74 - The pruned model does not match target structured config

Issue - State: open - Opened by dat-browny 10 months ago

#73 - Request for Fine-tuning Data for Continued Pre-training

Issue - State: open - Opened by pupumao 11 months ago

#72 - About the NQ EM Score in Table 2

Issue - State: open - Opened by chuhac 11 months ago

#71 - Default Initialization of Lambda Parameters to Zero

Issue - State: open - Opened by lpyhdzx about 1 year ago - 3 comments

#70 - Open source the pruning mask.

Issue - State: closed - Opened by Achazwl over 1 year ago - 2 comments

#69 - Support for Llama-3 / GQA?

Issue - State: closed - Opened by LorrinWWW over 1 year ago - 1 comment

#68 - Can LLM-Shearing be used on ViT models?

Issue - State: open - Opened by n9s8a over 1 year ago - 1 comment

#67 - about shearing params config

Issue - State: open - Opened by LoverLost over 1 year ago - 1 comment

#66 - Why the rope params are ignored while converting hf checkpoint to composer checkpoint?

Issue - State: open - Opened by ZhiYuanZeng over 1 year ago - 3 comments

#65 - The dtype of tokenized data should be uint32

Issue - State: closed - Opened by ZhiYuanZeng over 1 year ago - 1 comment

#64 - composer model trans to pythia problem

Issue - State: open - Opened by rzr002 over 1 year ago

#63 - LlamaRMSNorm() layer differs from original llama

Issue - State: closed - Opened by suhmily over 1 year ago - 1 comment

#62 - The Project is not implemented for 70B llama?

Issue - State: open - Opened by zhangzhenyu13 over 1 year ago - 7 comments

#61 - Start training but only output config information

Issue - State: open - Opened by Beatlesso over 1 year ago - 3 comments

#60 - None

Issue - State: closed - Opened by Beatlesso over 1 year ago

#59 - Is there a way to run pruning without Slurm?

Issue - State: closed - Opened by Beatlesso over 1 year ago

#58 - If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups?

Issue - State: closed - Opened by rzr002 over 1 year ago - 5 comments

#57 - Instruction tuning dataset

Issue - State: closed - Opened by kiucho over 1 year ago - 2 comments

#56 - save model meet problem

Issue - State: open - Opened by 18140663659 over 1 year ago - 1 comment

#55 - Pruning fine-tuned model

Issue - State: closed - Opened by kiucho over 1 year ago - 2 comments

#54 - TypeError: buffer is too small for requested array

Issue - State: open - Opened by 18140663659 over 1 year ago

#53 - Start training but nothing continue

Issue - State: closed - Opened by logan-zou over 1 year ago - 6 comments

#52 - mismatch shape

Issue - State: closed - Opened by coderchem over 1 year ago

#51 - Could you provide tokenized continue-pretraining dataset for reproduction?

Issue - State: open - Opened by gywlssww over 1 year ago - 3 comments

#50 - When should we apply hidden_z?

Issue - State: closed - Opened by sbwww over 1 year ago - 2 comments

#49 - KeyError: 'state'

Issue - State: open - Opened by changheecho over 1 year ago - 2 comments

#48 - Error running CheckpointSaver.close(). Skipping CheckpointSaver.post_close()

Issue - State: closed - Opened by rzr002 over 1 year ago - 1 comment

#47 - Avoid OOM using deepspeed zero-stage

Issue - State: open - Opened by gywlssww over 1 year ago - 3 comments

#46 - Training hangs during the "Building trainer" step

Issue - State: open - Opened by coderchem over 1 year ago - 1 comment

#45 - duplicate mean values during mask initialization

Issue - State: closed - Opened by czhang99 over 1 year ago - 2 comments

#44 - Release sheared model without re-training?

Issue - State: closed - Opened by sbwww over 1 year ago - 4 comments

#43 - model.prune_params() NotImplementedError: Could not run 'aten::nonzero'

Issue - State: open - Opened by YanxiZSQ over 1 year ago - 3 comments

#42 - The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper

Issue - State: open - Opened by YWMditto over 1 year ago - 1 comment

#41 - Metric Scores and NQ Evaluation

Issue - State: closed - Opened by Spico197 over 1 year ago - 2 comments

#40 - Missing index.json in dataset shared on drive

Issue - State: closed - Opened by AnonNoNameAccount over 1 year ago - 1 comment

#39 - Drive dress error

Issue - State: closed - Opened by YanxiZSQ over 1 year ago - 2 comments

#38 - cannot reshape array of size 4 into shape (1,newaxis,8)

Issue - State: closed - Opened by rzr002 over 1 year ago - 5 comments

#37 - meta-llama/Llama-2-7b-hf Model Preparation failed

Issue - State: closed - Opened by rzr002 over 1 year ago - 1 comment

#36 - wiki proportion finally dominates at the end of the pruning stage

Issue - State: closed - Opened by lippman1125 over 1 year ago - 6 comments

#35 - ShearedCodeLLama

Issue - State: closed - Opened by SinanAkkoyun over 1 year ago - 3 comments

#34 - LanguageCrossEntropy logs nan when bash pruning.sh

Issue - State: open - Opened by YanxiZSQ over 1 year ago - 6 comments

#33 - AttributeError: module 'flash_attn.flash_attn_interface' has no attribute 'flash_attn_unpadded_func'

Issue - State: closed - Opened by YanxiZSQ over 1 year ago - 1 comment

#32 - Pruning crash at iteration 592.

Issue - State: open - Opened by lippman1125 over 1 year ago - 6 comments

#31 - Train metrics/train/github_LanguageCrossEntropy: nan

Issue - State: closed - Opened by lippman1125 over 1 year ago - 2 comments

#30 - Create cleanshm.sh

Pull Request - State: closed - Opened by Longyichen over 1 year ago

#29 - KV head count on princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT ?

Issue - State: closed - Opened by SinanAkkoyun over 1 year ago - 2 comments

#28 - Docker Request

Issue - State: closed - Opened by TonyZhanghm over 1 year ago - 1 comment

#27 - Flash-attn dependency issues

Issue - State: closed - Opened by Forival over 1 year ago - 1 comment

#26 - Please share the alpaca generate and eval code and script to reproduce the results shared in

Issue - State: closed - Opened by sanyalsunny111 over 1 year ago - 4 comments

#25 - Finetuning using LoRA

Issue - State: closed - Opened by Nimisha-Pabbichetty over 1 year ago - 5 comments

#24 - Path no use in continue_pretrain.sh

Issue - State: closed - Opened by Longyichen over 1 year ago - 9 comments

#23 - NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet

Issue - State: closed - Opened by Longyichen over 1 year ago - 3 comments

#22 - How much compute will this take?

Issue - State: closed - Opened by fakerybakery over 1 year ago - 7 comments

#21 - sample data generate name

Issue - State: closed - Opened by sunzhe09 over 1 year ago - 6 comments

#20 - Question about ComposerMosaicLlama.forward

Issue - State: closed - Opened by hanlinxuy over 1 year ago - 4 comments

#19 - Composer Model Transform problems encountered when shearing Pythia 1.4b

Issue - State: closed - Opened by Longyichen over 1 year ago - 2 comments

#18 - How can I use it on multiple nodes without slurm?

Issue - State: closed - Opened by wang99711123 over 1 year ago - 1 comment

#17 - Use without flash-attn?

Issue - State: closed - Opened by fakerybakery over 1 year ago - 3 comments

#16 - FileNotFoundError: No such file or directory:"save_hf_to_composer"

Issue - State: closed - Opened by sunzhe09 over 1 year ago - 1 comment

#15 - AssertionError: Currently only supports dynamic loading from each domain for once.

Issue - State: closed - Opened by Longyichen over 1 year ago - 17 comments

#14 - Sample.py error when sampling stackexchange

Issue - State: closed - Opened by Longyichen over 1 year ago - 4 comments

#13 - Scaling Law for predicted loss

Issue - State: closed - Opened by AlpinDale over 1 year ago - 4 comments

#12 - Small typo on Table 2

Issue - State: closed - Opened by lxww302 over 1 year ago - 1 comment

#11 - Dynamic Batch Loading v.s. Domain Reweighting

Issue - State: closed - Opened by lxww302 over 1 year ago - 1 comment

#10 - Can Sheared-LLaMA beat OpenLLaMA v2 significantly with the same amount of compute ?

Issue - State: closed - Opened by lxww302 over 1 year ago - 2 comments

#9 - LanguageCrossEntropy logs nan when bash pruning.sh

Issue - State: closed - Opened by Longyichen over 1 year ago - 7 comments

#8 - License is missing

Issue - State: closed - Opened by casper-hansen over 1 year ago - 1 comment

#7 - TypeError: load_data() missing 1 required positional argument: 'tokenizer_name'

Issue - State: closed - Opened by hanlinxuy over 1 year ago - 3 comments

#6 - Can you provide script without using slurm or sbatch?

Issue - State: closed - Opened by hanlinxuy over 1 year ago - 2 comments

#5 - Update README.md

Pull Request - State: closed - Opened by eltociear over 1 year ago - 1 comment
