princeton-nlp/MeZO issues and pull requests

#44 - About gradient accumulation implementation

Issue - State: open - Opened by xvyaward 11 days ago

#43 - zeroshot results of roberta-large

Issue - State: open - Opened by pickpppcc 26 days ago

#42 - loss turns to 0 after several steps for llama2

Issue - State: open - Opened by liuxiaozhu01 2 months ago - 5 comments

#41 - Question about checkpointing

Issue - State: open - Opened by zhaoaustin 2 months ago - 1 comment

#40 - Full finetuning with Roberta-Large

Issue - State: open - Opened by aparna-aketi 4 months ago - 5 comments

#39 - Cannot reproduce the results for Roberta-large on SNLI with MeZO(LORA)

Issue - State: open - Opened by Liu-M-H 4 months ago - 3 comments

#38 - question for Linear Probing

Issue - State: open - Opened by zhaoaustin 5 months ago - 2 comments

#37 - question about MeZO-adam

Issue - State: open - Opened by zhaoaustin 6 months ago - 1 comment

#36 - Can you share the dataset class of SST-5, SNLI, TREC datasets?

Issue - State: open - Opened by Ziiiirem 6 months ago - 5 comments

#35 - roberta-large zero shot

Issue - State: open - Opened by itongggg 7 months ago

#34 - can not reproduce the result of roberta large on dataset sst-2

Issue - State: open - Opened by itongggg 7 months ago - 2 comments

#33 - Maybe need a requirement.txt file to facilitate environment preparation?

Issue - State: open - Opened by lepangdan 9 months ago - 1 comment

#32 - In which file is the code implemented by the algorithm?

Issue - State: open - Opened by 1llss 11 months ago - 1 comment

#31 - Zero Order implementation does not converge in CIFAR-10 dataset.

Issue - State: open - Opened by amritansh6 12 months ago - 1 comment

#30 - Standard FT does not work

Issue - State: open - Opened by YaNgZhAnG-V5 about 1 year ago - 4 comments

#29 - max_seq_length and max_seq_len confusion

Issue - State: open - Opened by davidqqq about 1 year ago - 1 comment

#28 - Cannot reproduce some results of OPT

Issue - State: closed - Opened by WangFei-2019 about 1 year ago - 3 comments

#27 - How to use MeZO in training a simple CIFAR-10 model

Issue - State: open - Opened by Cascol-Chen about 1 year ago - 3 comments

#26 - Add a pip-installable, simple implementation of MeZO (along with a distributed impl. and some tests)

Pull Request - State: open - Opened by lebrice about 1 year ago - 3 comments

#25 - Results of Trec dataset on Roberta-large(K=512) with MeZO(LoRA)

Issue - State: open - Opened by Yanjun-Zhao about 1 year ago - 8 comments

#24 - Inconsistent results of MEZO for RoBERTa-large on SST-2

Issue - State: open - Opened by han678 over 1 year ago

#23 - MeZO on ChatGLM6B

Issue - State: closed - Opened by CharonsPluto over 1 year ago - 2 comments

#22 - LoRA & p-tuning with multi-GPU

Issue - State: open - Opened by haozhouamzn over 1 year ago - 3 comments

#21 - Cannot reproduce the results for RoBERTa on SST-2

Issue - State: open - Opened by TrueNobility303 over 1 year ago - 1 comment

#20 - llama2 problem

Issue - State: open - Opened by ghost over 1 year ago - 1 comment

#19 - ValueError: The model did not return a loss from the inputs, only the following keys: logits,past_key_values. For reference, the inputs it received are input_ids,attention_mask.

Issue - State: closed - Opened by thistleknot over 1 year ago - 2 comments

#18 - AttributeError: 'TrainingArguments' object has no attribute 'linear_probing'

Issue - State: closed - Opened by thistleknot over 1 year ago - 4 comments

#17 - Nanogpt implementation

Issue - State: open - Opened by thistleknot over 1 year ago - 3 comments

#16 - Cannot reproduce the results of OPT on SST2

Issue - State: closed - Opened by sglucas over 1 year ago - 15 comments

#15 - Results on WSC and WIC datasets cannot be reproduced on OPT-13B with MeZO

Issue - State: open - Opened by MathIsAll over 1 year ago - 5 comments

#14 - About experimental setting of 1000 examples

Issue - State: closed - Opened by sglucas over 1 year ago - 2 comments

#13 - MeZO on continue pre-training

Issue - State: open - Opened by shan23chen over 1 year ago - 1 comment

#12 - deepspeed reference on colab

Issue - State: closed - Opened by huu4ontocord over 1 year ago - 2 comments

#11 - Getting a RuntimeError after training with mezo

Issue - State: open - Opened by sowmaster over 1 year ago - 6 comments

#10 - Which trainer to use

Issue - State: open - Opened by HaniItani over 1 year ago - 7 comments

#9 - MeZO running script for roberta-large is not working

Issue - State: closed - Opened by sanyalsunny111 over 1 year ago - 1 comment

#8 - gpt_neo not supported

Issue - State: closed - Opened by thistleknot over 1 year ago - 8 comments

#7 - Best parameters found for datasets

Issue - State: open - Opened by vvvm23 over 1 year ago - 3 comments

#6 - Not convergent in custom dataset.

Issue - State: open - Opened by jcao-ai over 1 year ago - 9 comments

#5 - Can you provide more details about how to run the code?

Issue - State: closed - Opened by kiseliu over 1 year ago - 1 comment

#4 - MeZo can be used in NLG tasks?

Issue - State: open - Opened by anonNo2 over 1 year ago - 5 comments

#3 - Fix typo in run.py

Pull Request - State: closed - Opened by eltociear over 1 year ago

#2 - Impact of Dropout?

Issue - State: closed - Opened by helpmefindaname over 1 year ago - 1 comment

#1 - Any benchmark on (MeZO) v.s. (ZeRO + CpuOffload + Grad checkpointing) ?

Issue - State: closed - Opened by xingchensong over 1 year ago - 2 comments
