Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / CStanKonrad/long_llama issues and pull requests
#25 - Compared to RAG techniques
Issue - State: open - Opened by leezythu 9 months ago
#24 - It's questionable whether the context window has truly been expanded
Issue - State: open - Opened by Vincentyua 9 months ago
#23 - Help: questions about training on 8k input text length
Issue - State: closed - Opened by Force1ess 10 months ago - 2 comments
#22 - Fix typo
Pull Request - State: closed - Opened by isaacbmiller 11 months ago - 1 comment
#21 - Where is the learnable temperature parameter in cross_batch_attention?
Issue - State: closed - Opened by MarkYangjiayi 11 months ago - 1 comment
#20 - Need clarification on token limit of input used for fine-tuning
Issue - State: open - Opened by lokesh-iterate 12 months ago - 2 comments
#19 - 0-shot long-context summarization / QA inference
Issue - State: open - Opened by shi-kejian 12 months ago - 4 comments
#18 - How to integrate the method with GQA?
Issue - State: closed - Opened by NickGao96 12 months ago - 1 comment
#17 - Utilizing Long Llama with the Mojo framework, applying 4-bit quantization, using Flash Attention 2, and thoughts on speculative execution for LLMs
Issue - State: open - Opened by myname36 12 months ago - 1 comment
#16 - I have some questions
Issue - State: open - Opened by dziulatex 12 months ago - 1 comment
#15 - How much VRAM is needed to finetune the 3B model? Is 12 GB enough?
Issue - State: open - Opened by universewill 12 months ago - 1 comment
#14 - CrossBatch details in appendix A.2
Issue - State: closed - Opened by hxs91 about 1 year ago - 1 comment
#13 - Can FoT only be used for pre-training, or can it also be used for instruction fine-tuning?
Issue - State: open - Opened by wujiekd about 1 year ago
#12 - How is the contrastive data pipeline implemented?
Issue - State: open - Opened by MarkYangjiayi about 1 year ago - 8 comments
#11 - Where do I find some function like:
Issue - State: closed - Opened by HuXinjing about 1 year ago - 2 comments
#10 - Code for zero-shot arXiv evaluation
Issue - State: open - Opened by bronyayang about 1 year ago - 1 comment
#9 - Support for gradient_checkpointing
Issue - State: open - Opened by Richar-Du about 1 year ago - 3 comments
#8 - About the use of rotary position encoding
Issue - State: open - Opened by tianyabanbu about 1 year ago - 2 comments
#7 - Does each token require a KNN search during inference?
Issue - State: open - Opened by noanti about 1 year ago - 3 comments
#6 - Comparison with other tuning methods
Issue - State: closed - Opened by FLLLIGHT about 1 year ago - 1 comment
#5 - How does the speed drop as the length gets large, compared with vanilla LLaMA?
Issue - State: open - Opened by lucasjinreal about 1 year ago - 11 comments
#4 - FoT attention and the scaling trick
Issue - State: open - Opened by StrangeTcy about 1 year ago - 3 comments
#3 - Would LongNet be easily applied to the attention with FoT?
Issue - State: open - Opened by jebarpg about 1 year ago - 1 comment
#2 - How would you go about instruction finetuning?
Issue - State: open - Opened by jordancole21 about 1 year ago - 13 comments
#1 - Finetuning code?
Issue - State: open - Opened by StrangeTcy about 1 year ago - 7 comments