Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / CStanKonrad/long_llama issues and pull requests
#25 - Compared to RAG techniques
Issue - State: open - Opened by leezythu 9 months ago
#24 - It's questionable whether the context window has truly been expanded
Issue - State: open - Opened by Vincentyua 9 months ago
#23 - Help: questions about training on 8k input text length
Issue - State: closed - Opened by Force1ess 10 months ago - 2 comments
#22 - Fix typo
Pull Request - State: closed - Opened by isaacbmiller 11 months ago - 1 comment
#21 - Where is the learnable temperature parameter in cross_batch_attention?
Issue - State: closed - Opened by MarkYangjiayi 11 months ago - 1 comment
#20 - Need clarification on token limit of input used for fine-tuning
Issue - State: open - Opened by lokesh-iterate 12 months ago - 2 comments
#19 - 0-shot long-context summarization / QA inference
Issue - State: open - Opened by shi-kejian 12 months ago - 4 comments
#18 - How to integrate the method with GQA?
Issue - State: closed - Opened by NickGao96 12 months ago - 1 comment
#17 - Utilizing Long Llama with the Mojo framework, applying 4-bit quantization, using Flash Attention 2, and thoughts on speculative execution for LLMs
Issue - State: open - Opened by myname36 12 months ago - 1 comment
#16 - I have some questions
Issue - State: open - Opened by dziulatex 12 months ago - 1 comment
#15 - How much VRAM is needed to finetune the 3B model? Is 12 GB enough?
Issue - State: open - Opened by universewill 12 months ago - 1 comment
#14 - CrossBatch details in appendix A.2
Issue - State: closed - Opened by hxs91 about 1 year ago - 1 comment
#13 - Can FoT only be used for pre-training, or can it also be used for instruction fine-tuning?
Issue - State: open - Opened by wujiekd about 1 year ago
#12 - How is the contrastive data pipeline implemented?
Issue - State: open - Opened by MarkYangjiayi about 1 year ago - 8 comments
#11 - Where do I find some function like:
Issue - State: closed - Opened by HuXinjing about 1 year ago - 2 comments
#10 - Code for zero-shot arXiv evaluation
Issue - State: open - Opened by bronyayang about 1 year ago - 1 comment
#9 - Support for gradient_checkpointing
Issue - State: open - Opened by Richar-Du about 1 year ago - 3 comments
#8 - About the use of rotary position encoding
Issue - State: open - Opened by tianyabanbu about 1 year ago - 2 comments
#7 - Does each token require a KNN search during inference?
Issue - State: open - Opened by noanti about 1 year ago - 3 comments
#6 - Comparison with other tuning methods
Issue - State: closed - Opened by FLLLIGHT about 1 year ago - 1 comment
#5 - How does the speed drop as the length gets large, compared with vanilla LLaMA?
Issue - State: open - Opened by lucasjinreal about 1 year ago - 11 comments
#4 - FoT attention and the scaling trick
Issue - State: open - Opened by StrangeTcy about 1 year ago - 3 comments
#3 - Would LongNet be easily applied to the attention with FoT?
Issue - State: open - Opened by jebarpg about 1 year ago - 1 comment
#2 - How would you go about instruction finetuning?
Issue - State: open - Opened by jordancole21 about 1 year ago - 13 comments
#1 - Finetuning code?
Issue - State: open - Opened by StrangeTcy about 1 year ago - 7 comments