Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / mit-han-lab/streaming-llm issues and pull requests
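The listing below can also be retrieved programmatically. The following Python sketch queries the Ecosyste.ms Issues API for this repository; the endpoint path and response field names are assumptions based on the service's usual REST conventions, so check the API documentation at issues.ecosyste.ms before relying on them.

# Minimal sketch: fetch issue and pull request metadata for mit-han-lab/streaming-llm
# from the Ecosyste.ms Issues API. The endpoint path and JSON field names are
# assumptions and may differ from the live API schema.
import json
import urllib.parse
import urllib.request

HOST = "GitHub"
REPO = "mit-han-lab/streaming-llm"

# Assumed endpoint layout: /api/v1/hosts/{host}/repositories/{owner%2Frepo}/issues
url = (
    "https://issues.ecosyste.ms/api/v1/hosts/"
    f"{urllib.parse.quote(HOST)}/repositories/"
    f"{urllib.parse.quote(REPO, safe='')}/issues?per_page=100"
)

with urllib.request.urlopen(url, timeout=30) as resp:
    issues = json.load(resp)  # assumed to return a JSON array of issue records

# Print one compact block per record, mirroring the listing below.
for item in issues:
    kind = "Pull Request" if item.get("pull_request") else "Issue"
    print(f"#{item.get('number')} - {item.get('title')}")
    print(f"{kind} - State: {item.get('state')} - "
          f"{item.get('comments_count', 0)} comments")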
#88 - why recompute can differ from window attention?
Issue - State: open - Opened by habaohaba about 1 month ago
#87 - im confused with the PPL of sliding window with recomputation
Issue - State: open - Opened by coderwayne3025 about 1 month ago
#86 - Can you provide the code related to the visualization in the paper?
Issue - State: open - Opened by micelvrice 2 months ago
#85 - 【question】Does streaming-llm focus on accelerating decoding stage? How about the prefilling stage?
Issue - State: open - Opened by Code24Man 4 months ago
#84 - Tokenizer issue with Transformers 4.33.0
Issue - State: open - Opened by PedemonteGiacomo 5 months ago
#83 - Evaluation code and dataset release inquiry
Issue - State: open - Opened by DerrickYLJ 5 months ago
#82 - How to visualize attention logits?
Issue - State: closed - Opened by OStars 5 months ago - 1 comment
#81 - what is the difference between window attention and sliding window recomputation
Issue - State: closed - Opened by seeyourcell 6 months ago
#80 - Progressively decreasing attention windows
Issue - State: open - Opened by Vorlent 6 months ago
#79 - Using LLaVA model
Issue - State: open - Opened by JesseZZZZZ 6 months ago
#78 - why `max_gen_len` is needed when considering `space_needed`?
Issue - State: open - Opened by Mr-lonely0 8 months ago
#77 - How to evaluate ppl?
Issue - State: open - Opened by Jiawei-Yang 8 months ago - 2 comments
#76 - StreamEval
Issue - State: open - Opened by Zhangchaoran000 10 months ago
#75 - Support mistral-7b?
Issue - State: open - Opened by spring1915 10 months ago
#74 - Run with start_size=0 looks just fine
Issue - State: open - Opened by cyr0930 11 months ago
#73 - question about positions encoding when apply ROLLING KV CACHE WITH ATTENTION SINKS
Issue - State: closed - Opened by bugm 11 months ago - 1 comment
#72 - Error happened
Issue - State: open - Opened by ForrestPi 11 months ago - 2 comments
#71 - Questions about ARC datasets
Issue - State: open - Opened by Zoeyyao27 11 months ago
#70 - How much GPU memory needed to run example ?
Issue - State: open - Opened by fangming-he 12 months ago - 3 comments
#69 - Is there the way of parallel prompt ?
Issue - State: open - Opened by DavideHe 12 months ago
#68 - Question about attention sink arising in pretrained models
Issue - State: open - Opened by kevinli573 12 months ago
#67 - Request for Code and Details on Figures 2 and 7
Issue - State: open - Opened by ZhouZineng 12 months ago
#66 - Questions Related to the Application and Results of Attention Sinks After the Paper
Issue - State: open - Opened by dsdanielpark about 1 year ago
#65 - Questions Regarding "Sink Tokens"
Issue - State: open - Opened by clarenceluo78 about 1 year ago
#64 - Doubts in "run_streaming_llama.py" file
Issue - State: open - Opened by Rishab9991 about 1 year ago
#63 - Question about Naive Sliding Window
Issue - State: closed - Opened by kevinli573 about 1 year ago - 2 comments
#62 - why starting sink token is not a special token '\n'?
Issue - State: closed - Opened by dhcode-cpp about 1 year ago - 2 comments
#61 - Results for Section 3.2 Rolling KV Cache (Without Pretraining)
Issue - State: open - Opened by timljj about 1 year ago - 1 comment
#60 - The position id for q
Issue - State: open - Opened by ofhwei about 1 year ago - 1 comment
#59 - The reason for the importance of the initial token.
Issue - State: open - Opened by freyamom about 1 year ago
#58 - [Feature Request] Support InternLM Model
Issue - State: open - Opened by vansin about 1 year ago - 1 comment
#57 - Can support to ChatGLM2?
Issue - State: open - Opened by KareEnges about 1 year ago
#56 - Enable explictly setting transformer model cache
Pull Request - State: open - Opened by JiaxuanYou about 1 year ago
#55 - question about Table 1 in paper
Issue - State: open - Opened by AresXD about 1 year ago - 1 comment
#54 - question about initial tokens
Issue - State: open - Opened by chaojiewang94 about 1 year ago - 2 comments
#53 - While streaming with sinks, how does the framework change the positional encodings of the KV cache without having to multiply with the Key and Value matrices?
Issue - State: open - Opened by Bhuvanesh09 about 1 year ago - 4 comments
#52 - Finetuning a model in the streaming mode ?
Issue - State: closed - Opened by MohamedAliRashad about 1 year ago - 1 comment
#51 - question about re-computation
Issue - State: closed - Opened by ysanimals about 1 year ago - 4 comments
#50 - Implementation of lama2 7b chat hf model
Issue - State: open - Opened by MuhammadIshaq-AI about 1 year ago - 7 comments
#49 - Implementing lama2 7b
Issue - State: closed - Opened by MuhammadIshaq-AI about 1 year ago
#48 - Is code's position wrong with "kv_cache.evict_for_space" ?
Issue - State: closed - Opened by DavideHe about 1 year ago - 2 comments
#47 - some question about paper
Issue - State: closed - Opened by Vincentyua about 1 year ago - 1 comment
#46 - Does past_key_values be repeatedly compute?
Issue - State: open - Opened by freyamom about 1 year ago - 5 comments
#45 - How to use streaming llm to train a new model? is there any sample code . thansk
Issue - State: closed - Opened by mega-cqz about 1 year ago - 1 comment
#44 - I'm (A Bit) Suspicious of Table 3.
Issue - State: closed - Opened by FrederickGeek8 about 1 year ago - 1 comment
#43 - Questions on the demo results
Issue - State: closed - Opened by BitCalSaul about 1 year ago - 2 comments
#42 - Question on intuition of "attention sink" and "alibi PE"
Issue - State: closed - Opened by bowencohere about 1 year ago - 3 comments
#41 - Question about long input and difference between streaming-llm and dense attention.
Issue - State: closed - Opened by hxs91 about 1 year ago - 2 comments
#40 - RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
Issue - State: closed - Opened by chnl about 1 year ago - 2 comments
#39 - Question about evaluation results and demo
Issue - State: closed - Opened by hsm1997 about 1 year ago - 2 comments
#38 - How to answer the question in the middle of long input
Issue - State: open - Opened by yangzhj53 about 1 year ago
#37 - RuntimeError in run_streaming_llama.py When Using Accelerate with Streaming LLMa Model on A4500 GPU
Issue - State: open - Opened by ZexinLi0w0 about 1 year ago - 4 comments
#36 - Questions about "Run Streaming Llama Chatbot"
Issue - State: closed - Opened by ChuanhongLi about 1 year ago - 3 comments
#35 - Can support to codellama34b?
Issue - State: closed - Opened by willshion about 1 year ago - 1 comment
#34 - Can support to Qwen14B?
Issue - State: closed - Opened by ChenTao98 about 1 year ago - 1 comment
#33 - Confused with four attention mechanism and their performance mentioned by paper
Issue - State: closed - Opened by weizhenhuan about 1 year ago - 5 comments
#32 - The k_seq_dim and v_seq_dim in StartRecentKVCache look related to the type of model
Issue - State: open - Opened by wangxiaochun520 about 1 year ago - 2 comments
#31 - Model paths randomly set
Issue - State: closed - Opened by HyperUpscale about 1 year ago - 1 comment
#30 - I tested it but saw no speedup, what is going on?
Issue - State: closed - Opened by xxm1668 about 1 year ago - 3 comments
#29 - can support to Baichuan2?
Issue - State: open - Opened by luzhongqiu about 1 year ago
#28 - Is there a ChatGPT-like API for calling the model?
Issue - State: closed - Opened by xxm1668 about 1 year ago - 1 comment
#27 - How to generate longer token streams?
Issue - State: open - Opened by GenTxt about 1 year ago - 3 comments
#26 - b979594a04f1bbefe1ff21eb8affacef2a186d25
Issue - State: closed - Opened by ghost about 1 year ago
#25 - Strim
Issue - State: closed - Opened by ghost about 1 year ago
#24 - Comparison with SWA in Mistral
Issue - State: open - Opened by casper-hansen about 1 year ago - 12 comments
#23 - output
Issue - State: closed - Opened by 21pl about 1 year ago
#22 - wrong
Issue - State: closed - Opened by QingChengLineOne about 1 year ago - 3 comments
#21 - add suport codellama
Issue - State: closed - Opened by willshion about 1 year ago - 1 comment
#20 - Streaming example: Move input_ids to model device rather than "cuda"
Pull Request - State: closed - Opened by tomaarsen about 1 year ago - 1 comment
#19 - hi
Issue - State: closed - Opened by Kompiuter89 about 1 year ago
#18 - Metal Support
Issue - State: closed - Opened by jordo1138 about 1 year ago - 7 comments
#17 - I keep getting a 403 forbidden
Issue - State: closed - Opened by odfhgodhfighdf about 1 year ago
#16 - Update mt_bench.jsonl
Pull Request - State: closed - Opened by t562 about 1 year ago
#15 - [Feature Request] Release StreamEval dataset and evaluation code in OpenCompass
Issue - State: open - Opened by vansin about 1 year ago - 2 comments
#14 - TypeError: llama_pos_shift_attention_forward() got an unexpected keyword argument 'padding_mask'
Issue - State: closed - Opened by MartinKratochvilProgramy about 1 year ago - 4 comments
#13 - Have you run any passkey retrieval tests on streaming-llm?
Issue - State: open - Opened by RonanKMcGovern about 1 year ago - 2 comments
#12 - Questions on "streaming-llm" Paper
Issue - State: closed - Opened by llsj14 about 1 year ago - 2 comments
#11 - 'CUDA_VISIBLE_DEVICES' is not recognized as an internal or external command, operable program or batch file.
Issue - State: closed - Opened by IntrovertsBedroom about 1 year ago - 1 comment
#10 - Convert demo video from MOV to MP4
Pull Request - State: closed - Opened by cosmojg about 1 year ago
#9 - The video included in the README does not play in Firefox
Issue - State: closed - Opened by cosmojg about 1 year ago
#8 - Google Colab installation
Issue - State: closed - Opened by narita63755930 about 1 year ago - 10 comments
#7 - window_size attention pretrain
Issue - State: closed - Opened by wawpaopao about 1 year ago - 3 comments