Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / fasterdecoding/snapkv issues and pull requests
#28 - Bug on Qwen2-VL
Issue -
State: open - Opened by LiJunscs 23 days ago
#27 - The first generation token output sees the whole cache key and value
Issue -
State: open - Opened by PengWenChen 27 days ago
- 3 comments
#26 - Llama-3 Implementation
Issue -
State: closed - Opened by kunlun531 2 months ago
#25 - why not use the last token for kv cache compression
Issue -
State: open - Opened by Arist12 2 months ago
#24 - Question: is key_state_compressed used for inference?
Issue -
State: open - Opened by jq-wei 2 months ago
- 1 comment
#23 - What happens to the total KV length > max-compacity length during response generation?
Issue -
State: open - Opened by PengWenChen 3 months ago
- 1 comment
#22 - Group Query Attention
Issue -
State: open - Opened by SimJeg 4 months ago
- 4 comments
#21 - Question on H2O experiment reproduction
Issue -
State: open - Opened by CUHKSZzxy 6 months ago
#20 - Closed issue
Issue -
State: closed - Opened by JulietLJY 7 months ago
#19 - Could you provide the code for visualization the Hit Rate?
Issue -
State: open - Opened by Dominic789654 7 months ago
#18 - Can snapkv compress kv in case different user questions are posed towards the same context?
Issue -
State: open - Opened by namespace-Pt 7 months ago
- 1 comment
#17 - observation window size and consistency between layers
Issue -
State: closed - Opened by Cooperx521 8 months ago
- 1 comment
#16 - Question on GQA implementation
Issue -
State: open - Opened by cyLi-Tiger 8 months ago
- 1 comment
#15 - Can I use the SnapKV without the flash-attention ?
Issue -
State: closed - Opened by pengshuang 8 months ago
- 1 comment
#14 - What prompt was used in Needle in a Haystack test?
Issue -
State: closed - Opened by 66RING 8 months ago
- 1 comment
#13 - expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min) RuntimeError: The size of tensor a (3509) must match the size of tensor b (7017) at non-singleton dimension 3
Issue -
State: closed - Opened by seeyourcell 8 months ago
- 5 comments
#12 - Can't not run longbench!
Issue -
State: open - Opened by HarryWu99 8 months ago
- 3 comments
#11 - why only decode do compress?
Issue -
State: open - Opened by CSEEduanyu 9 months ago
#10 - Only kv is compressed. Is the size of Q and K inconsistent when attention is calculated?
Issue -
State: closed - Opened by CSEEduanyu 9 months ago
- 1 comment
#9 - It seems that snapkv need to be able to do "prefill" at least once before the prompt can be compressed.
Issue -
State: closed - Opened by 66RING 9 months ago
- 1 comment
#8 - Observation
Pull Request -
State: closed - Opened by leeyeehoo 9 months ago
#7 - yl: remove unnessecary
Pull Request -
State: closed - Opened by leeyeehoo 9 months ago
#6 - yl: fix a bug
Pull Request -
State: closed - Opened by leeyeehoo 9 months ago
#5 - yl: fix typo
Pull Request -
State: closed - Opened by leeyeehoo 9 months ago
#4 - Grouped query attention implementation
Issue -
State: closed - Opened by guozhiyu 9 months ago
- 1 comment
#3 - maybe a bug in `update_kv` function
Issue -
State: open - Opened by HarryWu99 9 months ago
- 1 comment
#2 - The effect of Clustering via Pooling may be greater?
Issue -
State: open - Opened by HarryWu99 9 months ago
- 1 comment
#1 - Questions on paper and code [prompting for mistral, positional index, minor errors & questions in paper]
Issue -
State: open - Opened by MarsJacobs 9 months ago
- 8 comments