Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / fasterdecoding/snapkv issues and pull requests
#23 - What happens to the total KV length > max-compacity length during response generation?
Issue -
State: open - Opened by PengWenChen about 1 month ago
- 1 comment
#22 - Group Query Attention
Issue -
State: open - Opened by SimJeg about 2 months ago
- 2 comments
#21 - Question on H2O experiment reproduction
Issue -
State: open - Opened by CUHKSZzxy 4 months ago
#20 - Closed issue
Issue -
State: closed - Opened by JulietLJY 5 months ago
#19 - Could you provide the code for visualization the Hit Rate?
Issue -
State: open - Opened by Dominic789654 5 months ago
#18 - Can snapkv compress kv in case different user questions are posed towards the same context?
Issue -
State: open - Opened by namespace-Pt 5 months ago
- 1 comment
#17 - observation window size and consistency between layers
Issue -
State: closed - Opened by Cooperx521 5 months ago
- 1 comment
#16 - Question on GQA implementation
Issue -
State: open - Opened by cyLi-Tiger 5 months ago
- 1 comment
#15 - Can I use the SnapKV without the flash-attention ?
Issue -
State: closed - Opened by pengshuang 5 months ago
- 1 comment
#14 - What prompt was used in Needle in a Haystack test?
Issue -
State: closed - Opened by 66RING 6 months ago
- 1 comment
#13 - expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min) RuntimeError: The size of tensor a (3509) must match the size of tensor b (7017) at non-singleton dimension 3
Issue -
State: closed - Opened by seeyourcell 6 months ago
- 3 comments
#12 - Can't not run longbench!
Issue -
State: open - Opened by HarryWu99 6 months ago
- 3 comments
#11 - why only decode do compress?
Issue -
State: open - Opened by CSEEduanyu 6 months ago
#10 - Only kv is compressed. Is the size of Q and K inconsistent when attention is calculated?
Issue -
State: closed - Opened by CSEEduanyu 7 months ago
- 1 comment
#9 - It seems that snapkv need to be able to do "prefill" at least once before the prompt can be compressed.
Issue -
State: closed - Opened by 66RING 7 months ago
- 1 comment
#8 - Observation
Pull Request -
State: closed - Opened by leeyeehoo 7 months ago
#7 - yl: remove unnessecary
Pull Request -
State: closed - Opened by leeyeehoo 7 months ago
#6 - yl: fix a bug
Pull Request -
State: closed - Opened by leeyeehoo 7 months ago
#5 - yl: fix typo
Pull Request -
State: closed - Opened by leeyeehoo 7 months ago
#4 - Grouped query attention implementation
Issue -
State: closed - Opened by guozhiyu 7 months ago
- 1 comment
#3 - maybe a bug in `update_kv` function
Issue -
State: open - Opened by HarryWu99 7 months ago
- 1 comment
#2 - The effect of Clustering via Pooling may be greater?
Issue -
State: open - Opened by HarryWu99 7 months ago
- 1 comment
#1 - Questions on paper and code [prompting for mistral, positional index, minor errors & questions in paper]
Issue -
State: open - Opened by MarsJacobs 7 months ago
- 8 comments