Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / jy-yuan/KIVI issues and pull requests
#28 - Multi GPUs
Issue - State: open - Opened by yisunlp 23 days ago - 5 comments
#27 - Unable to Reproduce Results for LongBench
Issue - State: open - Opened by ilil96 26 days ago - 2 comments
#26 - How can the code support 1-bit quantization?
Issue - State: closed - Opened by yuhuixu1993 29 days ago - 2 comments
#25 - Develop
Pull Request - State: open - Opened by Davids048 about 2 months ago
#24 - ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
Issue - State: open - Opened by xzwj1699 3 months ago
#23 - How to understand the code: triton_quantize_and_pack_along_last_dim(value_states_full[:, :, :1, :].contiguous(), self.group_size, self.v_bits)
Issue - State: closed - Opened by chenyehuang 3 months ago - 3 comments
#22 - Difference between gemv_forward_cuda and gemv_forward_cuda_outer_rim?
Issue - State: closed - Opened by yifeikong 3 months ago - 2 comments
#21 - Add missing flash_attn_func import in llama_kivi model
Pull Request - State: closed - Opened by yifeikong 3 months ago
#20 - NameError: name 'flash_attn_func' is not defined
Issue - State: closed - Opened by zwhong714 3 months ago - 1 comment
#19 - [FIX] use flash attention in example.py
Pull Request - State: closed - Opened by Davids048 3 months ago
#18 - The difference in batch size leads to different results in LongBench testing
Issue - State: open - Opened by Felixvillas 4 months ago - 5 comments
#17 - Running example.py with llama2-7B-hf only saves 500MB of KV cache memory compared to base transformers?
Issue - State: open - Opened by riou-chen 4 months ago - 2 comments
#16 - CUDA version
Issue - State: closed - Opened by hensiesp32 4 months ago - 4 comments
#15 - Why is model inference slow when KIVI is applied to Mistral-7B-Instruct-v0.2?
Issue - State: closed - Opened by lichongod 4 months ago - 7 comments
#14 - Where is the falcon_kivi?
Issue - State: closed - Opened by Felixvillas 5 months ago - 4 comments
#13 - Which commit of lm-eval-harness is the lmeval branch based on?
Issue - State: closed - Opened by condy0919 5 months ago - 3 comments
#12 - An error occurred while using evaluate.load("act_match")
Issue - State: closed - Opened by Felixvillas 5 months ago - 1 comment
#11 - Which file do I need to run to obtain the result in Figure 4?
Issue - State: closed - Opened by Felixvillas 5 months ago - 2 comments
#10 - Evaluation not supported with ROCm
Issue - State: open - Opened by ym-guan 5 months ago - 1 comment
#9 - Support for ChatGLM3
Issue - State: open - Opened by redscv 5 months ago - 1 comment
#8 - Provide an accuracy testing interface?
Issue - State: closed - Opened by ascendpoet 5 months ago - 1 comment
#7 - Discrepancy in Reproduced Results for LLaMA2 on "qmsum" and "qasper" tasks
Issue - State: closed - Opened by ilur98 5 months ago - 2 comments
#6 - W/ or w/o weight quantization?
Issue - State: closed - Opened by deephanson94 5 months ago - 4 comments
#5 - [fix] add the missing comma in pyproject.toml to enable correct pip i…
Pull Request - State: closed - Opened by wln20 6 months ago - 1 comment
#4 - Integrate KIVI into inference frameworks?
Issue - State: closed - Opened by andakai 6 months ago - 1 comment
#3 - LlamaConfig.attention_dropout does not exist in transformers==4.35.2
Issue - State: closed - Opened by RalphMao 6 months ago - 1 comment
#2 - Could you please open-source the code for the calculation and visualization of the statistic information of the KV Cache?
Issue - State: closed - Opened by wln20 7 months ago - 3 comments
#1 - Can this be used with any autoregressive model?
Issue - State: closed - Opened by hello-fri-end 7 months ago - 1 comment