spcl/quarot issues and pull requests

#63 - Hadamard transform in the mlp_output

Issue - State: open - Opened by seamoonlight-YBY 9 days ago

#62 - The code is different from the illustrations in the paper.

Issue - State: closed - Opened by seamoonlight-YBY 12 days ago - 2 comments

#61 - Can your int4 linear kernel do group-wise quantization?

Issue - State: closed - Opened by liuyt929 22 days ago - 1 comment

#60 - Why is there no clamp function in weight quantization

Issue - State: closed - Opened by guaniu22 about 1 month ago - 1 comment

#59 - Adding support for Llama 3.1 and Llama 3.2 models

Pull Request - State: open - Opened by CryVeck about 2 months ago

#58 - Support more models.

Pull Request - State: open - Opened by JamesTheZ about 2 months ago

#57 - Possible bug in subtraction dimension?

Issue - State: open - Opened by veritas9872 about 2 months ago - 1 comment

#56 - [discussion] Why use randomized H matrix? not just normal H matrix?

Issue - State: closed - Opened by qijiaxing 2 months ago - 1 comment

#55 - Problem when reproducing w4a4kv4 quantization

Issue - State: open - Opened by ZijianYY 2 months ago - 3 comments

#54 - Bump transformers from 4.36.0 to 4.38.0

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago
Labels: dependencies

#53 - Mistake

Pull Request - State: closed - Opened by CryVeck 2 months ago

#52 - Reproducing Table 7 (weight-only quantization results)

Issue - State: closed - Opened by SShock92 2 months ago - 2 comments

#51 - Question of kernels on NVIDIA RTX 3090

Issue - State: closed - Opened by guaniu22 2 months ago - 1 comment

#50 - How to save the quanted model

Issue - State: open - Opened by cquxl 3 months ago - 2 comments

#49 - GPTQ dequantization

Issue - State: closed - Opened by JeevanBhoot 3 months ago - 3 comments

#48 - H100 Support

Issue - State: closed - Opened by carlguo866 3 months ago - 1 comment

#47 - question about quantization group size

Issue - State: closed - Opened by mxjmtxrm 3 months ago - 1 comment

#46 - Question about Rotation

Issue - State: closed - Opened by blgimagineb 3 months ago

#45 - Accuracy drop after rotating model

Issue - State: closed - Opened by mxjmtxrm 3 months ago - 1 comment

#44 - question about Hadamard dimension

Issue - State: closed - Opened by mxjmtxrm 3 months ago - 5 comments

#43 - Reproducing paper Table 8

Issue - State: closed - Opened by mjyun01 4 months ago - 1 comment

#42 - How is perplexity calculated with the KV cache?

Issue - State: closed - Opened by tsengalb99 4 months ago - 1 comment

#41 - [Q] Having not matched size Hadamard matrix

Issue - State: closed - Opened by Coco58323 5 months ago - 5 comments

#40 - apply_exact_had_to_linear for v_proj.bias if v_proj.bias is not None

Issue - State: closed - Opened by dyou-dev 5 months ago - 3 comments

#39 - questions about the rotate

Issue - State: closed - Opened by Gloria2tt 5 months ago - 1 comment

#38 - [Inference speed] Speed up on prefilling stage, slow down on decoding stage

Issue - State: closed - Opened by ChenMnZ 6 months ago - 3 comments

#37 - Inference

Issue - State: closed - Opened by zhentingqi 6 months ago - 2 comments

#36 - A question regarding the rotation matching pairs

Issue - State: closed - Opened by Menace-Dragon 6 months ago - 1 comment

#35 - Mistral support

Issue - State: closed - Opened by DavidePaglieri 6 months ago - 1 comment

#34 - Accuracy drop after `fuse_layer_norms`

Issue - State: closed - Opened by Niko-zyf 6 months ago - 1 comment

#33 - mlp_sizes seem wrong in qlinear_benchmark.py

Issue - State: closed - Opened by yyfcc17 7 months ago - 4 comments

#32 - When is online Hadamard applied during evaluation?

Issue - State: closed - Opened by pavelgolikov 7 months ago - 1 comment

#31 - args.distribute_model seems to be undefined

Issue - State: closed - Opened by WeiMa01 7 months ago - 3 comments

#30 - Outputs of OPT models become different after fusing LayerNorm.

Issue - State: closed - Opened by SShock92 7 months ago - 3 comments

#29 - opt model with layernorm, the input of layernorm can use hadamard transform?

Issue - State: closed - Opened by JiangYongYu1 7 months ago - 4 comments

#28 - Relations with SpinQuant?

Issue - State: closed - Opened by RanchiZhao 7 months ago - 3 comments

#27 - Does QuaRot only support Llama and OPT style LLM?

Issue - State: closed - Opened by NicoNico6 7 months ago - 1 comment

#26 - Question about Hadamard transformation and outlier reduction

Issue - State: closed - Opened by KimythAnly 7 months ago - 2 comments

#25 - Other quantization results of rotated model

Issue - State: closed - Opened by mxjmtxrm 8 months ago - 8 comments

#24 - How to get models with only offline rotation (or models for weight-only quantization)

Issue - State: closed - Opened by Tracin 8 months ago - 6 comments

#23 - Question about exact_had_to_linear

Issue - State: closed - Opened by mxjmtxrm 8 months ago - 1 comment

#22 - accuracy of weight only quantization decrease significantly after weight rotation

Issue - State: closed - Opened by luchangli03 8 months ago - 12 comments

#21 - Question about rotation.

Issue - State: closed - Opened by mxjmtxrm 8 months ago - 3 comments

#20 - How to deal with GQA?

Issue - State: closed - Opened by mxjmtxrm 8 months ago - 2 comments

#19 - multi GPU inference

Issue - State: closed - Opened by hensiesp32 8 months ago - 1 comment

#18 - How to get a fake quantized model?

Issue - State: closed - Opened by mxjmtxrm 8 months ago - 1 comment

#17 - Fix LayerNorm fusion for tied embeddings

Pull Request - State: closed - Opened by smpanaro 8 months ago - 1 comment

#16 - Wrong result obtained in case of w4a16 quantization？

Issue - State: closed - Opened by hyx1999 8 months ago - 2 comments

#15 - Questions related to Compile the QuaRot on CPU and Model Saving

Issue - State: closed - Opened by HuangOwen 8 months ago - 1 comment

#14 - Question about reproducing Fig.1

Issue - State: closed - Opened by xinghaow99 9 months ago - 4 comments

#13 - Can we directly load a QuaRot-GPTQ quantized model and do lm_eval evaluation?

Issue - State: closed - Opened by Shuai-Xie 9 months ago - 1 comment

#12 - opt model ppl bug

Issue - State: closed - Opened by zhsky2017 9 months ago - 3 comments

#11 - Questions on online quantization

Issue - State: closed - Opened by lzhangzz 9 months ago - 4 comments

#10 - Online hadamard bug

Issue - State: closed - Opened by nailimixaM 9 months ago

#9 - Some questions

Issue - State: closed - Opened by catid 10 months ago - 1 comment

#8 - Question about whether it is necessary to fuse layernorm to linear

Issue - State: closed - Opened by Oliver-ss 10 months ago - 14 comments

#7 - [Small Bug] The embedding fusion is not necessary for LLaMA models.

Issue - State: closed - Opened by ChenMnZ 10 months ago - 6 comments

#6 - [question] Is it possible to quantize Mixtral?

Issue - State: closed - Opened by accupham 10 months ago - 3 comments
