GitHub / opengvlab/omniquant issues and pull requests
#107 - Adapt omniquant to transformers 4.41.0
Issue - State: closed - Opened by zijunx 3 months ago - 1 comment
#106 - Can a single A100-80G run the Llama-2-70b model?
Issue - State: open - Opened by JustVelkhana 4 months ago - 1 comment
#104 - I encounter an error when running llama-1-7b: "AttributeError: 'LlamaAttention' object has no attribute 'rotary_emb'". It happens in int_llama_layer.py: self.rotary_emb = copy.deepcopy(org_module.rotary_emb)
Issue - State: open - Opened by WX-yh 7 months ago - 2 comments
#103 - I encounter an error when running llama-1-7b: "AttributeError: 'LlamaAttention' object has no attribute 'rotary_emb'"
Issue - State: open - Opened by WX-yh 7 months ago - 1 comment
#102 - Fail to reproduce the result of w2a16 using llama2 7b
Issue - State: open - Opened by stackByStack 8 months ago - 1 comment
#101 - Will the qwen2.5 model be supported in the future?
Issue - State: closed - Opened by qingkongby 9 months ago - 1 comment
#100 - Does OmniQuant belong to PTQ or QAT?
Issue - State: closed - Opened by guojilei 9 months ago - 1 comment
#99 - Does OmniQuant belong to PTQ or QAT?
Issue - State: closed - Opened by guojilei 9 months ago
#98 - Obtaining fake quantized weights used in actual evaluation
Issue - State: closed - Opened by pavelgolikov 10 months ago
#97 - Add support for Llama3.1
Pull Request - State: closed - Opened by shubhra 10 months ago - 1 comment
#96 - Changes to support Llama3.1
Pull Request - State: closed - Opened by shubhra 11 months ago
#95 - Obtained different PPL for Wikitext and C4 compared to results reported in the paper
Issue - State: open - Opened by yc2367 11 months ago - 2 comments
#94 - Performance gap with Llama-2-7B
Issue - State: closed - Opened by Xzk7 11 months ago - 1 comment
#93 - The llama-2-7b model can't be quantized with this code
Issue - State: closed - Opened by Hzqskywkr 11 months ago - 3 comments
#92 - How to generalize LET to llama3?
Issue - State: closed - Opened by zjq0455 11 months ago
#91 - Error when evaluating MMLU
Issue - State: open - Opened by zjq0455 11 months ago - 2 comments
#90 - How to enable llama3-8b int4 awq models
Issue - State: open - Opened by FlexLaughing 11 months ago
#89 - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
Issue - State: open - Opened by mcpaulgeorge 12 months ago - 5 comments
#88 - The version of transformers, auto_gptq, autoawq
Issue - State: open - Opened by zhangfzR 12 months ago
#87 - Quantizing tinyllama-1.1B-Chat-v1.0, a CUDA assertion error occurred
Issue - State: open - Opened by causeof-you about 1 year ago
#86 - [New Feature] Seek MLA Supported by Smooth
Issue - State: open - Opened by RanchiZhao about 1 year ago
#85 - Question about LET
Issue - State: open - Opened by mxjmtxrm about 1 year ago
#84 - [Model Request] MiniCPM
Issue - State: open - Opened by RanchiZhao about 1 year ago
#83 - The llama-1-65b model seems unstable in this code
Issue - State: closed - Opened by Xingrun-Xing about 1 year ago - 2 comments
#82 - Questions about quantization
Issue - State: closed - Opened by mxjmtxrm about 1 year ago
#81 - Questions about quantization
Issue - State: open - Opened by mxjmtxrm about 1 year ago - 1 comment
#80 - How to accelerate the inference speed with real_quant
Issue - State: closed - Opened by j2kim99 about 1 year ago - 3 comments
#79 - Which bug do you fix for auto_gptq?
Issue - State: open - Opened by BaohaoLiao about 1 year ago - 1 comment
#78 - Some questions about the weight-only quantization results in the paper
Issue - State: closed - Opened by everloom about 1 year ago
#77 - Questions regarding infusing OmniQuant into MLC
Issue - State: open - Opened by BuildBackBuehler over 1 year ago - 3 comments
#76 - OPT-30B
Issue - State: open - Opened by Arthur-Ling over 1 year ago
#75 - Llama-3-8B
Issue - State: open - Opened by hsb1995 over 1 year ago - 5 comments
#74 - Is activation quantized on-the-fly?
Issue - State: closed - Opened by XA23i over 1 year ago - 5 comments
#73 - Why is the compressed output a single file instead of the many files of the pretrained model?
Issue - State: closed - Opened by hsb1995 over 1 year ago - 1 comment
#72 - TypeError: FalconRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'
Issue - State: open - Opened by luchangli03 over 1 year ago
#71 - AttributeError: 'FalconAttention' object has no attribute 'maybe_rotary'
Issue - State: closed - Opened by luchangli03 over 1 year ago - 1 comment
#70 - W4A4 in llama2-7b
Issue - State: closed - Opened by chenzx921020 over 1 year ago - 5 comments
#69 - When reproducing evaluation results for Llama-2-13b w4a4, I got nan
Issue - State: closed - Opened by NewDriverLee over 1 year ago - 4 comments
#68 - KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
Issue - State: closed - Opened by zfstr over 1 year ago - 2 comments
#67 - Other Task
Issue - State: open - Opened by hsb1995 over 1 year ago - 1 comment
#66 - seq_len is deprecated and unused in transformers>=4.38.0
Issue - State: closed - Opened by Lokshaw-Chau over 1 year ago - 1 comment
#65 - Checksums didn't match for dataset source files
Issue - State: closed - Opened by hsb1995 over 1 year ago - 7 comments
#64 - RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).
Issue - State: closed - Opened by zkf331 over 1 year ago - 3 comments
#63 - OPT Model Reproduction Discrepancies
Issue - State: closed - Opened by fantasysee over 1 year ago - 2 comments
#62 - CUDA extension not installed
Issue - State: closed - Opened by Arthur-Ling over 1 year ago - 2 comments
#61 - Difference between fake quant and real quant
Issue - State: closed - Opened by YihengBrianWu over 1 year ago - 1 comment
#60 - Reproduce evaluation results
Issue - State: closed - Opened by oujieww over 1 year ago - 9 comments
#59 - How to properly evaluate W6A6 models using a checkpoint from the model zoo
Issue - State: closed - Opened by ChengZhang-98 over 1 year ago - 2 comments
#58 - [WIP][quantize] add gptq post-quantization
Pull Request - State: open - Opened by xingchensong over 1 year ago
#57 - AutoGPTQ or AutoGPTQ-bugfix?
Issue - State: closed - Opened by Alvant over 1 year ago - 8 comments
#56 - [quantizer] add Odyssey-style symmetric quantization
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 2 comments
#55 - License
Issue - State: closed - Opened by fakerybakery over 1 year ago - 2 comments
#54 - [datautils] fix c4 dataset
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 2 comments
#53 - The ckpt of the quantized OPT model cannot be found
Issue - State: open - Opened by liuxy1103 over 1 year ago - 6 comments
#52 - Quantize Llama-2-Chat Models with Weights and Activation-Quantization
Issue - State: closed - Opened by DRXD1000 over 1 year ago - 2 comments
#51 - [Llama-2-7B-chat] ppl of w4a8 is nan
Issue - State: closed - Opened by xingchensong over 1 year ago - 4 comments
#50 - How to use AutoGPTQ to achieve real quantization?
Issue - State: closed - Opened by AboveParadise over 1 year ago - 3 comments
#49 - Bugfix/attention mask and implementation
Pull Request - State: closed - Opened by Alvant over 1 year ago - 1 comment
#48 - [fix] 'QuantLlamaDecoderLayer' object has no attribute 'model_attn'
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 1 comment
#47 - [fix] attention_mask may appear None for newer versions of LLaMA
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 1 comment
#46 - attention_mask may appear None for newer versions of LLaMA?
Issue - State: closed - Opened by Alvant over 1 year ago - 3 comments
#45 - [Model Request] upstage/SOLAR-10.7B-v1.0
Issue - State: closed - Opened by joseph777111 over 1 year ago - 1 comment
#44 - TypeError: QuantLlamaDecoderLayer.forward() got an unexpected keyword argument 'padding_mask'
Issue - State: closed - Opened by xianwujie over 1 year ago - 1 comment
#43 - Fix GPU memory leak in training loop
Pull Request - State: closed - Opened by mutichung over 1 year ago - 1 comment
#42 - Update omniquant.py
Pull Request - State: closed - Opened by brisker over 1 year ago - 4 comments
#41 - General question about LLM kv-cache quantization
Issue - State: closed - Opened by brisker over 1 year ago - 1 comment
#40 - [Model Request] Mixtral-8x7B-v0.1
Issue - State: closed - Opened by joseph777111 over 1 year ago - 3 comments
#39 - AttributeError: 'Attention' object has no attribute 'W_pack'
Issue - State: open - Opened by yrf200112 over 1 year ago
#38 - Potential bug in the matmul quantization process?
Issue - State: closed - Opened by brisker over 1 year ago - 1 comment
#37 - Quantize LLAMA-2-7b-chat to W4A4
Issue - State: closed - Opened by nmyuchen over 1 year ago - 4 comments
#36 - Update omniquant.py
Pull Request - State: closed - Opened by brisker over 1 year ago - 1 comment
#35 - Problems with memory usage and model loading
Issue - State: closed - Opened by Forival over 1 year ago - 1 comment
#34 - About decode speed and GPU memory usage
Issue - State: closed - Opened by tro0o over 1 year ago - 1 comment
#33 - Enforce minimum CLIPMIN value for the scale.
Pull Request - State: closed - Opened by radi-cho over 1 year ago - 1 comment
#32 - Quick Clarification Question on C4 PPL
Issue - State: closed - Opened by HanGuo97 over 1 year ago - 7 comments
#31 - Loss is NaN, stopping training
Issue - State: closed - Opened by Forival over 1 year ago - 2 comments
#30 - Is evaluation on the MMLU dataset supported?
Issue - State: closed - Opened by brisker over 1 year ago - 13 comments
#29 - RuntimeError when quantizing bloom using our code
Issue - State: open - Opened by Louym over 1 year ago
#28 - Fix ChatModule initialization with model_lib_path argument
Pull Request - State: closed - Opened by kaushikthedeveloper almost 2 years ago - 1 comment
#26 - Regarding the Initialization of `smooth_scale` for the Q*K Operation
Issue - State: closed - Opened by superdocker almost 2 years ago - 2 comments
#25 - Results Errors
Issue - State: closed - Opened by yileijin almost 2 years ago - 10 comments
#24 - Reduce shape for per-group weight calibration
Issue - State: closed - Opened by Alvant almost 2 years ago - 2 comments
#23 - Failed to compile AutoGPTQ-bugfix
Issue - State: closed - Opened by caseylai almost 2 years ago - 1 comment
#22 - How to add a new model for OmniQuant?
Issue - State: closed - Opened by gesanqiu almost 2 years ago - 5 comments
#21 - Cannot compile with mlc-llm
Issue - State: open - Opened by 0x1997 almost 2 years ago - 2 comments
#20 - Model File Formats: .pth, .bin vs. GGUF
Issue - State: open - Opened by sebvannistel almost 2 years ago
#19 - Slow decoding compared to AWQ
Issue - State: closed - Opened by abhinavkulkarni almost 2 years ago - 7 comments
#18 - Running quantized models with MLC-LLM error
Issue - State: closed - Opened by silvacarl2 almost 2 years ago - 3 comments
#17 - Running Falcon-180B on a single A100 80GB: where/what is main.py?
Issue - State: closed - Opened by silvacarl2 almost 2 years ago - 2 comments
#16 - ‼️ Llama2-70b not working
Issue - State: closed - Opened by zhiwei-dong almost 2 years ago - 8 comments
#15 - The provided mlc notebook doesn't run on Colab
Issue - State: closed - Opened by githubpradeep almost 2 years ago - 3 comments
#14 - Falcon 180B generates garbage on A100
Issue - State: closed - Opened by githubpradeep almost 2 years ago - 5 comments
#13 - Quantize a custom model trained on an alpaca-like dataset
Issue - State: closed - Opened by ghost almost 2 years ago - 3 comments
#12 - aug_loss option in OmniQuant Scripts
Issue - State: closed - Opened by MarsJacobs almost 2 years ago - 15 comments
#11 - How to quantize a llama-structure model and run it with a sampling process?
Issue - State: closed - Opened by gesanqiu almost 2 years ago - 3 comments
#10 - Quant script for large models like 180B and 70B?
Issue - State: closed - Opened by yhyu13 almost 2 years ago - 3 comments
#8 - Why is loss NaN when quantizing opt-1.3b or llama 7b with a W8A8 config?
Issue - State: closed - Opened by MeJerry215 almost 2 years ago - 17 comments
#5 - How to run the Android app of release v0.0.1
Issue - State: closed - Opened by 946166920 almost 2 years ago - 1 comment
#3 - How to run inference in llama.cpp?
Issue - State: closed - Opened by lucasjinreal almost 2 years ago - 11 comments