GitHub / opengvlab/omniquant issues and pull requests
#107 - Adapt omniquant to transformers 4.41.0
Issue - State: closed - Opened by zijunx 3 months ago - 1 comment
#106 - Can a single A100-80G run the Llama-2-70b model?
Issue - State: open - Opened by JustVelkhana 4 months ago - 1 comment
#104 - I encounter an error when running llama-1-7b: "AttributeError: 'LlamaAttention' object has no attribute 'rotary_emb'". It happens in int_llama_layer.py: self.rotary_emb = copy.deepcopy(org_module.rotary_emb)
Issue - State: open - Opened by WX-yh 7 months ago - 2 comments
#103 - I encounter an error when running llama-1-7b: "AttributeError: 'LlamaAttention' object has no attribute 'rotary_emb'"
Issue - State: open - Opened by WX-yh 7 months ago - 1 comment
#102 - Fail to reproduce the result of w2a16 using llama2 7b
Issue - State: open - Opened by stackByStack 8 months ago - 1 comment
#101 - Will the qwen2.5 model be supported in the future?
Issue - State: closed - Opened by qingkongby 9 months ago - 1 comment
#100 - Does OmniQuant belong to PTQ or QAT?
Issue - State: closed - Opened by guojilei 9 months ago - 1 comment
#99 - Does OmniQuant belong to PTQ or QAT?
Issue - State: closed - Opened by guojilei 9 months ago
#98 - Obtaining fake quantized weights used in actual evaluation
Issue - State: closed - Opened by pavelgolikov 10 months ago
#97 - Add support for Llama3.1
Pull Request - State: closed - Opened by shubhra 10 months ago - 1 comment
#96 - Changes to support Llama3.1
Pull Request - State: closed - Opened by shubhra 11 months ago
#95 - Obtained different PPL for Wikitext and C4 compared to results reported in the paper
Issue - State: open - Opened by yc2367 11 months ago - 2 comments
#94 - Performance gap with Llama-2-7B
Issue - State: closed - Opened by Xzk7 11 months ago - 1 comment
#93 - The llama-2-7b model can't be quantized with this code
Issue - State: closed - Opened by Hzqskywkr 11 months ago - 3 comments
#92 - How to generalize LET to llama3?
Issue - State: closed - Opened by zjq0455 11 months ago
#91 - Error when evaluating MMLU
Issue - State: open - Opened by zjq0455 11 months ago - 2 comments
#90 - How to enable llama3-8b int4 awq models
Issue - State: open - Opened by FlexLaughing 11 months ago
#89 - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
Issue - State: open - Opened by mcpaulgeorge 12 months ago - 5 comments
#88 - The version of transformers, auto_gptq, autoawq
Issue - State: open - Opened by zhangfzR 12 months ago
#87 - Quantizing tinyllama-1.1B-Chat-v1.0, a CUDA assertion error occurred
Issue - State: open - Opened by causeof-you about 1 year ago
#86 - [New Feature] Seek MLA Supported by Smooth
Issue - State: open - Opened by RanchiZhao about 1 year ago
#85 - Question about LET
Issue - State: open - Opened by mxjmtxrm about 1 year ago
#84 - [Model Request] MiniCPM
Issue - State: open - Opened by RanchiZhao about 1 year ago
#83 - The llama-1-65b model seems unstable in this code
Issue - State: closed - Opened by Xingrun-Xing about 1 year ago - 2 comments
#82 - Questions about quantization
Issue - State: closed - Opened by mxjmtxrm about 1 year ago
#81 - Questions about quantization
Issue - State: open - Opened by mxjmtxrm about 1 year ago - 1 comment
#80 - How to accelerate the inference speed with real_quant
Issue - State: closed - Opened by j2kim99 about 1 year ago - 3 comments
#79 - Which bug do you fix for auto_gptq?
Issue - State: open - Opened by BaohaoLiao about 1 year ago - 1 comment
#78 - Some questions about the weight-only quantization results in the paper
Issue - State: closed - Opened by everloom about 1 year ago
#77 - Questions regarding infusing OmniQuant into MLC
Issue - State: open - Opened by BuildBackBuehler over 1 year ago - 3 comments
#76 - OPT-30B
Issue - State: open - Opened by Arthur-Ling over 1 year ago
#75 - Llama-3-8B
Issue - State: open - Opened by hsb1995 over 1 year ago - 5 comments
#74 - Is activation quantized on-the-fly?
Issue - State: closed - Opened by XA23i over 1 year ago - 5 comments
#73 - Why is the compressed output a single file instead of the many files of the pretrained model?
Issue - State: closed - Opened by hsb1995 over 1 year ago - 1 comment
#72 - TypeError: FalconRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'
Issue - State: open - Opened by luchangli03 over 1 year ago
#71 - AttributeError: 'FalconAttention' object has no attribute 'maybe_rotary'
Issue - State: closed - Opened by luchangli03 over 1 year ago - 1 comment
#70 - W4A4 in llama2-7b
Issue - State: closed - Opened by chenzx921020 over 1 year ago - 5 comments
#69 - When reproducing evaluation results for Llama-2-13b w4a4, I got nan
Issue - State: closed - Opened by NewDriverLee over 1 year ago - 4 comments
#68 - KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
Issue - State: closed - Opened by zfstr over 1 year ago - 2 comments
#67 - Other Task
Issue - State: open - Opened by hsb1995 over 1 year ago - 1 comment
#66 - seq_len is deprecated and unused in transformers>=4.38.0
Issue - State: closed - Opened by Lokshaw-Chau over 1 year ago - 1 comment
#65 - Checksums didn't match for dataset source files
Issue - State: closed - Opened by hsb1995 over 1 year ago - 7 comments
#64 - RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).
Issue - State: closed - Opened by zkf331 over 1 year ago - 3 comments
#63 - OPT Model Reproduction Discrepancies
Issue - State: closed - Opened by fantasysee over 1 year ago - 2 comments
#62 - CUDA extension not installed
Issue - State: closed - Opened by Arthur-Ling over 1 year ago - 2 comments
#61 - Difference between fake quant and real quant
Issue - State: closed - Opened by YihengBrianWu over 1 year ago - 1 comment
#60 - Reproduce evaluation results
Issue - State: closed - Opened by oujieww over 1 year ago - 9 comments
#59 - How to properly evaluate W6A6 models using a checkpoint from the model zoo
Issue - State: closed - Opened by ChengZhang-98 over 1 year ago - 2 comments
#58 - [WIP][quantize] add gptq post-quantization
Pull Request - State: open - Opened by xingchensong over 1 year ago
#57 - AutoGPTQ or AutoGPTQ-bugfix?
Issue - State: closed - Opened by Alvant over 1 year ago - 8 comments
#56 - [quantizer] add Odyssey-style symmetric quantization
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 2 comments
#55 - License
Issue - State: closed - Opened by fakerybakery over 1 year ago - 2 comments
#54 - [datautils] fix c4 dataset
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 2 comments
#53 - The ckpt of the quantized OPT model cannot be found
Issue - State: open - Opened by liuxy1103 over 1 year ago - 6 comments
#52 - Quantize Llama-2-Chat Models with Weights and Activation-Quantization
Issue - State: closed - Opened by DRXD1000 over 1 year ago - 2 comments
#51 - [Llama-2-7B-chat] ppl of w4a8 is nan
Issue - State: closed - Opened by xingchensong over 1 year ago - 4 comments
#50 - How to use AutoGPTQ to achieve real quantization?
Issue - State: closed - Opened by AboveParadise over 1 year ago - 3 comments
#49 - Bugfix/attention mask and implementation
Pull Request - State: closed - Opened by Alvant over 1 year ago - 1 comment
#48 - [fix] 'QuantLlamaDecoderLayer' object has no attribute 'model_attn'
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 1 comment
#47 - [fix] attention_mask may appear None for newer versions of LLaMA
Pull Request - State: closed - Opened by xingchensong over 1 year ago - 1 comment
#46 - attention_mask may appear None for newer versions of LLaMA?
Issue - State: closed - Opened by Alvant over 1 year ago - 3 comments
#45 - [Model Request] upstage/SOLAR-10.7B-v1.0
Issue - State: closed - Opened by joseph777111 over 1 year ago - 1 comment
#44 - TypeError: QuantLlamaDecoderLayer.forward() got an unexpected keyword argument 'padding_mask'
Issue - State: closed - Opened by xianwujie over 1 year ago - 1 comment
#43 - Fix GPU memory leak in training loop
Pull Request - State: closed - Opened by mutichung over 1 year ago - 1 comment
#42 - Update omniquant.py
Pull Request - State: closed - Opened by brisker over 1 year ago - 4 comments
#41 - General question about LLM kv-cache quantization
Issue - State: closed - Opened by brisker over 1 year ago - 1 comment
#40 - [Model Request] Mixtral-8x7B-v0.1
Issue - State: closed - Opened by joseph777111 over 1 year ago - 3 comments
#39 - AttributeError: 'Attention' object has no attribute 'W_pack'
Issue - State: open - Opened by yrf200112 over 1 year ago
#38 - Potential bug in the matmul quantization process?
Issue - State: closed - Opened by brisker over 1 year ago - 1 comment
#37 - Quantize LLAMA-2-7b-chat to W4A4
Issue - State: closed - Opened by nmyuchen over 1 year ago - 4 comments
#36 - Update omniquant.py
Pull Request - State: closed - Opened by brisker over 1 year ago - 1 comment
#35 - Problems with memory usage and model loading
Issue - State: closed - Opened by Forival over 1 year ago - 1 comment
#34 - About decode speed and GPU memory usage
Issue - State: closed - Opened by tro0o over 1 year ago - 1 comment
#33 - Enforce minimum CLIPMIN value for the scale.
Pull Request - State: closed - Opened by radi-cho over 1 year ago - 1 comment
#32 - Quick Clarification Question on C4 PPL
Issue - State: closed - Opened by HanGuo97 over 1 year ago - 7 comments
#31 - Loss is NaN, stopping training
Issue - State: closed - Opened by Forival over 1 year ago - 2 comments
#30 - Is evaluation on the MMLU dataset supported?
Issue - State: closed - Opened by brisker over 1 year ago - 13 comments
#29 - RuntimeError when quantizing bloom using our code
Issue - State: open - Opened by Louym over 1 year ago
#28 - Fix ChatModule initialization with model_lib_path argument
Pull Request - State: closed - Opened by kaushikthedeveloper almost 2 years ago - 1 comment
#26 - Regarding the Initialization of `smooth_scale` for the Q*K Operation
Issue - State: closed - Opened by superdocker almost 2 years ago - 2 comments
#25 - Results Errors
Issue - State: closed - Opened by yileijin almost 2 years ago - 10 comments
#24 - Reduce shape for per-group weight calibration
Issue - State: closed - Opened by Alvant almost 2 years ago - 2 comments
#23 - Failed to compile AutoGPTQ-bugfix
Issue - State: closed - Opened by caseylai almost 2 years ago - 1 comment
#22 - How to add a new model for OmniQuant?
Issue - State: closed - Opened by gesanqiu almost 2 years ago - 5 comments
#21 - Cannot compile with mlc-llm
Issue - State: open - Opened by 0x1997 almost 2 years ago - 2 comments
#20 - Model File Formats: .pth, .bin vs. GGUF
Issue - State: open - Opened by sebvannistel almost 2 years ago
#19 - Slow decoding compared to AWQ
Issue - State: closed - Opened by abhinavkulkarni almost 2 years ago - 7 comments
#18 - Running quantized models with MLC-LLM error
Issue - State: closed - Opened by silvacarl2 almost 2 years ago - 3 comments
#17 - Running Falcon-180B on a single A100 80GB: where/what is main.py?
Issue - State: closed - Opened by silvacarl2 almost 2 years ago - 2 comments
#16 - ‼️ Llama2-70b not working
Issue - State: closed - Opened by zhiwei-dong almost 2 years ago - 8 comments
#15 - The provided mlc notebook doesn't run on Colab
Issue - State: closed - Opened by githubpradeep almost 2 years ago - 3 comments
#14 - Falcon 180B generates garbage on A100
Issue - State: closed - Opened by githubpradeep almost 2 years ago - 5 comments
#13 - Quantize a custom model trained on an alpaca-like dataset
Issue - State: closed - Opened by ghost almost 2 years ago - 3 comments
#12 - aug_loss option in OmniQuant Scripts
Issue - State: closed - Opened by MarsJacobs almost 2 years ago - 15 comments
#11 - How to quantize a llama-structure model and run it with a sampling process?
Issue - State: closed - Opened by gesanqiu almost 2 years ago - 3 comments
#10 - Quant script for large models like 180B and 70B?
Issue - State: closed - Opened by yhyu13 almost 2 years ago - 3 comments
#8 - Why is loss NaN when quantizing opt-1.3b or llama 7b with a W8A8 config?
Issue - State: closed - Opened by MeJerry215 almost 2 years ago - 17 comments
#5 - How to run the Android app of release v0.0.1
Issue - State: closed - Opened by 946166920 almost 2 years ago - 1 comment
#3 - How to run inference in llama.cpp?
Issue - State: closed - Opened by lucasjinreal almost 2 years ago - 11 comments