casper-hansen/AutoAWQ issues and pull requests

#714 - "Clarification on Multimodal Model Quantization and Default Calibration Dataset"

Issue - State: open - Opened by donghong1 3 days ago

#713 - while quantizing the llava-7b model,an error occurred: AttributeError: 'NoneType' object has no attribute 'shape'

Issue - State: open - Opened by donghong1 3 days ago

#712 - Support for Newer Mistral models for fused.

Issue - State: open - Opened by OPPEYRADY 3 days ago

#711 - Fixed issue with the slow implementation warning

Pull Request - State: open - Opened by Egor-Krivov 9 days ago

#710 - Support for QWEN 2.5 VL

Issue - State: open - Opened by ViswaVasanth-2002 9 days ago

#709 - Add license to quantized DeepSeek model

Issue - State: closed - Opened by BaohaoLiao 10 days ago - 1 comment

#708 - Support InternVL2.5 VLM

Issue - State: open - Opened by BenasdTW 11 days ago

#707 - Error after loading deepseekv3_cpu

Issue - State: open - Opened by Tortoise17 13 days ago - 3 comments

#706 - Add Qwen2.5-VL

Pull Request - State: open - Opened by seungwoos 14 days ago - 17 comments

#705 - Add computed position embedding external

Pull Request - State: closed - Opened by seungwoos 15 days ago

#704 - AutoAWQ Windows Fix – Get It Running!

Issue - State: open - Opened by WackyArt 21 days ago

#703 - ModuleNotFoundError: No module named 'torch'

Issue - State: open - Opened by Xephier102 23 days ago - 4 comments

#702 - 3-bit or 6-bit quantization

Issue - State: open - Opened by khurramusman-10xe 30 days ago - 3 comments

#701 - Question on Datasets, GPU Resources, and Time for Quantizing deepseek-r1-distill-qwen Models

Issue - State: closed - Opened by yechenzhi 30 days ago - 1 comment

#700 - bump to 028

Pull Request - State: closed - Opened by casper-hansen about 1 month ago

#699 - fix workflow build

Pull Request - State: closed - Opened by casper-hansen about 1 month ago

#698 - automatically load dtype from config

Pull Request - State: closed - Opened by casper-hansen about 1 month ago

#697 - add ability to define torch_dtype

Pull Request - State: closed - Opened by casper-hansen about 1 month ago

#696 - fix bug when using FSDP

Pull Request - State: closed - Opened by kaixuanliu about 1 month ago - 2 comments

#695 - Enable triton on XPU devices

Pull Request - State: closed - Opened by Egor-Krivov about 1 month ago

#694 - add support for telechat2

Pull Request - State: open - Opened by 1096125073 about 1 month ago

#693 - Load Qwen/Qwen2-7B-Instruct-AWQ error: RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Issue - State: open - Opened by liuzhishan about 1 month ago - 3 comments

#692 - Failed to convert Qwen2-VL-7B-Instruct LORA model

Issue - State: open - Opened by songyang23 about 1 month ago - 2 comments

#691 - Same Memory (VRAM) with different batch_size, Prefill Length, Decode Length.

Issue - State: open - Opened by rayzr0123 about 1 month ago

#690 - keep getting error regarding missing positional argument 'attention_mask'

Issue - State: open - Opened by BBC-Esq about 1 month ago - 8 comments

#689 - Multi-GPU/CPU offloading is still not working as intended

Issue - State: open - Opened by haitham-boxmind about 1 month ago

#688 - Added DeepSeek V3 support.

Pull Request - State: closed - Opened by LagPixelLOL about 2 months ago - 28 comments

#687 - llama3.1 quantization

Issue - State: open - Opened by sunjianxide about 2 months ago

#686 - Deepseek

Issue - State: open - Opened by ehartford about 2 months ago - 2 comments

#685 - Support AutoAWQForSequenceClassification ？

Issue - State: open - Opened by ShelterWFF about 2 months ago

#684 - Any chance at supporting Nemotron?

Issue - State: open - Opened by ambroser53 2 months ago

#683 - prepare_inputs_for_generation and position_embeddings

Issue - State: open - Opened by ambroser53 2 months ago

#682 - Please support cohere2 model

Issue - State: open - Opened by Orion-zhen 2 months ago

#681 - [WIP] Pixtral

Pull Request - State: open - Opened by casper-hansen 2 months ago

#680 - improve type hinting and fix use_cache

Pull Request - State: closed - Opened by casper-hansen 2 months ago

#679 - How to use multiple GPU nodes during quantization

Issue - State: open - Opened by ghntd 2 months ago - 1 comment

#678 - Latest version of `autoawq[kernels]` requires CUDA dev toolchain

Issue - State: closed - Opened by davidmezzetti 2 months ago - 7 comments

#677 - How to convert AWQ matmul to onnxruntime MatmulNBits

Issue - State: open - Opened by fuyao2024 2 months ago

#676 - Support building on the RISC-V platform.

Pull Request - State: open - Opened by alter-xp 2 months ago

#675 - can I quantize a model to 8bit model using autoawq?

Issue - State: open - Opened by cute149q 2 months ago

#674 - bump to 0.2.7.post3

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#673 - pin huggingface_hub

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#672 - install hub main branch

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#671 - Fix missing embed_tokens

Pull Request - State: closed - Opened by casper-hansen 3 months ago - 1 comment

#670 - Question: Error when substituting the quantized matrix multiplication operator.

Issue - State: open - Opened by grysgreat 3 months ago - 2 comments

#669 - AttributeError: 'NoneType' object has no attribute 'shape' for Llava models

Issue - State: closed - Opened by NicolasDrapier 3 months ago - 2 comments

#668 - multi-gpu fix

Pull Request - State: closed - Opened by casper-hansen 3 months ago - 4 comments

#667 - qwen2.5-72b-instruct quant has error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1

Issue - State: closed - Opened by ArlanCooper 3 months ago - 20 comments

#666 - Failed to convert qwen1.5-32b model！

Issue - State: closed - Opened by tensorflowt 3 months ago - 2 comments

#665 - vllm output garbled characters

Issue - State: closed - Opened by dcdmm 3 months ago - 4 comments

#664 - fix "Expected all tensors to be on the same device"

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#663 - Bug when i quantize llama3.1 70b in multiple gpu(A40 *5)

Issue - State: closed - Opened by Paxwell-Paxwell 3 months ago - 1 comment

#662 - Bugs in AWQ models deployed in multiple GPUs.

Issue - State: closed - Opened by Phoenix-Shen 3 months ago - 3 comments

#660 - 'Qwen2Model' object has no attribute 'rotary_emb'

Issue - State: closed - Opened by Alex-DeepL 3 months ago - 3 comments

#659 - fix exaone import

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#658 - ModuleNotFoundError: No module named 'transformers.models.exaone' - on 0.2.7.post2

Issue - State: closed - Opened by gauravjain14 3 months ago - 3 comments

#657 - probability tensor contains either inf, nan or element < 0

Issue - State: open - Opened by alvaropastor7 3 months ago - 2 comments

#656 - Fused attention: Switch to Flash Decoding

Pull Request - State: closed - Opened by casper-hansen 3 months ago - 2 comments

#655 - "llama_model_load: error loading model: check_tensor_dims: tensor 'token_embd.weight' not found" after computing AWQ scales and applying them to the gguf model

Issue - State: open - Opened by Autism-al 3 months ago - 1 comment

#654 - Add support for Phi (3.5) MoE

Pull Request - State: open - Opened by danieldk 3 months ago

#653 - Post release 2 - All additional packages goes into extras

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#652 - Cannot copy out of meta tensor; no data! when half process

Issue - State: closed - Opened by Paxwell-Paxwell 3 months ago - 1 comment

#651 - Add EXAONE support

Pull Request - State: closed - Opened by lgai-exaone 3 months ago - 5 comments

#650 - post release 1

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#649 - Minimum of torch 2.2.0 during build

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#648 - Only build once

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#647 - New release (0.2.7) + Fix build

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#646 - how to push model to huggingface ?

Issue - State: open - Opened by new-Sunset-shimmer 3 months ago - 2 comments

#645 - awq version inference slower than unquantization version fp16 on qwen-14b-chat

Issue - State: closed - Opened by shanying2017 3 months ago - 4 comments

#644 - Replace custom sharding with save_torch_state_dict from huggingface_hub

Pull Request - State: closed - Opened by casper-hansen 3 months ago

#643 - "zero_point":False in quant_fig dict

Issue - State: open - Opened by Cornelii 3 months ago - 1 comment

#642 - Does AutoAWQ support to quantize GLM-4-9B-Chat and ChatGLM3-6B two models?

Issue - State: open - Opened by shawn9977 3 months ago

#641 - qwen2_vl isn't supported yet.

Issue - State: closed - Opened by thesby 4 months ago - 10 comments

#640 - DeepSeek-Coder-V2-Lite-Instruct Error!

Issue - State: open - Opened by tohnee 4 months ago - 1 comment

#639 - Mixtral training

Issue - State: open - Opened by ChristianPala 4 months ago

#638 - Example for quantize in multiple GPU's

Issue - State: closed - Opened by devops724 4 months ago - 2 comments

#637 - AutoAWQ smooth + INC RTN (HPU)

Pull Request - State: open - Opened by yiliu30 4 months ago

#636 - After using autoawq to quantify the model, an error occurs when inferring the model

Issue - State: open - Opened by xuanzhangyang 4 months ago

#635 - ImportError: /root/anaconda3/envs/test2/lib/python3.10/site-packages/awq_inference_engine.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

Issue - State: open - Opened by Jackson-cj 4 months ago - 1 comment

#634 - [Feature Request]. Support Reward Model

Issue - State: open - Opened by liziniu 4 months ago

#633 - Cannot install AutoAWQ with Poetry

Issue - State: closed - Opened by kubni 4 months ago - 2 comments

#632 - Cannot install AutoAWQ

Issue - State: closed - Opened by Vedant-Bisen 4 months ago - 1 comment

#631 - Enable Intel GPU path and lora finetune and change examples to support different devices

Pull Request - State: closed - Opened by jiqing-feng 4 months ago - 5 comments

#630 - fix for "two devices" issue due to RoPE changes

Pull Request - State: closed - Opened by davedgd 5 months ago - 13 comments

#629 - Unable to Run on Colab

Issue - State: open - Opened by JosephGatto 5 months ago - 1 comment

#627 - problem in gemma 2 27b

Issue - State: open - Opened by Alireza3242 5 months ago

#626 - How to Split AWQ Weights?

Issue - State: open - Opened by Azure-Tang 5 months ago

#625 - Model Support

Issue - State: open - Opened by SinanAkkoyun 5 months ago - 1 comment

#624 - Can AutoAWQ support W8A16 quantization?

Issue - State: open - Opened by wangzhongren-code 5 months ago

#623 - MMLU eval failed in ROCM

Issue - State: open - Opened by chunniunai220ml 5 months ago

#622 - Will t5 support be included?

Issue - State: open - Opened by Jason202268 5 months ago - 1 comment

#621 - How is Llava quantized ?

Issue - State: open - Opened by Abhranta 5 months ago - 4 comments

#620 - Optimize Triton for MI300 (2-5x higher throughput)

Pull Request - State: closed - Opened by casper-hansen 5 months ago

#619 - Why do you handle the dataset in this way？

Issue - State: open - Opened by lzcchl 5 months ago - 1 comment

#608 - AWQ Triton kernels. Make `autoawq-kernels` optional.

Pull Request - State: closed - Opened by casper-hansen 5 months ago - 3 comments

#607 - device_map defaults to auto

Pull Request - State: closed - Opened by casper-hansen 6 months ago

#605 - support minicpm3.0

Pull Request - State: closed - Opened by LDLINGLINGLING 6 months ago - 2 comments

#604 - Modified the data processing method to adapt to the awq of minicpmv2.6

Pull Request - State: closed - Opened by LDLINGLINGLING 6 months ago - 1 comment

#602 - assert self.in_features % self.group_size == 0

Issue - State: open - Opened by LDLINGLINGLING 6 months ago - 3 comments

#599 - add qwen2vl support

Pull Request - State: closed - Opened by kq-chen 6 months ago - 1 comment
