Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.
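The per-record fields shown in the listing below (number, type, state, author, comment count) lend themselves to programmatic filtering once fetched from the API. A minimal sketch, assuming the metadata has been parsed into JSON-like records — the field names here are illustrative, not the service's actual schema:

```python
# Filter and summarize issue/PR metadata records shaped like the listing below.
# The record structure is an assumption for illustration; the real ecosyste.ms
# API response may use different field names.

records = [
    {"number": 145, "kind": "pull_request", "state": "closed", "author": "aikkala", "comments": 3},
    {"number": 144, "kind": "issue", "state": "open", "author": "blap", "comments": 3},
    {"number": 138, "kind": "issue", "state": "closed", "author": "2U1", "comments": 26},
    {"number": 137, "kind": "issue", "state": "open", "author": "mxtsai", "comments": 6},
]

def open_issues(recs):
    """Return open issues (excluding pull requests), most-discussed first."""
    hits = [r for r in recs if r["kind"] == "issue" and r["state"] == "open"]
    return sorted(hits, key=lambda r: r["comments"], reverse=True)

for r in open_issues(records):
    print(f'#{r["number"]} by {r["author"]} ({r["comments"]} comments)')
```

Sorting by comment count surfaces the most active open threads first, which is how a triage view over this metadata would typically be built.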

GitHub / mobiusml/hqq issues and pull requests

#145 - Bug fix: Specify device/map_location when loading weights using device="cpu"

Pull Request - State: closed - Opened by aikkala 6 days ago - 3 comments

#144 - quantization with transformers of RWKV/v6-Finch-1B6-HF

Issue - State: open - Opened by blap 10 days ago - 3 comments

#143 - How to quantize a model and use vllm to do the inference

Issue - State: open - Opened by ZeleiShao 16 days ago - 1 comment

#142 - Inquire about questions related to quantitative analysis

Issue - State: closed - Opened by xiezhipeng-git 19 days ago - 1 comment

#141 - Question about the fused quantized kernel

Issue - State: closed - Opened by cat538 23 days ago - 3 comments

#140 - Custom Quantization Configurations error

Issue - State: closed - Opened by 2U1 28 days ago

#139 - Quantized aria with vllm serving

Issue - State: open - Opened by argentum047101 about 1 month ago - 2 comments

#138 - Quantizing the model makes it slower

Issue - State: closed - Opened by 2U1 about 1 month ago - 26 comments

#137 - Loading Quantized Model on 2 GPUs

Issue - State: open - Opened by mxtsai about 1 month ago - 6 comments

#136 - eval_mmlu ?

Issue - State: open - Opened by mistletoe1024 about 2 months ago - 11 comments

#135 - not run

Issue - State: open - Opened by werruww about 2 months ago - 6 comments

#134 - Saving quantized Aria weights

Issue - State: open - Opened by leon-seidel 3 months ago - 2 comments

#133 - model(aria): do not restrict sdpa to MATH in prefill phase

Pull Request - State: closed - Opened by xffxff 3 months ago - 1 comment

#130 - 8bit + Aten + compile

Issue - State: open - Opened by zhangy659 3 months ago - 7 comments

#129 - cache_size_limit reached

Issue - State: closed - Opened by zhangy659 3 months ago - 22 comments

#128 - 4bit slower?

Issue - State: closed - Opened by zhangy659 3 months ago - 3 comments

#126 - Fix SDPA context manager perf regression

Pull Request - State: closed - Opened by msaroufim 3 months ago - 1 comment

#125 - Support for HQQ Quantization: Compatibility with LLava and Qwen Models?

Issue - State: closed - Opened by NEWbie0709 4 months ago - 10 comments

#123 - slow loading process of pretrained model for finetuning in transformers

Issue - State: closed - Opened by jiaqiw09 4 months ago - 4 comments

#122 - KeyError: 'offload_meta'

Issue - State: closed - Opened by kadirnar 4 months ago - 1 comment

#121 - Fix filename in `setup_torch.py`

Pull Request - State: closed - Opened by larin92 5 months ago - 8 comments

#120 - CUDA error when trying to use llama3.1 8B 4bit quantized model sample

Issue - State: closed - Opened by PatrickDahlin 5 months ago - 8 comments

#119 - integrated into gpt-fast

Issue - State: closed - Opened by kaizizzzzzz 5 months ago - 1 comment

#118 - Hqq vs gguf

Issue - State: closed - Opened by blap 5 months ago - 3 comments

#116 - torch.compile() the quantization method

Pull Request - State: open - Opened by rationalism 5 months ago - 6 comments

#115 - question about fine-tuning a 1-bit quantized model

Issue - State: closed - Opened by zxbjushuai 5 months ago - 35 comments

#114 - Issue when loading the quantized model

Issue - State: closed - Opened by NEWbie0709 5 months ago - 5 comments

#113 - Question about Quantization

Issue - State: closed - Opened by NEWbie0709 5 months ago - 4 comments

#112 - docs: update Readme.md

Pull Request - State: closed - Opened by eltociear 5 months ago

#111 - Question on the speed for generating the response

Issue - State: closed - Opened by NEWbie0709 5 months ago - 18 comments

#110 - `hqq/backends/torchao.py` line 177, KeyError: 'scale'

Issue - State: closed - Opened by egorsmkv 5 months ago - 13 comments

#109 - zero and scale quant

Issue - State: closed - Opened by kaizizzzzzz 6 months ago - 1 comment

#108 - RuntimeError: Expected in.dtype() == at::kInt to be true, but got false.

Issue - State: closed - Opened by egorsmkv 6 months ago - 9 comments

#107 - TypeError: Object of type dtype is not JSON serializable

Issue - State: closed - Opened by zxbjushuai 6 months ago - 11 comments

#106 - Add recommended inductor config for speedup

Pull Request - State: closed - Opened by yiliu30 6 months ago - 1 comment

#105 - Warning: failed to import the BitBlas backend

Issue - State: closed - Opened by jinz2014 6 months ago - 7 comments

#104 - Easy way to run lm evaluation harness

Issue - State: closed - Opened by pythonLoader 6 months ago - 1 comment

#103 - Expected in.dtype() == at::kInt to be true, but got false

Issue - State: closed - Opened by jonashaag 6 months ago - 14 comments

#102 - Bug of the saved model when applying zero and scale quantization

Issue - State: closed - Opened by kaizizzzzzz 6 months ago - 1 comment

#101 - Support Gemma quantization

Issue - State: closed - Opened by kaizizzzzzz 6 months ago - 2 comments

#100 - Weight Sharding

Issue - State: closed - Opened by winglian 6 months ago - 2 comments

#98 - Use GPTQModel for GPTQ quantization: 2x faster + better PPL

Pull Request - State: closed - Opened by Qubitium 7 months ago - 2 comments

#97 - 3-bit quantization weight data type issue

Issue - State: closed - Opened by BeichenHuang 7 months ago - 10 comments

#96 - About the implementation of .cpu()

Issue - State: open - Opened by reflectionie 7 months ago - 1 comment

#94 - bitblas introduces dependency on CUDA version

Issue - State: closed - Opened by zodiacg 7 months ago - 3 comments

#93 - Add a way to save the quantize config so it can be loaded again

Pull Request - State: closed - Opened by fahadh4ilyas 7 months ago - 8 comments

#92 - module 'torch.library' has no attribute 'custom_op'

Issue - State: closed - Opened by fahadh4ilyas 7 months ago - 4 comments

#91 - Fix hf load

Pull Request - State: closed - Opened by fahadh4ilyas 7 months ago - 3 comments

#90 - 2-bit quantization representation

Issue - State: closed - Opened by kaizizzzzzz 7 months ago - 3 comments

#88 - 1 bit inference

Issue - State: closed - Opened by kaizizzzzzz 7 months ago - 4 comments

#87 - Group_Size setting

Issue - State: closed - Opened by kaizizzzzzz 8 months ago - 1 comment

#86 - Activation quantization

Issue - State: closed - Opened by kaizizzzzzz 8 months ago - 9 comments

#84 - Is HQQLinearLoRAWithFakeQuant differentiable?

Issue - State: closed - Opened by lippman1125 8 months ago - 1 comment

#83 - Question about quantization.

Issue - State: closed - Opened by mxjmtxrm 8 months ago - 2 comments

#82 - Running HQQ Quantized Models on CPU

Issue - State: closed - Opened by 49Simon 8 months ago - 3 comments

#80 - [Question] Model Outputting Gibberish After Quantization

Issue - State: closed - Opened by DefinitlyEvil 8 months ago - 4 comments

#79 - AttributeError: 'LlamaForCausalLM' object has no attribute '_setup_cache'

Issue - State: closed - Opened by ChuanhongLi 8 months ago - 3 comments
Labels: bug

#78 - HQQ for convolutional layers

Issue - State: closed - Opened by danishansari 8 months ago - 6 comments

#77 - prepare_for_inference error

Issue - State: closed - Opened by BeichenHuang 8 months ago - 17 comments

#76 - No module named 'hqq.engine' Error.

Issue - State: closed - Opened by yixuantt 8 months ago - 2 comments

#75 - Not able to save quantized model

Issue - State: closed - Opened by BeichenHuang 9 months ago - 5 comments

#74 - Can the quantization process be on CPU?

Issue - State: closed - Opened by mxjmtxrm 9 months ago - 4 comments

#73 - Does it support Hqq optimization algorithm in diffusion models?

Issue - State: closed - Opened by kadirnar 9 months ago - 1 comment

#71 - Add multi-gpu support for `from_quantized` call

Issue - State: closed - Opened by mobicham 9 months ago - 1 comment
Labels: enhancement

#70 - Problem in load from saved model

Issue - State: closed - Opened by uisikdag 9 months ago - 2 comments

#69 - axis fix

Pull Request - State: closed - Opened by envomp 9 months ago - 2 comments

#67 - pass along cache_dir during snapshot download

Pull Request - State: closed - Opened by andysalerno 9 months ago - 1 comment

#66 - Performance of quantized model

Issue - State: closed - Opened by thhung 9 months ago - 1 comment

#65 - Issue with torchao patching with loaded model

Issue - State: closed - Opened by rohit-gupta 9 months ago - 8 comments
Labels: bug

#64 - torch.compile() for quantized model

Issue - State: closed - Opened by DHKim0428 9 months ago - 3 comments

#62 - How to load quantized model with flash_attn?

Issue - State: closed - Opened by mxjmtxrm 10 months ago - 2 comments

#59 - Supported Model in README

Issue - State: closed - Opened by sanjeev-bhandari 10 months ago - 1 comment

#58 - sample code doesn't run

Issue - State: closed - Opened by LiangA 10 months ago - 6 comments

#57 - directly loading weights in specified device

Pull Request - State: closed - Opened by viraatdas 10 months ago - 9 comments

#56 - HQQ + Brevitas

Issue - State: closed - Opened by Giuseppe5 10 months ago - 1 comment
Labels: question

#55 - Issue with HQQLinear Layer in Stable Diffusion Model on Aten Backend

Issue - State: closed - Opened by DHKim0428 10 months ago - 7 comments

#54 - Readme save_quantized issue

Issue - State: closed - Opened by BeichenHuang 10 months ago - 1 comment

#52 - Support MPS

Issue - State: closed - Opened by benglewis 10 months ago - 5 comments

#51 - Initializing the model from state_dict

Pull Request - State: closed - Opened by envomp 10 months ago - 6 comments

#50 - Initializing the model from state_dict

Issue - State: closed - Opened by envomp 10 months ago - 3 comments

#49 - Request for amd support

Issue - State: closed - Opened by Wintoplay 10 months ago - 5 comments

#48 - forward cache_dir in HQQWrapper.from_quantized()

Pull Request - State: closed - Opened by MarkBenjamin 10 months ago - 1 comment

#47 - TypeError when load from_pretrain

Issue - State: closed - Opened by ghost 10 months ago - 10 comments

#46 - tensorflow or keras implementation

Issue - State: closed - Opened by patelprateek 10 months ago - 2 comments
Labels: enhancement

#45 - Difference between blog post and implementation

Issue - State: closed - Opened by dacorvo 10 months ago - 1 comment

#44 - Why does the 2bit 34b model take up 19GB of GPU memory

Issue - State: closed - Opened by Minami-su 10 months ago - 7 comments

#43 - How to accelerate the inference speed of 1bit+lora model

Issue - State: closed - Opened by Minami-su 10 months ago - 4 comments
Labels: enhancement

#42 - How to merge lora with 1bit model?

Issue - State: closed - Opened by Minami-su 10 months ago - 1 comment

#40 - transfer learning?

Issue - State: closed - Opened by NickyDark1 10 months ago - 2 comments
Labels: question

#39 - `.to` is not supported for HQQ-quantized models

Issue - State: closed - Opened by Abdullah-kwl 10 months ago - 5 comments
Labels: help wanted