Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / PanQiWei/AutoGPTQ issues and pull requests
#412 - [BUG] Wrong generations in batch mode with exllamav2
Issue -
State: closed - Opened by gingsi about 1 year ago
- 5 comments
Labels: bug
#411 - Fix Windows (no Triton) and CPU-only support
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
- 1 comment
#410 - Improve message about buffer size in exllama v1 backend
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#409 - Fix CPU inference
Pull Request -
State: closed - Opened by yangw1234 about 1 year ago
- 2 comments
#408 - Fix quantize method with None mask
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#407 - Fix Windows support
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#406 - Using Exllama backend requires all the modules to be on GPU - how?
Issue -
State: closed - Opened by tigerinus about 1 year ago
- 5 comments
Labels: bug
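The exllama kernels require every quantized module to sit on the GPU, so any CPU offload (for example from device_map="auto") triggers this error. A minimal sketch of the two usual ways around it, assuming a hypothetical local checkpoint path:

```python
from auto_gptq import AutoGPTQForCausalLM

# Option 1: load everything onto a single GPU so the exllama kernels can run.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/gptq-model",  # hypothetical checkpoint directory
    device="cuda:0",
    use_safetensors=True,
)

# Option 2: if layers must spill to CPU, disable the exllama backend instead.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/gptq-model",
    device_map="auto",
    disable_exllama=True,
)
```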
#405 - Allow specifying GPU used for quantisation, overriding hardcoded cuda:0
Pull Request -
State: open - Opened by TheBloke about 1 year ago
- 1 comment
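Until a GPU argument like this lands, the common workaround for the hardcoded cuda:0 is to remap the visible devices before CUDA initializes:

```python
# Workaround sketch: expose physical GPU 1 to the process as cuda:0.
# This must run before torch or auto_gptq touch CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
```

The same effect is available from the shell, e.g. CUDA_VISIBLE_DEVICES=1 python quantize_script.py (script name hypothetical).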
#404 - [BUG] After upgrading to the latest version, a new error appeared
Issue -
State: closed - Opened by ParisNeo about 1 year ago
- 4 comments
Labels: bug
#403 - [BUG] ImportError: DLL load failed while importing exllama_kernels: The specified module could not be found.
Issue -
State: closed - Opened by Mradr about 1 year ago
- 3 comments
Labels: bug
#402 - Issue when loading autogptq - CUDA extension not installed and exllama_kernels not installed
Issue -
State: closed - Opened by ditchtech about 1 year ago
- 12 comments
Labels: bug
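Both warnings typically mean the compiled kernels were skipped at install time, usually because the torch build seen during installation lacked CUDA or its CUDA version mismatched the local toolkit. A quick diagnostic sketch:

```python
# Check that a CUDA-enabled torch is installed; the AutoGPTQ kernels only
# build (or the matching wheel only applies) when torch's CUDA version
# matches the local toolkit.
import torch

print("torch:", torch.__version__)
print("torch CUDA version:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

import auto_gptq
print("auto-gptq:", auto_gptq.__version__)
```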
#401 - With transformers 4.35.0 Flash Attention 2 breaks quantization, with exception `AttributeError: 'NoneType' object has no attribute 'to'` on ` attention_masks.append(kwargs["attention_mask"].to(self.data_device))`
Issue -
State: closed - Opened by TheBloke about 1 year ago
- 3 comments
Labels: bug
#400 - [BUG] libcudart.so.12 issues with latest v0.5.0
Issue -
State: open - Opened by winglian about 1 year ago
- 3 comments
Labels: bug
#399 - Problems with cQIGen on Windows
Issue -
State: closed - Opened by Shroedinger about 1 year ago
- 16 comments
#398 - [BUG] Importing `AutoGPTQForCausalLM` on Colab causes `ImportError`
Issue -
State: closed - Opened by yumemio about 1 year ago
- 5 comments
Labels: bug
#397 - Update README and version following 0.5.0 release
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#385 - Add fix for CPU Inference
Pull Request -
State: open - Opened by vivekkhandelwal1 about 1 year ago
- 1 comment
#384 - Pin to accelerate>=0.22
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#383 - Allow using a model with basename `model`, use_safetensors defaults to True
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#382 - Improve ROCm support
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
- 2 comments
#381 - [BUG] setup.py fails if gekko, pandas, numpy are not installed
Issue -
State: closed - Opened by fxmarty about 1 year ago
- 1 comment
Labels: bug
#380 - [FEATURE] Support fuyu-8b quantization
Issue -
State: open - Opened by xunfeng1980 about 1 year ago
Labels: enhancement, chinese
#379 - Fix QiGen kernel generation
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#378 - The data used for quantization lacks an eos token at the end
Issue -
State: open - Opened by pipixia244 about 1 year ago
Labels: chinese
#377 - [BUG] After quantizing Qwen-Chat-14B to int4 with autogptq, temperatures <= 0.5 raise an error
Issue -
State: closed - Opened by xunfeng1980 about 1 year ago
- 3 comments
Labels: bug, chinese
#376 - [`core` / `QLinear`] Support CPU inference
Pull Request -
State: open - Opened by younesbelkada about 1 year ago
- 9 comments
#375 - auto_gptq.nn_modules.qlinear.qlinear_cuda:CUDA extension not installed.
Issue -
State: closed - Opened by ParisNeo about 1 year ago
- 17 comments
Labels: bug
#374 - Unrecognized tensor type ID: Autocast CUDA [BUG]
Issue -
State: closed - Opened by Andrew011002 about 1 year ago
- 14 comments
Labels: bug
#373 - [BUG] Missing source distribution in pypi for version 0.4.2
Issue -
State: closed - Opened by levkk about 1 year ago
- 4 comments
Labels: bug
#372 - Error quantizing baichuan2-13b
Issue -
State: open - Opened by yijinsheng about 1 year ago
- 1 comment
#371 - How to generate pytorch_model.bin.index.json or model.safetensors.index.json
Issue -
State: open - Opened by Fraudsterrrr about 1 year ago
Labels: chinese
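Those index files are not written by hand; they appear automatically when a model is saved in shards. A sketch using transformers' save_pretrained, with hypothetical paths: any save whose max_shard_size forces multiple shards also writes model.safetensors.index.json (or pytorch_model.bin.index.json without safetensors):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/model")  # hypothetical

# Splitting into shards also emits the index file that maps each tensor
# to the shard containing it.
model.save_pretrained(
    "path/to/output",
    safe_serialization=True,   # model-*.safetensors + model.safetensors.index.json
    max_shard_size="2GB",
)
```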
#370 - [BUG] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Issue -
State: closed - Opened by dlutsniper about 1 year ago
- 15 comments
Labels: bug
#369 - Building for ROCm 5.7.1
Issue -
State: closed - Opened by letsdothis-roguethink about 1 year ago
- 4 comments
#368 - The QLinear output differs slightly on each inference
Issue -
State: closed - Opened by LightDXY about 1 year ago
- 1 comment
#367 - Modify qlinear_cuda for tracing the GPTQ model
Pull Request -
State: closed - Opened by vivekkhandelwal1 about 1 year ago
- 8 comments
#366 - [BUG] Mac support
Issue -
State: closed - Opened by Candouber about 1 year ago
- 2 comments
Labels: bug
#365 - [BUG]
Issue -
State: closed - Opened by MotoyaTakashi about 1 year ago
- 2 comments
Labels: bug
#364 - Save and Load sharded gptq checkpoint
Pull Request -
State: open - Opened by PanQiWei about 1 year ago
- 5 comments
#363 - Why does inference get slower at lower bit-widths (compared with ggml)?
Issue -
State: closed - Opened by Darshvino about 1 year ago
- 1 comment
Labels: bug
#362 - Add support for Mistral models.
Pull Request -
State: open - Opened by LaaZa about 1 year ago
#361 - PEFT initialization fix
Pull Request -
State: closed - Opened by alex4321 about 1 year ago
#360 - [FEATURE] GPTQ VectorQuantMatmul Kernel Documentation
Issue -
State: open - Opened by jeromeku about 1 year ago
- 1 comment
Labels: enhancement
#359 - [FEATURE] Mistral Support
Issue -
State: open - Opened by GTimothee about 1 year ago
- 3 comments
Labels: enhancement
#358 - ImportError: cannot import name 'PEFT_TYPE_TO_MODEL_MAPPING' from 'peft.peft_model'
Issue -
State: open - Opened by texasdave2 about 1 year ago
- 1 comment
#357 - [BUG] Type mismatch in exllamav2 QLinear activation
Issue -
State: closed - Opened by cyang49 about 1 year ago
- 12 comments
Labels: bug
#355 - import exllama QuantLinear instead of exllamav2's in `pack_model`
Pull Request -
State: closed - Opened by PanQiWei about 1 year ago
#354 - Revert "fix bug(breaking change) remove (zeros -= 1)"
Pull Request -
State: closed - Opened by PanQiWei about 1 year ago
- 3 comments
#353 - pack method missing in QuantLinear exllamav2
Issue -
State: closed - Opened by adiprasad about 1 year ago
- 2 comments
Labels: bug
#352 - Quant with larger context length
Issue -
State: open - Opened by adiprasad about 1 year ago
- 3 comments
Labels: enhancement, question
#351 - Question about MPT support
Issue -
State: closed - Opened by bonoshunki about 1 year ago
- 1 comment
#350 - Benchmark each GEMM/GEMV kernel independently
Issue -
State: open - Opened by stephen-youn about 1 year ago
- 2 comments
Labels: enhancement
#349 - exllamav2 integration
Pull Request -
State: closed - Opened by SunMarc about 1 year ago
#348 - The Path to v1.0.0
Issue -
State: open - Opened by PanQiWei about 1 year ago
Labels: enhancement
#347 - Use `adapter_name` for `get_gptq_peft_model` with `train_mode=True`
Pull Request -
State: closed - Opened by alex4321 about 1 year ago
#346 - [BUG] Issues with tensor types while finetuning the quantized model through LoRA
Issue -
State: closed - Opened by alex4321 about 1 year ago
- 11 comments
Labels: bug
#345 - How to quantize from a local checkpoint
Issue -
State: open - Opened by dionman about 1 year ago
- 6 comments
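from_pretrained accepts a local directory just as it accepts a Hub ID, so quantizing from a local checkpoint follows the usual flow. A minimal sketch (the paths and the single calibration sample are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

ckpt = "path/to/local/checkpoint"  # hypothetical local directory
tokenizer = AutoTokenizer.from_pretrained(ckpt)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(ckpt, quantize_config)

# Calibration examples are tokenized dicts with input_ids / attention_mask.
examples = [tokenizer("auto-gptq quantizes models to 4 bits.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("path/to/quantized-output", use_safetensors=True)
```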
#344 - [BUG]
Issue -
State: closed - Opened by linqingfan about 1 year ago
- 2 comments
Labels: bug
#343 - [BUG] RuntimeError: FWD: Unsupported hidden_size or types: 4096BFloat16FloatFloatFloatFloat
Issue -
State: open - Opened by edisonwd about 1 year ago
- 1 comment
Labels: bug
#342 - fix max_input_len = max_input_len
Pull Request -
State: closed - Opened by IliaZenkov about 1 year ago
#341 - [BUG] Using an auto-gptq model as the teacher for distillation raises "Expected all tensors to be on the same device"
Issue -
State: open - Opened by HaoWuSR about 1 year ago
- 1 comment
Labels: bug, chinese
#340 - ROCM 5.6: no known conversion from 'const half *' [BUG]
Issue -
State: closed - Opened by Jipok about 1 year ago
- 7 comments
Labels: bug
#339 - Building cuda extension requires PyTorch(>=1.13.0) been installed, please install PyTorch first!
Issue -
State: open - Opened by msh01 about 1 year ago
- 11 comments
Labels: bug
#338 - Models quantized with auto-gptq run inference more slowly, and slower still with use_triton=True
Issue -
State: open - Opened by yzw-yzw about 1 year ago
- 1 comment
Labels: chinese
#337 - What do you consider a good dataset size/rows for quantization ?
Issue -
State: open - Opened by nadimintikrish about 1 year ago
- 1 comment
Labels: question
#336 - dataset='c4': how do I quantize a model with a custom dataset?
Issue -
State: closed - Opened by ChethanN01 about 1 year ago
- 4 comments
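The dataset='c4' shortcut belongs to wrappers such as transformers' GPTQConfig; when calling AutoGPTQ directly, the calibration set is just a list of tokenized samples, so any custom text can stand in. A sketch with hypothetical texts and model path:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/model")  # hypothetical

# Replace the 'c4' shortcut with in-domain text: tokenize each sample and
# hand the list to model.quantize(...) as in the standard flow.
custom_texts = [
    "First calibration document ...",
    "Second calibration document ...",
]
examples = [
    tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    for text in custom_texts
]
# model.quantize(examples)  # model built as in the usual quantization flow
```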
#335 - Ignore unknown parameters in quantize_config.json
Pull Request -
State: closed - Opened by z80maniac about 1 year ago
- 1 comment
#334 - Support for mosaic MPT models
Issue -
State: open - Opened by imthebilliejoe about 1 year ago
- 2 comments
Labels: enhancement
#333 - [BUG] Error when building from source on Linux
Issue -
State: closed - Opened by RBNXI about 1 year ago
- 6 comments
Labels: bug
#332 - raise FileNotFoundError(f"Could not find model in {model_name_or_path}") FileNotFoundError: Could not find model in TheBloke/Llama-2-7b-Chat-GPTQ
Issue -
State: closed - Opened by Rahmat711 about 1 year ago
- 2 comments
Labels: bug
#331 - [BUG] Question about CUDA kernels, a potential bug
Issue -
State: open - Opened by ChenMnZ about 1 year ago
- 1 comment
Labels: bug, help wanted
#330 - [BUG] The kernel error cannot be ignored
Issue -
State: closed - Opened by ChenMnZ about 1 year ago
Labels: bug
#328 - Why are GeneralQuantLinear elements torch.int32 after 4-bit quantization?
Issue -
State: closed - Opened by jimmyforrest about 1 year ago
- 5 comments
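The int32 elements are expected: GPTQ stores quantized weights packed into 32-bit integers (eight 4-bit values per int32), so the qweight and qzeros tensors are torch.int32 whatever the nominal bit-width. An illustrative check, with a hypothetical module path:

```python
# For 4-bit quantization, 32 / 4 = 8 weights are packed into each int32,
# so qweight is torch.int32 with in_features // 8 rows.
layer = model.model.model.layers[0].self_attn.q_proj  # hypothetical path
print(layer.qweight.dtype)   # torch.int32
print(layer.qweight.shape)   # (in_features // 8, out_features) at 4 bits
```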
#327 - Quantizing a fine-tuned llama2-13b with quant_with_alpaca.py raises AttributeError: 'QuantLinear' object has no attribute 'q4'
Issue -
State: open - Opened by yzw-yzw about 1 year ago
- 3 comments
Labels: chinese
#326 - Add support for Falcon as part of Transformers 4.33.0, including new Falcon 180B
Pull Request -
State: closed - Opened by TheBloke about 1 year ago
#325 - fix bug(breaking change) remove (zeros -= 1)
Pull Request -
State: closed - Opened by qwopqwop200 about 1 year ago
- 4 comments
#324 - [BUG] CUDA extension not installed error when running AutoGPTQ in Docker
Issue -
State: closed - Opened by yachty66 about 1 year ago
- 10 comments
Labels: bug
#323 - [BUG] "The temp_state buffer is too small in the exllama backend" error, even after adding "model = exllama_set_max_input_length(model, 4096) "
Issue -
State: closed - Opened by Tamil-Arasan-31 about 1 year ago
- 21 comments
Labels: bug
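The exllama v1 backend allocates its temp_state buffer for a fixed maximum input length at load time, and the resize helper must be applied to the already-loaded model (and re-applied after any reload). A sketch of the intended call order, with a hypothetical checkpoint path:

```python
from auto_gptq import AutoGPTQForCausalLM, exllama_set_max_input_length

model = AutoGPTQForCausalLM.from_quantized(
    "path/to/gptq-model",  # hypothetical
    device="cuda:0",
)
# Reallocate the exllama buffers; batch_size * sequence_length must stay
# within max_input_length, so size it for the largest batch you will run.
model = exllama_set_max_input_length(model, max_input_length=8192)
```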
#322 - [BUG] RuntimeError: no device index
Issue -
State: closed - Opened by itechbear about 1 year ago
- 4 comments
Labels: bug
#321 - [BUG] Output is nonsense compared to llama.cpp
Issue -
State: open - Opened by YerongLi about 1 year ago
- 4 comments
Labels: bug
#320 - [BUG] Code breaks when 2 models are loaded in
Issue -
State: open - Opened by daniel-kukiela about 1 year ago
Labels: bug
#319 - Support sharded quantized model files in `from_quantized`
Issue -
State: open - Opened by shakealeg about 1 year ago
- 5 comments
Labels: enhancement, help wanted
#318 - [BUG] Another issue with the temp_state buffer, but only with a batch of 64
Issue -
State: open - Opened by daniel-kukiela about 1 year ago
- 2 comments
Labels: bug
#317 - [Question] AutoModelForCausalLM.from_pretrained failed for the mt5 model; how to quantize mt5 with AutoGPTQ
Issue -
State: closed - Opened by DuoduoLi about 1 year ago
Labels: bug
#316 - [Discussion] batch generation example
Issue -
State: open - Opened by YerongLi about 1 year ago
- 3 comments
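A common sketch for batched generation with decoder-only models: pad on the left so every prompt ends at the final position, otherwise shorter sequences continue from padding tokens (one plausible cause of the batch-mode issue in #412). Assumes model and tokenizer are already loaded:

```python
# Decoder-only batching: left padding keeps each prompt flush against the
# position where generation starts.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = ["Tell me a joke.", "Summarize GPTQ in one sentence."]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**batch, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```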
#315 - How to use exllama_set_max_input_length() with the HF models
Issue -
State: closed - Opened by daniel-kukiela about 1 year ago
- 3 comments
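The helper also works on GPTQ models loaded through transformers, since it walks the wrapped modules and resizes the exllama buffers in place. A sketch, using a model ID from the listing above:

```python
from transformers import AutoModelForCausalLM
from auto_gptq import exllama_set_max_input_length

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GPTQ",  # GPTQ checkpoint loaded via transformers
    device_map="auto",
)
# Same call as with AutoGPTQForCausalLM: returns the model with resized buffers.
model = exllama_set_max_input_length(model, max_input_length=4096)
```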
#314 - Non-FP16 Support
Issue -
State: closed - Opened by HanGuo97 about 1 year ago
- 1 comment
Labels: enhancement
#312 - Calibration process & calibration dataset used to perform GPTQ
Issue -
State: open - Opened by ht0rohit about 1 year ago
- 3 comments
Labels: documentation, question
#311 - fix typo in max_input_length
Pull Request -
State: closed - Opened by SunMarc about 1 year ago
#310 - fix model type changed after calling .to() method
Pull Request -
State: closed - Opened by PanQiWei about 1 year ago
#309 - Skip QiGen install on Windows
Pull Request -
State: closed - Opened by qwopqwop200 about 1 year ago
- 2 comments
#308 - [BUG] 'BaseQuantizeConfig' object has no attribute 'get' when deploying with OpenLLM
Issue -
State: open - Opened by jaotheboss about 1 year ago
- 2 comments
Labels: bug
#307 - Error with loading the saved quantized model
Issue -
State: closed - Opened by akkasi over 1 year ago
- 5 comments
Labels: bug
#306 - [BUG] nan average_loss when running quantize_with_alpaca.py
Issue -
State: open - Opened by jaysonph over 1 year ago
- 5 comments
Labels: bug
#305 - Fix g_idx in fused kernel
Pull Request -
State: open - Opened by chu-tianxiang over 1 year ago
#304 - [FEATURE] Support of Qwen-VL
Issue -
State: closed - Opened by JustinLin610 over 1 year ago
Labels: enhancement
#303 - Update qwen.py for Qwen-VL
Pull Request -
State: closed - Opened by JustinLin610 over 1 year ago
- 2 comments
#302 - Any suggestions on quantizing llama2-70b model?
Issue -
State: closed - Opened by franklyd over 1 year ago
- 6 comments
#301 - How to quantize a llama-based model?
Issue -
State: closed - Opened by vicwer over 1 year ago
- 4 comments
#300 - RuntimeError: x and w have incompatible shapes
Issue -
State: open - Opened by jr011 over 1 year ago
- 1 comment