Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / PanQiWei/AutoGPTQ issues and pull requests
#513 - Inference speed is 4x slower than the full fp16 model when group size is enabled
Issue -
State: open - Opened by ymurenko 11 months ago
#512 - Loss is high and inference result is incorrect
Issue -
State: open - Opened by shiqingzhangCSU 11 months ago
#511 - LLaMa 2 perplexity eval error: 'Cache only has 0 layers, attempted to access layers with index 0'
Issue -
State: open - Opened by DavidePaglieri 11 months ago
Labels: bug
#510 - [BUG] Rocm can not compile, error: no viable conversion from '__half' to '__fp16'
Issue -
State: open - Opened by 8XXD8 11 months ago
Labels: bug
#509 - GPTQ LoRA training is not working for me
Issue -
State: open - Opened by YooSungHyun 11 months ago
#508 - Dequantize to fp16?
Issue -
State: open - Opened by chromecast56 11 months ago
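For context on the dequantization question above: a GPTQ-style weight can be recovered approximately as `scale * (q - zero)` cast back to fp16. A minimal NumPy sketch, with made-up `scale` and `zero` values purely for illustration:

```python
import numpy as np

def dequantize_fp16(q, scale, zero):
    """Map 4-bit integer codes back to fp16 weights: w ~ scale * (q - zero)."""
    return ((q.astype(np.float32) - zero) * scale).astype(np.float16)

q = np.array([0, 7, 15], dtype=np.uint8)   # 4-bit codes in [0, 15]
w = dequantize_fp16(q, scale=np.float16(0.1), zero=8)
# w is approximately [-0.8, -0.1, 0.7] in fp16
```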
#507 - [BUG] Qwen-14B-Chat-Int4 GPTQ model is much slower than the original Qwen-14B-Chat
Issue -
State: open - Opened by micronetboy 11 months ago
Labels: bug
#506 - [BUG] ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.
Issue -
State: open - Opened by oreojason 11 months ago
Labels: bug
#505 - [FEATURE] Quantization of the language-model base for LLaVA multimodal models
Issue -
State: open - Opened by a2382625920 11 months ago
Labels: enhancement
#504 - [BUG] RuntimeError: The temp_state buffer is too small in the exllama backend for GPTQ with act-order.
Issue -
State: open - Opened by Essence9999 11 months ago
Labels: bug
#503 - Failed to compile AutoGPTQ on ppc64le RHEL 8
Issue -
State: open - Opened by jesulo 11 months ago
- 1 comment
Labels: bug
#502 - Deploying AutoGPTQ-quantized Qwen-7B-Chat-Int4 raises RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
Issue -
State: open - Opened by LiuGuBiGu 11 months ago
Labels: chinese
#501 - [BUG]
Issue -
State: closed - Opened by 12306ylg 11 months ago
Labels: bug
#500 - Error installing auto-gptq
Issue -
State: open - Opened by jesulo 11 months ago
- 1 comment
Labels: bug
#499 - [BUG] Qwen-14B int8 inference is slow
Issue -
State: open - Opened by Originhhh 11 months ago
Labels: bug
#496 - GPU memory is not released after unloading the quantized Qwen (Tongyi Qianwen) model
Issue -
State: open - Opened by running-frog 11 months ago
Labels: bug, chinese
#495 - Fix Qwen support
Pull Request -
State: open - Opened by hzhwcmhf 11 months ago
#494 - TypeError: 'NoneType' object is not subscriptable when inferencing
Issue -
State: closed - Opened by Enjia 11 months ago
- 1 comment
#493 - [Minor] peft bug fix: HF peft version and tokenizer path in peft scripts
Pull Request -
State: open - Opened by realAsma 11 months ago
- 1 comment
#492 - [BUG] TRL SFT - AutoGPTQ Quantization Issues
Issue -
State: closed - Opened by ChrisCates 11 months ago
- 1 comment
Labels: bug
#491 - Change deci_lm model type to deci
Pull Request -
State: open - Opened by LaaZa 11 months ago
#490 - Does AutoGPTQ currently support Ascend NPUs?
Issue -
State: open - Opened by Dbassqwer 11 months ago
- 1 comment
Labels: enhancement
#489 - NVM
Issue -
State: closed - Opened by zachNA2 11 months ago
#488 - [BUG] Qwen-4B-Chat: after LoRA fine-tuning the 14B model, converting it to a GPTQ quantized model, and running it via vLLM, an empty string is returned about 5% of the time
Issue -
State: open - Opened by micronetboy 12 months ago
Labels: bug, chinese
#487 - [BUG] After LoRA fine-tuning Qwen/Qwen-14B-Chat, merging the model, and saving it to ./merged_14b, converting to GPTQ Int4 quantization raises an error
Issue -
State: closed - Opened by micronetboy 12 months ago
Labels: bug
#486 - AssertionError
Issue -
State: closed - Opened by virentakia 12 months ago
- 9 comments
Labels: bug
#485 - Update version & install instructions
Pull Request -
State: closed - Opened by fxmarty 12 months ago
#484 - Support inference with AWQ models
Pull Request -
State: open - Opened by fxmarty 12 months ago
- 3 comments
#483 - Fix compatibility with transformers 4.36
Pull Request -
State: closed - Opened by fxmarty 12 months ago
- 1 comment
#482 - "Illegal instruction (core dumped)" whenever loading a model with AutoGPTQ [BUG]
Issue -
State: closed - Opened by The1Bill 12 months ago
- 7 comments
Labels: bug
#481 - Add support for DeciLM models.
Pull Request -
State: closed - Opened by LaaZa 12 months ago
#480 - Add support for Mixtral models.
Pull Request -
State: closed - Opened by LaaZa 12 months ago
- 5 comments
#479 - Only make_quant on inside_layer_modules.
Pull Request -
State: closed - Opened by LaaZa 12 months ago
#478 - [BUG] RuntimeError: cusolver error: CUSOLVER_STATUS_NOT_INITIALIZED, when calling `cusolverDnCreate(handle)`
Issue -
State: open - Opened by zhangzai666 12 months ago
- 2 comments
Labels: bug
#477 - Why do zero_points need -1 before packing and +1 in the CUDA kernel?
Issue -
State: open - Opened by yyfcc17 12 months ago
- 1 comment
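On the zero-point question above: AutoGPTQ's packing code stores `zeros - 1` and its CUDA kernels add the 1 back before dequantizing. One plausible (unconfirmed) rationale is range: if a zero point can reach `2**bits`, subtracting 1 keeps it representable in `bits` bits. A minimal pure-Python sketch of the round trip:

```python
BITS = 4
MAXQ = (1 << BITS) - 1  # 15 for 4-bit

def pack_zero(zero):
    # Stored value is zero - 1; a zero point of 2**BITS (16) would not
    # fit in BITS bits, but 15 does.
    stored = zero - 1
    assert 0 <= stored <= MAXQ, "stored zero must fit in BITS bits"
    return stored

def unpack_zero(stored):
    # The kernel adds the 1 back before dequantizing.
    return stored + 1

# The round trip is lossless for zero points in [1, 16].
assert all(unpack_zero(pack_zero(z)) == z for z in range(1, 17))
```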
#476 - Support for Mixtral?
Issue -
State: closed - Opened by RandomInternetPreson 12 months ago
#475 - TF32 Support
Issue -
State: open - Opened by HanGuo97 12 months ago
- 4 comments
#474 - Stop trying to convert a list to int in setup.py when trying to retrieve cores_info
Pull Request -
State: closed - Opened by wemoveon2 12 months ago
- 1 comment
#473 - Incorrect conversion: int() does not accept lists; there is an extra []
Issue -
State: open - Opened by Trangle 12 months ago
#472 - What is the purpose of the examples in the quantize method?
Issue -
State: open - Opened by javierquin 12 months ago
#471 - Add option to disable qigen at build
Pull Request -
State: closed - Opened by fxmarty 12 months ago
#470 - Make build succeed on Jetson devices (L4T)
Pull Request -
State: closed - Opened by mikeshi80 12 months ago
- 7 comments
#469 - On Jetson Orin AGX, there are no cores in cpuinfo
Issue -
State: closed - Opened by mikeshi80 12 months ago
- 2 comments
#468 - Unable to build on Threadripper Ubuntu Proxmox VM
Issue -
State: closed - Opened by henriklied 12 months ago
- 2 comments
Labels: bug
#467 - Quantization with lora weights
Issue -
State: open - Opened by xinyual 12 months ago
- 5 comments
#466 - Implemented cross-platform processor counting
Pull Request -
State: open - Opened by hillct 12 months ago
- 4 comments
#465 - Update _base.py - Remote (.bin) model load fix
Pull Request -
State: closed - Opened by Shades-en 12 months ago
#464 - Update _base.py - Remote (.bin) model load fix
Pull Request -
State: closed - Opened by Shades-en 12 months ago
#463 - [BUG] v0.5.1-release can't support aarch64 platform
Issue -
State: closed - Opened by st7109 12 months ago
- 3 comments
Labels: bug
#462 - Quantization config name
Issue -
State: closed - Opened by upunaprosk 12 months ago
- 1 comment
#461 - Why does "target_modules" not recognize any parameters?
Issue -
State: open - Opened by daehuikim 12 months ago
- 5 comments
#460 - https://github.com/Ph0rk0z/text-generation-webui-testing/commit/367ec0aa5ed5c2bf42b782f75e3b01c4e4993d95
Issue -
State: open - Opened by expapa 12 months ago
- 1 comment
#459 - Quantize llama2 70b error: ZeroDivisionError: float division by zero
Issue -
State: open - Opened by leizhao1234 almost 1 year ago
- 1 comment
#458 - pack_model takes too long
Issue -
State: open - Opened by westboy123 about 1 year ago
- 3 comments
#457 - [BUG] Model Not Supported Error
Issue -
State: open - Opened by jFkd1 about 1 year ago
- 1 comment
Labels: bug
#456 - What does `desc_act` actually mean?
Issue -
State: open - Opened by Mmmofan about 1 year ago
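For readers hitting the `desc_act` question above: in GPTQ, `desc_act` (act-order) quantizes weight columns in order of decreasing diagonal Hessian value estimated from calibration activations, so the most sensitive columns are handled first. A minimal sketch of just the ordering step, using a hypothetical `hessian_diag` array:

```python
import numpy as np

def act_order(hessian_diag):
    """Column order used when desc_act=True: indices sorted by
    descending diagonal Hessian value."""
    return np.argsort(-hessian_diag)

# Hypothetical per-column Hessian diagonal from calibration data.
hessian_diag = np.array([0.1, 3.0, 0.5, 2.0])
order = act_order(hessian_diag)
# Columns with the largest values (1, then 3) are quantized first.
```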
#455 - [BUG] Error while loading "Qwen-VL-Chat-Int4" model using AutoModelForCausalLM
Issue -
State: open - Opened by lziiid about 1 year ago
- 2 comments
Labels: bug
#454 - Can GPTQ run on CUDA 11.7 and torch 2.0?
Issue -
State: closed - Opened by LCorleone about 1 year ago
- 1 comment
#453 - Fix unnecessary VRAM usage while injecting fused attn
Pull Request -
State: open - Opened by lszxb about 1 year ago
- 5 comments
#452 - Int8 version of Yi-34B is extremely slow on A100
Issue -
State: open - Opened by lucasjinreal about 1 year ago
- 1 comment
#451 - After int8-quantizing the Qwen-14B model, time to first token is much slower than with both the unquantized and int4-quantized models
Issue -
State: open - Opened by LimpidEarth about 1 year ago
- 1 comment
Labels: chinese
#450 - Trying to adapt the cogvlm model, but encountering errors.
Issue -
State: open - Opened by Minami-su about 1 year ago
- 6 comments
#448 - How to achieve streaming output with an AutoGPTQ model
Issue -
State: open - Opened by wengyuan722 about 1 year ago
- 7 comments
#447 - [FEATURE] CUDA11.8 prebuilt binary
Issue -
State: closed - Opened by lucasjinreal about 1 year ago
- 2 comments
Labels: enhancement
#446 - [BUG] CUDA 11.7 cannot start int4 model
Issue -
State: open - Opened by mayu123mayu about 1 year ago
- 1 comment
Labels: bug
#445 - [FEATURE] CPU only version (no cuda or rocm)
Issue -
State: open - Opened by rohezal about 1 year ago
- 2 comments
Labels: enhancement
#444 - Support for StableLM Epoch models.
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
- 1 comment
#443 - question about int4
Issue -
State: closed - Opened by fancyerii about 1 year ago
#442 - Support for LongLLaMA models.
Pull Request -
State: open - Opened by LaaZa about 1 year ago
- 3 comments
#441 - [FEATURE] Support long_llama
Issue -
State: open - Opened by blap about 1 year ago
Labels: enhancement
#440 - [BUG] How to quantize on multiple GPUs?
Issue -
State: open - Opened by lonngxiang about 1 year ago
- 2 comments
Labels: bug
#439 - The "pack" procedure is extremely slow
Issue -
State: closed - Opened by zhang-ge-hao about 1 year ago
- 7 comments
Labels: bug
#438 - Fix typos in tests
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#437 - Allow fp32 input to GPTQ linear
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#436 - [BUG] ---> 54 query_states, key_states, value_states = torch.split(qkv_states, self.hidden_size, dim=2)
Issue -
State: open - Opened by janelu9 about 1 year ago
- 3 comments
Labels: bug, chinese
#435 - How long does 4-bit quantization of the LLaMA 70B model take on a single A100?
Issue -
State: open - Opened by CSEEduanyu about 1 year ago
Labels: chinese
#434 - Update README.md
Pull Request -
State: closed - Opened by brthor about 1 year ago
- 3 comments
#433 - Is it possible to create a Docker image with auto-gptq on a Mac without a GPU?
Issue -
State: closed - Opened by Prots about 1 year ago
- 5 comments
#432 - Code is identical to the example, but with the model changed to Qwen-7B-Chat quantization fails with ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?). What is the cause?
Issue -
State: open - Opened by sunyclj about 1 year ago
- 1 comment
Labels: bug, chinese
#431 - [BUG] Build fails on ARM platforms
Issue -
State: closed - Opened by hillct about 1 year ago
- 4 comments
Labels: bug
#430 - Cuda 12 support
Issue -
State: closed - Opened by ParisNeo about 1 year ago
- 3 comments
Labels: bug
#429 - [BUG] ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
Issue -
State: closed - Opened by daehuikim about 1 year ago
- 4 comments
Labels: bug
#428 - TypeError: __init__() got an unexpected keyword argument 'weight_dtype'
Issue -
State: closed - Opened by Minami-su about 1 year ago
- 1 comment
#427 - Can a model quantized using GPTQ run normally on CUDA 10.2?
Issue -
State: open - Opened by Oubaaa about 1 year ago
- 2 comments
#426 - AttributeError: module 'triton' has no attribute 'OutOfResources'
Issue -
State: closed - Opened by Minami-su about 1 year ago
- 1 comment
#425 - Support loading sharded quantized checkpoints.
Pull Request -
State: open - Opened by LaaZa about 1 year ago
- 15 comments
#424 - FileNotFoundError: [Errno 2] No such file or directory: 'python'
Issue -
State: open - Opened by amaze28 about 1 year ago
- 6 comments
Labels: bug
#423 - Fix triton unexpected keyword
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
#422 - H_inv is not updated
Issue -
State: closed - Opened by MilesQLi about 1 year ago
- 2 comments
Labels: bug
#421 - Precise PyTorch version
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#420 - [BUG] Memory errors for Zephyr 7B beta on A100
Issue -
State: open - Opened by p-christ about 1 year ago
- 2 comments
Labels: bug
#419 - Fix workflows to use pip instead of conda
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#418 - Multi-GPU inference with a GPTQ int8-quantized Qwen 14B model raises AttributeError: can't set attribute
Issue -
State: closed - Opened by LimpidEarth about 1 year ago
Labels: bug, chinese
#417 - Add support for Xverse models.
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
#416 - Roughly how long does quantizing a 14B model with AutoGPTQ take, and how much data is needed?
Issue -
State: open - Opened by zhangzai666 about 1 year ago
Labels: chinese
#415 - How long does it take to quantize? How many pieces of data are needed?
Issue -
State: open - Opened by zhangzai666 about 1 year ago
#414 - [BUG] 0.5.0 CUDA wheels did not build
Issue -
State: open - Opened by henk717 about 1 year ago
- 8 comments
Labels: bug
#413 - Add support for Yi models.
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
- 1 comment