Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / PanQiWei/AutoGPTQ issues and pull requests
#513 - Inference speed is 4x slower than the full fp16 model when group size is enabled
Issue -
State: open - Opened by ymurenko 11 months ago
#512 - Loss is high and inference result is incorrect
Issue -
State: open - Opened by shiqingzhangCSU 11 months ago
#511 - LLaMa 2 perplexity eval error: 'Cache only has 0 layers, attempted to access layers with index 0'
Issue -
State: open - Opened by DavidePaglieri 11 months ago
Labels: bug
#510 - [BUG] Rocm can not compile, error: no viable conversion from '__half' to '__fp16'
Issue -
State: open - Opened by 8XXD8 11 months ago
Labels: bug
#509 - GPTQ LoRA training is not working for me
Issue -
State: open - Opened by YooSungHyun 11 months ago
#508 - Dequantize to fp16?
Issue -
State: open - Opened by chromecast56 11 months ago
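For context on the dequantization question above: a GPTQ-style weight can be recovered approximately as `scale * (q - zero)` cast back to fp16. A minimal NumPy sketch, with made-up `scale` and `zero` values purely for illustration:

```python
import numpy as np

def dequantize_fp16(q, scale, zero):
    """Map 4-bit integer codes back to fp16 weights: w ~ scale * (q - zero)."""
    return ((q.astype(np.float32) - zero) * scale).astype(np.float16)

q = np.array([0, 7, 15], dtype=np.uint8)   # 4-bit codes in [0, 15]
w = dequantize_fp16(q, scale=np.float16(0.1), zero=8)
# w is approximately [-0.8, -0.1, 0.7] in fp16
```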
#507 - [BUG] Qwen-14B-Chat-Int4 GPTQ model is much slower than the original Qwen-14B-Chat
Issue -
State: open - Opened by micronetboy 11 months ago
Labels: bug
#506 - [BUG] ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.
Issue -
State: open - Opened by oreojason 11 months ago
Labels: bug
#505 - [FEATURE] Quantization of the language-model base for LLaVA multimodal models
Issue -
State: open - Opened by a2382625920 11 months ago
Labels: enhancement
#504 - [BUG] RuntimeError: The temp_state buffer is too small in the exllama backend for GPTQ with act-order.
Issue -
State: open - Opened by Essence9999 11 months ago
Labels: bug
#503 - Failed to compile AutoGPTQ on ppc64le RHEL 8
Issue -
State: open - Opened by jesulo 11 months ago
- 1 comment
Labels: bug
#502 - Deploying AutoGPTQ-quantized Qwen-7B-Chat-Int4 raises RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
Issue -
State: open - Opened by LiuGuBiGu 11 months ago
Labels: chinese
#501 - [BUG]
Issue -
State: closed - Opened by 12306ylg 11 months ago
Labels: bug
#500 - Error installing auto-gptq
Issue -
State: open - Opened by jesulo 11 months ago
- 1 comment
Labels: bug
#499 - [BUG] Qwen-14B int8 inference is slow
Issue -
State: open - Opened by Originhhh 11 months ago
Labels: bug
#496 - GPU memory is not released after unloading the quantized Qwen (Tongyi Qianwen) model
Issue -
State: open - Opened by running-frog 11 months ago
Labels: bug, chinese
#495 - Fix Qwen support
Pull Request -
State: open - Opened by hzhwcmhf 11 months ago
#494 - TypeError: 'NoneType' object is not subscriptable when inferencing
Issue -
State: closed - Opened by Enjia 11 months ago
- 1 comment
#493 - [Minor] peft bug fix: HF peft version and tokenizer path in peft scripts
Pull Request -
State: open - Opened by realAsma 11 months ago
- 1 comment
#492 - [BUG] TRL SFT - AutoGPTQ Quantization Issues
Issue -
State: closed - Opened by ChrisCates 11 months ago
- 1 comment
Labels: bug
#491 - Change deci_lm model type to deci
Pull Request -
State: open - Opened by LaaZa 11 months ago
#490 - Does AutoGPTQ currently support Ascend NPUs?
Issue -
State: open - Opened by Dbassqwer 11 months ago
- 1 comment
Labels: enhancement
#489 - NVM
Issue -
State: closed - Opened by zachNA2 11 months ago
#488 - [BUG] Qwen-4B-Chat: after LoRA fine-tuning the 14B model, converting it to a GPTQ quantized model, and running it via vLLM, an empty string is returned about 5% of the time
Issue -
State: open - Opened by micronetboy 12 months ago
Labels: bug, chinese
#487 - [BUG] After LoRA fine-tuning Qwen/Qwen-14B-Chat, merging the model, and saving it to ./merged_14b, converting to GPTQ Int4 quantization raises an error
Issue -
State: closed - Opened by micronetboy 12 months ago
Labels: bug
#486 - AssertionError
Issue -
State: closed - Opened by virentakia 12 months ago
- 9 comments
Labels: bug
#485 - Update version & install instructions
Pull Request -
State: closed - Opened by fxmarty 12 months ago
#484 - Support inference with AWQ models
Pull Request -
State: open - Opened by fxmarty 12 months ago
- 3 comments
#483 - Fix compatibility with transformers 4.36
Pull Request -
State: closed - Opened by fxmarty 12 months ago
- 1 comment
#482 - "Illegal instruction (core dumped)" whenever loading a model with AutoGPTQ [BUG]
Issue -
State: closed - Opened by The1Bill 12 months ago
- 7 comments
Labels: bug
#481 - Add support for DeciLM models.
Pull Request -
State: closed - Opened by LaaZa 12 months ago
#480 - Add support for Mixtral models.
Pull Request -
State: closed - Opened by LaaZa 12 months ago
- 5 comments
#479 - Only make_quant on inside_layer_modules.
Pull Request -
State: closed - Opened by LaaZa 12 months ago
#478 - [BUG] RuntimeError: cusolver error: CUSOLVER_STATUS_NOT_INITIALIZED, when calling `cusolverDnCreate(handle)`
Issue -
State: open - Opened by zhangzai666 12 months ago
- 2 comments
Labels: bug
#477 - Why do zero_points need -1 before packing and +1 in the CUDA kernel?
Issue -
State: open - Opened by yyfcc17 12 months ago
- 1 comment
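On the zero-point question above: AutoGPTQ's packing code stores `zeros - 1` and its CUDA kernels add the 1 back before dequantizing. One plausible (unconfirmed) rationale is range: if a zero point can reach `2**bits`, subtracting 1 keeps it representable in `bits` bits. A minimal pure-Python sketch of the round trip:

```python
BITS = 4
MAXQ = (1 << BITS) - 1  # 15 for 4-bit

def pack_zero(zero):
    # Stored value is zero - 1; a zero point of 2**BITS (16) would not
    # fit in BITS bits, but 15 does.
    stored = zero - 1
    assert 0 <= stored <= MAXQ, "stored zero must fit in BITS bits"
    return stored

def unpack_zero(stored):
    # The kernel adds the 1 back before dequantizing.
    return stored + 1

# The round trip is lossless for zero points in [1, 16].
assert all(unpack_zero(pack_zero(z)) == z for z in range(1, 17))
```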
#476 - Support for Mixtral?
Issue -
State: closed - Opened by RandomInternetPreson 12 months ago
#475 - TF32 Support
Issue -
State: open - Opened by HanGuo97 12 months ago
- 4 comments
#474 - Stop trying to convert a list to int in setup.py when trying to retrieve cores_info
Pull Request -
State: closed - Opened by wemoveon2 12 months ago
- 1 comment
#473 - Incorrect conversion: int() does not accept lists; there is an extra []
Issue -
State: open - Opened by Trangle 12 months ago
#472 - What is the purpose of the examples in the quantize method?
Issue -
State: open - Opened by javierquin 12 months ago
#471 - Add option to disable qigen at build
Pull Request -
State: closed - Opened by fxmarty 12 months ago
#470 - Make build succeed on Jetson devices (L4T)
Pull Request -
State: closed - Opened by mikeshi80 12 months ago
- 7 comments
#469 - On Jetson Orin AGX, there are no cores in cpuinfo
Issue -
State: closed - Opened by mikeshi80 12 months ago
- 2 comments
#468 - Unable to build on Threadripper Ubuntu Proxmox VM
Issue -
State: closed - Opened by henriklied 12 months ago
- 2 comments
Labels: bug
#467 - Quantization with lora weights
Issue -
State: open - Opened by xinyual 12 months ago
- 5 comments
#466 - Implemented cross-platform processor counting
Pull Request -
State: open - Opened by hillct 12 months ago
- 4 comments
#465 - Update _base.py - Remote (.bin) model load fix
Pull Request -
State: closed - Opened by Shades-en 12 months ago
#464 - Update _base.py - Remote (.bin) model load fix
Pull Request -
State: closed - Opened by Shades-en 12 months ago
#463 - [BUG] v0.5.1-release can't support aarch64 platform
Issue -
State: closed - Opened by st7109 12 months ago
- 3 comments
Labels: bug
#462 - Quantization config name
Issue -
State: closed - Opened by upunaprosk 12 months ago
- 1 comment
#461 - Why does "target_modules" not recognize any parameters?
Issue -
State: open - Opened by daehuikim 12 months ago
- 5 comments
#460 - https://github.com/Ph0rk0z/text-generation-webui-testing/commit/367ec0aa5ed5c2bf42b782f75e3b01c4e4993d95
Issue -
State: open - Opened by expapa 12 months ago
- 1 comment
#459 - Quantize llama2 70b error: ZeroDivisionError: float division by zero
Issue -
State: open - Opened by leizhao1234 almost 1 year ago
- 1 comment
#458 - pack_model takes too long
Issue -
State: open - Opened by westboy123 about 1 year ago
- 3 comments
#457 - [BUG] Model Not Supported Error
Issue -
State: open - Opened by jFkd1 about 1 year ago
- 1 comment
Labels: bug
#456 - What does `desc_act` actually mean?
Issue -
State: open - Opened by Mmmofan about 1 year ago
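For readers hitting the `desc_act` question above: in GPTQ, `desc_act` (act-order) quantizes weight columns in order of decreasing diagonal Hessian value estimated from calibration activations, so the most sensitive columns are handled first. A minimal sketch of just the ordering step, using a hypothetical `hessian_diag` array:

```python
import numpy as np

def act_order(hessian_diag):
    """Column order used when desc_act=True: indices sorted by
    descending diagonal Hessian value."""
    return np.argsort(-hessian_diag)

# Hypothetical per-column Hessian diagonal from calibration data.
hessian_diag = np.array([0.1, 3.0, 0.5, 2.0])
order = act_order(hessian_diag)
# Columns with the largest values (1, then 3) are quantized first.
```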
#455 - [BUG] Error while loading "Qwen-VL-Chat-Int4" model using AutoModelForCausalLM
Issue -
State: open - Opened by lziiid about 1 year ago
- 2 comments
Labels: bug
#454 - Can GPTQ run on CUDA 11.7 and torch 2.0?
Issue -
State: closed - Opened by LCorleone about 1 year ago
- 1 comment
#453 - Fix unnecessary VRAM usage while injecting fused attn
Pull Request -
State: open - Opened by lszxb about 1 year ago
- 5 comments
#452 - Int8 version of Yi-34B is extremely slow on A100
Issue -
State: open - Opened by lucasjinreal about 1 year ago
- 1 comment
#451 - After int8-quantizing the Qwen-14B model, time to first token is much slower than with both the unquantized and int4-quantized models
Issue -
State: open - Opened by LimpidEarth about 1 year ago
- 1 comment
Labels: chinese
#450 - Trying to adapt the cogvlm model, but encountering errors.
Issue -
State: open - Opened by Minami-su about 1 year ago
- 6 comments
#448 - How to achieve streaming output with an AutoGPTQ model
Issue -
State: open - Opened by wengyuan722 about 1 year ago
- 7 comments
#447 - [FEATURE] CUDA11.8 prebuilt binary
Issue -
State: closed - Opened by lucasjinreal about 1 year ago
- 2 comments
Labels: enhancement
#446 - [BUG] CUDA 11.7 cannot start int4 model
Issue -
State: open - Opened by mayu123mayu about 1 year ago
- 1 comment
Labels: bug
#445 - [FEATURE] CPU only version (no cuda or rocm)
Issue -
State: open - Opened by rohezal about 1 year ago
- 2 comments
Labels: enhancement
#444 - Support for StableLM Epoch models.
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
- 1 comment
#443 - question about int4
Issue -
State: closed - Opened by fancyerii about 1 year ago
#442 - Support for LongLLaMA models.
Pull Request -
State: open - Opened by LaaZa about 1 year ago
- 3 comments
#441 - [FEATURE] Support long_llama
Issue -
State: open - Opened by blap about 1 year ago
Labels: enhancement
#440 - [BUG] How to quantize on multiple GPUs?
Issue -
State: open - Opened by lonngxiang about 1 year ago
- 2 comments
Labels: bug
#439 - The "pack" procedure is extremely slow
Issue -
State: closed - Opened by zhang-ge-hao about 1 year ago
- 7 comments
Labels: bug
#438 - Fix typos in tests
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#437 - Allow fp32 input to GPTQ linear
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#436 - [BUG] ---> 54 query_states, key_states, value_states = torch.split(qkv_states, self.hidden_size, dim=2)
Issue -
State: open - Opened by janelu9 about 1 year ago
- 3 comments
Labels: bug, chinese
#435 - How long does 4-bit quantization of the LLaMA 70B model take on a single A100?
Issue -
State: open - Opened by CSEEduanyu about 1 year ago
Labels: chinese
#434 - Update README.md
Pull Request -
State: closed - Opened by brthor about 1 year ago
- 3 comments
#433 - Is it possible to create a Docker image with auto-gptq on a Mac without a GPU?
Issue -
State: closed - Opened by Prots about 1 year ago
- 5 comments
#432 - Code is identical to the example, but with the model changed to Qwen-7B-Chat quantization fails with ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?). What is the cause?
Issue -
State: open - Opened by sunyclj about 1 year ago
- 1 comment
Labels: bug, chinese
#431 - [BUG] Build fails on ARM platforms
Issue -
State: closed - Opened by hillct about 1 year ago
- 4 comments
Labels: bug
#430 - Cuda 12 support
Issue -
State: closed - Opened by ParisNeo about 1 year ago
- 3 comments
Labels: bug
#429 - [BUG] ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
Issue -
State: closed - Opened by daehuikim about 1 year ago
- 4 comments
Labels: bug
#428 - TypeError: __init__() got an unexpected keyword argument 'weight_dtype'
Issue -
State: closed - Opened by Minami-su about 1 year ago
- 1 comment
#427 - Can a model quantized using GPTQ run normally on CUDA 10.2?
Issue -
State: open - Opened by Oubaaa about 1 year ago
- 2 comments
#426 - AttributeError: module 'triton' has no attribute 'OutOfResources'
Issue -
State: closed - Opened by Minami-su about 1 year ago
- 1 comment
#425 - Support loading sharded quantized checkpoints.
Pull Request -
State: open - Opened by LaaZa about 1 year ago
- 15 comments
#424 - FileNotFoundError: [Errno 2] No such file or directory: 'python'
Issue -
State: open - Opened by amaze28 about 1 year ago
- 6 comments
Labels: bug
#423 - Fix triton unexpected keyword
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
#422 - H_inv is not updated
Issue -
State: closed - Opened by MilesQLi about 1 year ago
- 2 comments
Labels: bug
#421 - Precise PyTorch version
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#420 - [BUG] Memory errors for Zephyr 7B beta on A100
Issue -
State: open - Opened by p-christ about 1 year ago
- 2 comments
Labels: bug
#419 - Fix workflows to use pip instead of conda
Pull Request -
State: closed - Opened by fxmarty about 1 year ago
#418 - Multi-GPU inference with a GPTQ int8-quantized Qwen 14B model raises AttributeError: can't set attribute
Issue -
State: closed - Opened by LimpidEarth about 1 year ago
Labels: bug, chinese
#417 - Add support for Xverse models.
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
#416 - Roughly how long does quantizing a 14B model with AutoGPTQ take, and how much data is needed?
Issue -
State: open - Opened by zhangzai666 about 1 year ago
Labels: chinese
#415 - How long does it take to quantize? How many pieces of data are needed?
Issue -
State: open - Opened by zhangzai666 about 1 year ago
#414 - [BUG] 0.5.0 CUDA wheels did not build
Issue -
State: open - Opened by henk717 about 1 year ago
- 8 comments
Labels: bug
#413 - Add support for Yi models.
Pull Request -
State: closed - Opened by LaaZa about 1 year ago
- 1 comment