Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / mit-han-lab/llm-awq issues and pull requests
#226 - How to merge the generated .pt file with the model structure and convert it to another structure
Issue -
State: open - Opened by gdgfd22 28 days ago
#225 - Quantizing AutoModelForSequenceClassification models
Issue -
State: open - Opened by Fenglly 30 days ago
#224 - Update news for chunk prefilling
Pull Request -
State: closed - Opened by ys-2020 about 1 month ago
#223 - Feature 'ldmatrix' requires target sm_75 or higher when building awq_inference_engine on Tesla V100
Issue -
State: open - Opened by ShobhaRajanna about 1 month ago
#222 - AttributeError: 'LlamaConfig' object has no attribute 'rope_theta'
Issue -
State: open - Opened by lvtao65535 about 1 month ago
- 1 comment
#221 - How to Split AWQ Weights?
Issue -
State: open - Opened by Azure-Tang about 1 month ago
#218 - Flashattn and multiround chat
Pull Request -
State: closed - Opened by Louym 2 months ago
- 5 comments
#208 - Update Helpful links
Pull Request -
State: closed - Opened by ys-2020 4 months ago
#172 - can awq support 3-bit, 2-bit, 8-bit quantization?
Issue -
State: open - Opened by ArlanCooper 7 months ago
- 2 comments
#153 - awq_inference_engine has no attribute 'gemm_forward_cuda_new'
Issue -
State: open - Opened by pribadihcr 8 months ago
- 5 comments
#130 - AWQ and SmoothQuant
Issue -
State: open - Opened by DavidePaglieri 11 months ago
- 3 comments
#119 - Update TinyChat to support coding models
Pull Request -
State: closed - Opened by kentang-mit 12 months ago
#100 - Do we have supports for Falcon-180B?
Issue -
State: open - Opened by moonlightian about 1 year ago
#99 - Are there any prebuilt versions on pypi?
Issue -
State: open - Opened by ParisNeo about 1 year ago
- 1 comment
#98 - Will Open-Flamingo be added into AWQ Model Zoo?
Issue -
State: closed - Opened by Starmys about 1 year ago
- 1 comment
#97 - Extending AWQ to Other LLMs
Issue -
State: open - Opened by Qifeng-Wu99 about 1 year ago
- 2 comments
#96 - When running AWQ for the WizardLM/WizardLM-7B-V1.0 model, line 139 of auto_scale.py ("scales = scales / (scales.max() * scales.min()).sqrt()") causes scales to become inf.
Issue -
State: open - Opened by guanchuwang about 1 year ago
#95 - Adding Mistral 7B
Issue -
State: open - Opened by uprokevin about 1 year ago
- 3 comments
#94 - Could you share the code of round-to-nearest?
Issue -
State: open - Opened by guanchuwang about 1 year ago
#93 - Tesla T4 Feature '.m16n8k16' requires .target sm_80 or higher
Issue -
State: open - Opened by piotrecode about 1 year ago
- 7 comments
#92 - [Announcement] AWQ is now supported in text-generation-inference
Issue -
State: open - Opened by abhinavkulkarni about 1 year ago
- 5 comments
#91 - 4-bit pack order
Issue -
State: closed - Opened by flytigerw about 1 year ago
- 1 comment
#90 - TinyChat support for GQA and memory efficient loading
Pull Request -
State: closed - Opened by kentang-mit about 1 year ago
- 5 comments
#89 - Compilation issue when following "Efficient Kernels Step" in install process
Issue -
State: closed - Opened by knnair about 1 year ago
- 2 comments
#88 - GEMV kernel/fused modules are 10x slower at processing context
Issue -
State: open - Opened by casper-hansen about 1 year ago
- 5 comments
#87 - Output is abnormal when using my own data set (Japanese Instruction) for calibration data
Issue -
State: closed - Opened by webbigdata-jp about 1 year ago
- 2 comments
#86 - How to implement AWQ with GPTQ?
Issue -
State: closed - Opened by rainyBJ about 1 year ago
- 2 comments
#85 - A faster implementation for TinyChat
Pull Request -
State: closed - Opened by kentang-mit about 1 year ago
- 1 comment
#84 - Reference AutoAWQ in news
Pull Request -
State: closed - Opened by casper-hansen about 1 year ago
- 1 comment
#83 - installation failed on windows
Issue -
State: closed - Opened by oneengineer about 1 year ago
- 4 comments
#82 - [Minor] Update README.md
Pull Request -
State: closed - Opened by eltociear about 1 year ago
#81 - NVCC Compilation Issue with PyTorch Extension on Torch 2.0.1 + cu117 and CUDA 11.8.0
Issue -
State: open - Opened by MariOvO-casual about 1 year ago
- 4 comments
#80 - Helping Speed up Inference
Issue -
State: open - Opened by ri938 about 1 year ago
- 2 comments
#79 - INT4 quantization only delivers 20%~35% faster inference performance than FP16 for LLaMA-13b on A100
Issue -
State: open - Opened by wanzhenchn about 1 year ago
- 3 comments
#78 - Request to recommend a related project LMDeploy
Issue -
State: open - Opened by lvhan028 about 1 year ago
- 7 comments
#77 - [Minor] Temporarily change calibration dataset URL
Pull Request -
State: closed - Opened by Sakits about 1 year ago
#76 - Ability to not quantize all weights: is this feature available?
Issue -
State: closed - Opened by ri938 about 1 year ago
- 2 comments
#75 - Questions about MMLU Results?
Issue -
State: open - Opened by rainyBJ about 1 year ago
#74 - Calibration dataset sample (in `utils/calib_data.py`) is 404
Issue -
State: closed - Opened by michael4tasman about 1 year ago
- 3 comments
#73 - Vicuna-1.5 Quantized Weights
Issue -
State: open - Opened by mmaaz60 about 1 year ago
- 17 comments
#72 - [DRAFT] Refactor models, create extensible AWQ framework
Pull Request -
State: closed - Opened by casper-hansen about 1 year ago
- 11 comments
#71 - Create models directory in AWQ with model classes
Issue -
State: open - Opened by casper-hansen about 1 year ago
- 3 comments
#70 - How to make it run on multiple GPUs?
Issue -
State: open - Opened by moonlightian about 1 year ago
- 2 comments
#69 - Nan or Infs when using llama-13B-chat
Issue -
State: open - Opened by jamesdborin about 1 year ago
- 6 comments
#68 - Any plan to support BigCode models?
Issue -
State: open - Opened by curname about 1 year ago
- 1 comment
#67 - Add compatibility with GQA & optimize multi-GPU memory allocation
Pull Request -
State: closed - Opened by Sakits about 1 year ago
- 4 comments
#66 - Where can I find vicuna-7b-awq-w4g128.pt?
Issue -
State: closed - Opened by andysingal about 1 year ago
- 1 comment
#65 - 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'
Issue -
State: open - Opened by beyondli about 1 year ago
- 1 comment
#64 - setup.py not found
Issue -
State: closed - Opened by beyondli about 1 year ago
- 1 comment
#63 - Simplify AWQ installation with single setup.py file
Pull Request -
State: closed - Opened by casper-hansen over 1 year ago
- 2 comments
#62 - Question about the speed of AWQ vs. GPTQ
Issue -
State: open - Opened by lyg95 over 1 year ago
- 8 comments
#61 - Question about the speed of tiny-chat
Issue -
State: open - Opened by benyang0506 over 1 year ago
- 1 comment
#60 - Integration with TensorRT?
Issue -
State: open - Opened by bryanhpchiang over 1 year ago
- 1 comment
#59 - Does AWQ support any PyTorch model?
Issue -
State: open - Opened by bryanhpchiang over 1 year ago
#58 - Why scales need to be transformed by sqrt(scales.max() * scales.min())?
Issue -
State: closed - Opened by rainyBJ over 1 year ago
- 4 comments
#57 - Version of Nvidia Jetson Orin used for TinyChat benchmarks
Issue -
State: open - Opened by retunelars over 1 year ago
- 1 comment
#56 - SmoothQuant vs AWQ which one is faster?
Issue -
State: closed - Opened by codertimo over 1 year ago
- 2 comments
#55 - [Tinychat] Update README.
Pull Request -
State: closed - Opened by kentang-mit over 1 year ago
- 1 comment
#54 - awq uses more GPU memory than gptq
Issue -
State: open - Opened by lyg95 over 1 year ago
- 2 comments
#53 - Error Occurs When Quantizing LLaMA2-70B
Issue -
State: closed - Opened by Qifeng-Wu99 over 1 year ago
- 5 comments
#52 - Question about inference speed
Issue -
State: open - Opened by jianyuheng over 1 year ago
- 3 comments
#51 - TypeError: expected string or bytes-like object
Issue -
State: open - Opened by bhanuprakashd over 1 year ago
#50 - TypeError: _request() got an unexpected keyword argument 'https'
Issue -
State: closed - Opened by Hukongtao over 1 year ago
- 2 comments
#49 - ModuleNotFoundError: No module named 'awq_inference_engine'
Issue -
State: open - Opened by Hukongtao over 1 year ago
- 8 comments
#48 - [Question/Feature] Fused attention/mlp/norm for MPT
Issue -
State: open - Opened by casper-hansen over 1 year ago
- 3 comments
#47 - Would AWQ be able to support LLaMa2 quantization?
Issue -
State: closed - Opened by moonlightian over 1 year ago
#46 - Hi, could you also support xgen-7b-8k-inst?
Issue -
State: open - Opened by cmxiong over 1 year ago
- 4 comments
#45 - TinyChat: Fix logic for selecting MPT prompt templates (support for 8k variant).
Pull Request -
State: closed - Opened by casper-hansen over 1 year ago
- 2 comments
#44 - [Question/Feature] Skip initialization after quantization
Issue -
State: closed - Opened by casper-hansen over 1 year ago
- 5 comments
#43 - Looking at the code, I see that there is a dequantization process when actually doing the inference, i.e. the actual matrix multiplication is done with floating-point arithmetic, right?
Issue -
State: open - Opened by dingjingzhen over 1 year ago
- 1 comment
#42 - Add TinyChat
Pull Request -
State: closed - Opened by kentang-mit over 1 year ago
#41 - Merge dev/more_models
Pull Request -
State: closed - Opened by Sakits over 1 year ago
#40 - error on setup.py in kernels folder
Issue -
State: closed - Opened by calebmor460 over 1 year ago
- 6 comments
#39 - [Feature Request] Support grouped-query attention
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 2 comments
#38 - Open-Flamingo reference
Issue -
State: open - Opened by YerongLi over 1 year ago
#37 - Can not install with 2080ti
Issue -
State: open - Opened by wanghongtai92 over 1 year ago
- 4 comments
#36 - [dev/more_models] Memory optimizations
Pull Request -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 5 comments
#35 - Merge the more models branch with main branch
Issue -
State: closed - Opened by casper-hansen over 1 year ago
- 4 comments
#34 - [Bug] Memory leak in real_quantize_model_weight
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#33 - Added torch.cuda.empty_cache()
Pull Request -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#32 - Have XGen models by Salesforce been tested?
Issue -
State: closed - Opened by casper-hansen over 1 year ago
- 3 comments
#31 - http.client.RemoteDisconnected: Remote end closed connection without response
Issue -
State: open - Opened by 77h2l over 1 year ago
#30 - [Feature Request] Add support for Instructor models
Issue -
State: open - Opened by abhinavkulkarni over 1 year ago
#29 - [Bug] ValueError: OC is not multiple of cta_N = 128
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#28 - Bad result when running AWQ without GPU
Issue -
State: open - Opened by xin3he over 1 year ago
- 4 comments
#27 - Need help with the auto_scale.scale_fc_fc function
Issue -
State: open - Opened by stary-d over 1 year ago
- 1 comment
#26 - Can AWQ be run on TPUs?
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#25 - Guidance on CUDA driver and runtime versions
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#24 - How to use with custom fine-tuned LLM ?
Issue -
State: open - Opened by mahimairaja over 1 year ago
#23 - How to measure the speedup of W4A16 kernel like Figure 6?
Issue -
State: open - Opened by ChenMnZ over 1 year ago
- 5 comments
#22 - Add support for CPU offloading for quantizing bigger models on smaller GPUs
Pull Request -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 3 comments
#21 - W4A16 kernel error when group_size is not 128
Issue -
State: open - Opened by ChenMnZ over 1 year ago
- 1 comment
#20 - can we replace https://the-eye.eu/public/AI/pile/val.jsonl.zst
Issue -
State: open - Opened by luchangli03 over 1 year ago
- 3 comments
#19 - Bug when loading and evaluating the real quantized model
Issue -
State: closed - Opened by BobxmuMa over 1 year ago
- 1 comment
#18 - awqlora
Issue -
State: open - Opened by jianyuheng over 1 year ago
- 1 comment
#17 - bloom-176b CUDA out of memory on 8* A100 80g
Issue -
State: open - Opened by Niko-zyf over 1 year ago
- 3 comments
#16 - Can not load pre-computed AWQ results for Bloom7b
Issue -
State: closed - Opened by moonlightian over 1 year ago
- 1 comment
#15 - Quantization of larger models on smaller GPUs using CPU offloading
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 6 comments
#14 - Question about Activation-aware Scaling and its implementation
Issue -
State: open - Opened by yiliu30 over 1 year ago
- 2 comments
#13 - Can this quantization model be inferenced on CPU?
Issue -
State: open - Opened by JianbangZ over 1 year ago