Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / mit-han-lab/llm-awq issues and pull requests
#226 - How to merge the generated .pt file with the model structure and convert it to another structure
Issue -
State: open - Opened by gdgfd22 28 days ago
#225 - Quantizing AutoModelForSequenceClassification models
Issue -
State: open - Opened by Fenglly 30 days ago
#224 - Update news for chunk prefilling
Pull Request -
State: closed - Opened by ys-2020 about 1 month ago
#223 - Feature 'ldmatrix' requires target sm_75 or higher when building awq_inference_engine on Tesla V100
Issue -
State: open - Opened by ShobhaRajanna about 1 month ago
#222 - AttributeError: 'LlamaConfig' object has no attribute 'rope_theta'
Issue -
State: open - Opened by lvtao65535 about 1 month ago
- 1 comment
#221 - How to Split AWQ Weights?
Issue -
State: open - Opened by Azure-Tang about 1 month ago
#218 - Flashattn and multiround chat
Pull Request -
State: closed - Opened by Louym 2 months ago
- 5 comments
#208 - Update Helpful links
Pull Request -
State: closed - Opened by ys-2020 4 months ago
#172 - can awq support 3-bit, 2-bit, 8-bit quantization?
Issue -
State: open - Opened by ArlanCooper 7 months ago
- 2 comments
#153 - awq_inference_engine has no attribute 'gemm_forward_cuda_new'
Issue -
State: open - Opened by pribadihcr 8 months ago
- 5 comments
#130 - AWQ and SmoothQuant
Issue -
State: open - Opened by DavidePaglieri 11 months ago
- 3 comments
#119 - Update TinyChat to support coding models
Pull Request -
State: closed - Opened by kentang-mit 12 months ago
#100 - Do we have supports for Falcon-180B?
Issue -
State: open - Opened by moonlightian about 1 year ago
#99 - Are there any prebuilt versions on pypi?
Issue -
State: open - Opened by ParisNeo about 1 year ago
- 1 comment
#98 - Will Open-Flamingo be added into AWQ Model Zoo?
Issue -
State: closed - Opened by Starmys about 1 year ago
- 1 comment
#97 - Extending AWQ to Other LLMs
Issue -
State: open - Opened by Qifeng-Wu99 about 1 year ago
- 2 comments
#96 - When running AWQ for the WizardLM/WizardLM-7B-V1.0 model, line 139 of auto_scale.py ("scales = scales / (scales.max() * scales.min()).sqrt()") causes scales to become inf.
Issue -
State: open - Opened by guanchuwang about 1 year ago
#95 - Adding Mistral 7B
Issue -
State: open - Opened by uprokevin about 1 year ago
- 3 comments
#94 - Could you share the code of round-to-nearest?
Issue -
State: open - Opened by guanchuwang about 1 year ago
#93 - Tesla T4 Feature '.m16n8k16' requires .target sm_80 or higher
Issue -
State: open - Opened by piotrecode about 1 year ago
- 7 comments
#92 - [Announcement] AWQ is now supported in text-generation-inference
Issue -
State: open - Opened by abhinavkulkarni about 1 year ago
- 5 comments
#91 - 4-bit pack order
Issue -
State: closed - Opened by flytigerw about 1 year ago
- 1 comment
#90 - TinyChat support for GQA and memory efficient loading
Pull Request -
State: closed - Opened by kentang-mit about 1 year ago
- 5 comments
#89 - Compilation issue when following "Efficient Kernels Step" in install process
Issue -
State: closed - Opened by knnair about 1 year ago
- 2 comments
#88 - GEMV kernel/fused modules are 10x slower at processing context
Issue -
State: open - Opened by casper-hansen about 1 year ago
- 5 comments
#87 - Output is abnormal when using my own data set (Japanese Instruction) for calibration data
Issue -
State: closed - Opened by webbigdata-jp about 1 year ago
- 2 comments
#86 - How to implement AWQ with GPTQ?
Issue -
State: closed - Opened by rainyBJ about 1 year ago
- 2 comments
#85 - A faster implementation for TinyChat
Pull Request -
State: closed - Opened by kentang-mit about 1 year ago
- 1 comment
#84 - Reference AutoAWQ in news
Pull Request -
State: closed - Opened by casper-hansen about 1 year ago
- 1 comment
#83 - installation failed on windows
Issue -
State: closed - Opened by oneengineer about 1 year ago
- 4 comments
#82 - [Minor] Update README.md
Pull Request -
State: closed - Opened by eltociear about 1 year ago
#81 - NVCC Compilation Issue with PyTorch Extension on Torch 2.0.1 + cu117 and CUDA 11.8.0
Issue -
State: open - Opened by MariOvO-casual about 1 year ago
- 4 comments
#80 - Helping Speed up Inference
Issue -
State: open - Opened by ri938 about 1 year ago
- 2 comments
#79 - INT4 quantization only delivers 20%~35% faster inference performance than FP16 for LLaMA-13b on A100
Issue -
State: open - Opened by wanzhenchn about 1 year ago
- 3 comments
#78 - Request to recommend a related project LMDeploy
Issue -
State: open - Opened by lvhan028 about 1 year ago
- 7 comments
#77 - [Minor] Temporarily change calibration dataset URL
Pull Request -
State: closed - Opened by Sakits about 1 year ago
#76 - Ability to not quantize all weights: is this feature available?
Issue -
State: closed - Opened by ri938 about 1 year ago
- 2 comments
#75 - Questions about MMLU Results?
Issue -
State: open - Opened by rainyBJ about 1 year ago
#74 - Calibration dataset sample (in `utils/calib_data.py`) is 404
Issue -
State: closed - Opened by michael4tasman about 1 year ago
- 3 comments
#73 - Vicuna-1.5 Quantized Weights
Issue -
State: open - Opened by mmaaz60 about 1 year ago
- 17 comments
#72 - [DRAFT] Refactor models, create extensible AWQ framework
Pull Request -
State: closed - Opened by casper-hansen about 1 year ago
- 11 comments
#71 - Create models directory in AWQ with model classes
Issue -
State: open - Opened by casper-hansen about 1 year ago
- 3 comments
#70 - How to make it run on multiple GPUs?
Issue -
State: open - Opened by moonlightian about 1 year ago
- 2 comments
#69 - Nan or Infs when using llama-13B-chat
Issue -
State: open - Opened by jamesdborin about 1 year ago
- 6 comments
#68 - Any plan to support BigCode models?
Issue -
State: open - Opened by curname about 1 year ago
- 1 comment
#67 - Add compatibility with GQA & optimize multi-GPU memory allocation
Pull Request -
State: closed - Opened by Sakits about 1 year ago
- 4 comments
#66 - Where can I find vicuna-7b-awq-w4g128.pt?
Issue -
State: closed - Opened by andysingal about 1 year ago
- 1 comment
#65 - 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'
Issue -
State: open - Opened by beyondli about 1 year ago
- 1 comment
#64 - setup.py not found
Issue -
State: closed - Opened by beyondli about 1 year ago
- 1 comment
#63 - Simplify AWQ installation with single setup.py file
Pull Request -
State: closed - Opened by casper-hansen over 1 year ago
- 2 comments
#62 - Question about the speed of AWQ vs. GPTQ
Issue -
State: open - Opened by lyg95 over 1 year ago
- 8 comments
#61 - Question about the speed of tiny-chat
Issue -
State: open - Opened by benyang0506 over 1 year ago
- 1 comment
#60 - Integration with TensorRT?
Issue -
State: open - Opened by bryanhpchiang over 1 year ago
- 1 comment
#59 - Does AWQ support any PyTorch model?
Issue -
State: open - Opened by bryanhpchiang over 1 year ago
#58 - Why scales need to be transformed by sqrt(scales.max() * scales.min())?
Issue -
State: closed - Opened by rainyBJ over 1 year ago
- 4 comments
#57 - Version of Nvidia Jetson Orin used for TinyChat benchmarks
Issue -
State: open - Opened by retunelars over 1 year ago
- 1 comment
#56 - SmoothQuant vs AWQ which one is faster?
Issue -
State: closed - Opened by codertimo over 1 year ago
- 2 comments
#55 - [Tinychat] Update README.
Pull Request -
State: closed - Opened by kentang-mit over 1 year ago
- 1 comment
#54 - awq uses more GPU memory than gptq
Issue -
State: open - Opened by lyg95 over 1 year ago
- 2 comments
#53 - Error Occurs When Quantizing LLaMA2-70B
Issue -
State: closed - Opened by Qifeng-Wu99 over 1 year ago
- 5 comments
#52 - Question about inference speed
Issue -
State: open - Opened by jianyuheng over 1 year ago
- 3 comments
#51 - TypeError: expected string or bytes-like object
Issue -
State: open - Opened by bhanuprakashd over 1 year ago
#50 - TypeError: _request() got an unexpected keyword argument 'https'
Issue -
State: closed - Opened by Hukongtao over 1 year ago
- 2 comments
#49 - ModuleNotFoundError: No module named 'awq_inference_engine'
Issue -
State: open - Opened by Hukongtao over 1 year ago
- 8 comments
#48 - [Question/Feature] Fused attention/mlp/norm for MPT
Issue -
State: open - Opened by casper-hansen over 1 year ago
- 3 comments
#47 - Would AWQ be able to support LLaMa2 quantization?
Issue -
State: closed - Opened by moonlightian over 1 year ago
#46 - Hi, could you also support xgen-7b-8k-inst?
Issue -
State: open - Opened by cmxiong over 1 year ago
- 4 comments
#45 - TinyChat: Fix logic for selecting MPT prompt templates (support for 8k variant).
Pull Request -
State: closed - Opened by casper-hansen over 1 year ago
- 2 comments
#44 - [Question/Feature] Skip initialization after quantization
Issue -
State: closed - Opened by casper-hansen over 1 year ago
- 5 comments
#43 - Looking at the code, I see that there is a dequantization process when actually doing the inference, i.e. the actual matrix multiplication is done with floating-point arithmetic, right?
Issue -
State: open - Opened by dingjingzhen over 1 year ago
- 1 comment
#42 - Add TinyChat
Pull Request -
State: closed - Opened by kentang-mit over 1 year ago
#41 - Merge dev/more_models
Pull Request -
State: closed - Opened by Sakits over 1 year ago
#40 - error on setup.py in kernels folder
Issue -
State: closed - Opened by calebmor460 over 1 year ago
- 6 comments
#39 - [Feature Request] Support grouped-query attention
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 2 comments
#38 - Open-Flamingo reference
Issue -
State: open - Opened by YerongLi over 1 year ago
#37 - Can not install with 2080ti
Issue -
State: open - Opened by wanghongtai92 over 1 year ago
- 4 comments
#36 - [dev/more_models] Memory optimizations
Pull Request -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 5 comments
#35 - Merge the more models branch with main branch
Issue -
State: closed - Opened by casper-hansen over 1 year ago
- 4 comments
#34 - [Bug] Memory leak in real_quantize_model_weight
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#33 - Added torch.cuda.empty_cache()
Pull Request -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#32 - Have XGen models by Salesforce been tested?
Issue -
State: closed - Opened by casper-hansen over 1 year ago
- 3 comments
#31 - http.client.RemoteDisconnected: Remote end closed connection without response
Issue -
State: open - Opened by 77h2l over 1 year ago
#30 - [Feature Request] Add support for Instructor models
Issue -
State: open - Opened by abhinavkulkarni over 1 year ago
#29 - [Bug] ValueError: OC is not multiple of cta_N = 128
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#28 - Bad result when running AWQ without GPU
Issue -
State: open - Opened by xin3he over 1 year ago
- 4 comments
#27 - Need help with the auto_scale.scale_fc_fc function
Issue -
State: open - Opened by stary-d over 1 year ago
- 1 comment
#26 - Can AWQ be run on TPUs?
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#25 - Guidance on CUDA driver and runtime versions
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 1 comment
#24 - How to use with custom fine-tuned LLM ?
Issue -
State: open - Opened by mahimairaja over 1 year ago
#23 - How to measure the speedup of W4A16 kernel like Figure 6?
Issue -
State: open - Opened by ChenMnZ over 1 year ago
- 5 comments
#22 - Add support for CPU offloading for quantizing bigger models on smaller GPUs
Pull Request -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 3 comments
#21 - W4A16 kernel error when group_size is not 128
Issue -
State: open - Opened by ChenMnZ over 1 year ago
- 1 comment
#20 - can we replace https://the-eye.eu/public/AI/pile/val.jsonl.zst
Issue -
State: open - Opened by luchangli03 over 1 year ago
- 3 comments
#19 - Bug when loading and evaluating the real quantized model
Issue -
State: closed - Opened by BobxmuMa over 1 year ago
- 1 comment
#18 - awqlora
Issue -
State: open - Opened by jianyuheng over 1 year ago
- 1 comment
#17 - bloom-176b CUDA out of memory on 8* A100 80g
Issue -
State: open - Opened by Niko-zyf over 1 year ago
- 3 comments
#16 - Can not load pre-computed AWQ results for Bloom7b
Issue -
State: closed - Opened by moonlightian over 1 year ago
- 1 comment
#15 - Quantization of larger models on smaller GPUs using CPU offloading
Issue -
State: closed - Opened by abhinavkulkarni over 1 year ago
- 6 comments
#14 - Question about Activation-aware Scaling and its implementation
Issue -
State: open - Opened by yiliu30 over 1 year ago
- 2 comments
#13 - Can this quantization model be inferenced on CPU?
Issue -
State: open - Opened by JianbangZ over 1 year ago