mit-han-lab/smoothquant issues and pull requests

#105 - SmoothQuant on t5?

Issue - State: open - Opened by Manojbhat09 10 days ago

#104 - SmoothQuant kv-cache quantization

Issue - State: open - Opened by wuguowuge 11 days ago

#103 - Sayeh/qwen support

Pull Request - State: closed - Opened by sayehs-dmatrix 29 days ago

#102 - Sensitivity analysis of model parameters

Issue - State: open - Opened by shengqihailuo1 about 1 month ago

#101 - RuntimeError: “triu_tril_cuda_template“ not implemented for ‘BFloat16‘

Issue - State: open - Opened by cuicui2023 about 2 months ago

#100 - error when getting the scales for my model

Issue - State: open - Opened by rib12316 2 months ago - 1 comment

#99 - Sylvia working PR

Pull Request - State: closed - Opened by SylviaZiyuZhang 2 months ago

#98 - How to use this for custom model?

Issue - State: open - Opened by siddagra 2 months ago

#98 - How to use this for custom model?

Issue - State: open - Opened by siddagra 2 months ago

#97 - how to visual outier of activation

Issue - State: open - Opened by harleyszhang 4 months ago - 1 comment

#96 - The upper and lower bounds seems to not 8 bits in some cases

Issue - State: open - Opened by zhangyu68 4 months ago

#95 - Why only 4 layers?

Issue - State: open - Opened by VincentXWD 5 months ago

#94 - Support for Qwen2

Issue - State: open - Opened by JiaXinLI98 7 months ago

#93 - How to quantize the out_proj and fc2 module in OPT model family

Issue - State: open - Opened by yanchenmochen 7 months ago

#92 - How to quantize llama3?

Issue - State: open - Opened by jpyo0803 7 months ago

#91 - export_int8_model.py size issue

Issue - State: open - Opened by ljhyeok123 7 months ago - 1 comment

#90 - quantify other models,

Issue - State: open - Opened by AlexMa0 7 months ago

#89 - best Alpha value for Qwen 1.5 72B

Issue - State: open - Opened by Riskin1999 8 months ago

#88 - how to draw this result directly? is there any script?

Issue - State: open - Opened by foreverpiano 9 months ago - 1 comment

#87 - Huggingface_Hub Issue

Issue - State: open - Opened by faize5 9 months ago - 2 comments

#86 - Can SmoothQuant be used on ViT models?

Issue - State: open - Opened by n9s8a 10 months ago

#85 - Whether it can be supported stable diffusion

Issue - State: open - Opened by songh11 10 months ago

#84 - Inquiry about Int8 BMM overflow

Issue - State: open - Opened by luzai 10 months ago

#83 - Error when running smoothquant_opt_real_int8_demo.ipynb

Issue - State: open - Opened by kaijun924 11 months ago

#82 - how to use model.generate with smoothquant models

Issue - State: open - Opened by Hao-YunDeng 11 months ago

#81 - which version of transformer and datasets package do we need for this repo?

Issue - State: open - Opened by ghost 11 months ago - 2 comments

#80 - adjust activations

Issue - State: open - Opened by muzi0111 11 months ago

#79 - Question: why not need explicit scaling for activation X

Issue - State: open - Opened by ghost 11 months ago - 2 comments

#78 - RuntimeError: "clamp_min_cpu" not implemented for 'Half'

Issue - State: closed - Opened by ghost 11 months ago - 1 comment

#77 - Weight migration for Llama?

Issue - State: open - Opened by atyshka 11 months ago

#76 - Question about code

Issue - State: open - Opened by Lucky-Lance 12 months ago

#75 - How Can I Peft the Smoothquanted LLM?

Issue - State: closed - Opened by WoosungMyung 12 months ago - 1 comment

#74 - bmm_s8t_s8n_s8t cannot run with this shape

Issue - State: closed - Opened by xiachong94 12 months ago

#73 - Can I reproduce SmoothQuant on CPU only since I see that torch-int8 requires a GPU, and I am only interested in inference on the CPU?

Issue - State: open - Opened by WCSY-YG about 1 year ago

#72 - set quantize_output True the acc drop to 0

Issue - State: open - Opened by lonleyodd about 1 year ago

#71 - ask for a function in linear.py for smoothquant in llama @Anizpz

Issue - State: open - Opened by msz12345 about 1 year ago

#70 - w8a8 Does it require dequantization during forward inference?

Issue - State: open - Opened by shatealaboxiaowang about 1 year ago - 1 comment

#69 - general question about SmoothQuant kv-cache quantization

Issue - State: open - Opened by brisker about 1 year ago - 1 comment

#68 - Got accuray=0 when trying _real_int8_demo.ipynb

Issue - State: open - Opened by leocnj about 1 year ago

#67 - how to reproduce ppl of wikitext2?

Issue - State: open - Opened by Arthur-Ling about 1 year ago - 1 comment

#66 - Activation scales for bloomz 7.1b

Issue - State: open - Opened by bil-ash about 1 year ago - 1 comment

#65 - support auto search for per-layer smoothing alphas, and auto clip for weights, both bits-aware, can do W4A8 with minor loss

Pull Request - State: closed - Opened by yyfcc17 about 1 year ago - 2 comments

#64 - What does the accuracy in Figure 7 of the paper mean?

Issue - State: open - Opened by YundongGai about 1 year ago

#63 - Demo code for Bloom model?

Issue - State: open - Opened by llCurious about 1 year ago

#62 - Inference time decreases only by 7.5% on opt-6.7B

Issue - State: open - Opened by FurryMushroom over 1 year ago - 1 comment

#61 - llama-2-chat demo

Pull Request - State: closed - Opened by liquanfeng over 1 year ago

#60 - pickle.UnpicklingError: invalid load key, 'v'.

Issue - State: open - Opened by baiSongL over 1 year ago - 2 comments

#59 - failed to run int8 opt

Issue - State: closed - Opened by jackzhou121 over 1 year ago - 2 comments

#58 - UnpicklingError: invalid load key, 'v'.

Issue - State: closed - Opened by FurryMushroom over 1 year ago - 7 comments

#57 - add llama model support

Pull Request - State: open - Opened by AniZpZ over 1 year ago - 1 comment

#56 - which is faster between smoothquant and autogptq?

Issue - State: open - Opened by InkdyeHuang over 1 year ago

#55 - [BUG] Int8 inference with torch-int encounter errors

Issue - State: open - Opened by WelY1 over 1 year ago

#54 - How to calculate Alpha?

Issue - State: open - Opened by Triple-L over 1 year ago

#53 - Why do different models have the same size？

Issue - State: open - Opened by WelY1 over 1 year ago

#52 - Activation Channel Scales and Calibration

Issue - State: open - Opened by 520zw over 1 year ago - 1 comment

#51 - The ppl value of the opt-6.7b-smoothquant model shows abnormal performance

Issue - State: open - Opened by sitabulaixizawaluduo over 1 year ago - 1 comment

#50 - circular import

Issue - State: open - Opened by breaddance over 1 year ago

#49 - Can you explain in a step by step manner how we can implement this on our own model and dataset?

Issue - State: open - Opened by shahaamirbader over 1 year ago

#48 - How to reproduce the performance described in the paper

Issue - State: open - Opened by rolex-cjj over 1 year ago - 2 comments

#47 - How to conduct zero-shot experiments?

Issue - State: open - Opened by moodom over 1 year ago

#46 - Error loading `AutoModelForCausalLM` in `examples/generate_act_scales.py`

Issue - State: closed - Opened by julian-q over 1 year ago - 1 comment

#45 - Could not open smoothquant_opt_demo.ipynb

Issue - State: open - Opened by foreverpiano over 1 year ago - 1 comment

#44 - How can I make it support Bloom-7b?

Issue - State: open - Opened by moonlightian over 1 year ago

#43 - batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Issue - State: closed - Opened by GuoYi0 over 1 year ago - 1 comment

#42 - Accuracy drop for Llama

Issue - State: open - Opened by fmo-mt over 1 year ago - 10 comments

#41 - No module named 'torch_int'

Issue - State: open - Opened by kaust2018 over 1 year ago - 7 comments

#40 - support GPTNEOX model

Issue - State: open - Opened by amazingkmy over 1 year ago

#39 - How to reproduce the result with lm-evaluation-harness

Issue - State: open - Opened by Ther-nullptr almost 2 years ago

#38 - How can smoothquant be used in ConvNets

Issue - State: open - Opened by littletomatodonkey almost 2 years ago - 1 comment

#37 - SmoothQuant for llama

Issue - State: open - Opened by shhn1 almost 2 years ago - 3 comments

#36 - How to use SmoothQuant in FasterTransformer?

Issue - State: open - Opened by jiangsongHW almost 2 years ago - 1 comment

#35 - can provide the relize of GLM model or ohter model which is in you paper?

Issue - State: open - Opened by o-github-o almost 2 years ago

#34 - Doesn't work on gpt models.

Issue - State: closed - Opened by YaphetS-X almost 2 years ago - 2 comments

#33 - git lfs pull ERROR

Issue - State: closed - Opened by lingffff almost 2 years ago - 2 comments

#32 - How does it compares to Deepspeed?

Issue - State: open - Opened by LifeIsStrange almost 2 years ago

#31 - git lfs is currently down,could you solve this problem?

Issue - State: closed - Opened by Anychnn almost 2 years ago - 1 comment

#30 - No module named 'torch_int'

Issue - State: closed - Opened by liangxiaoyun almost 2 years ago - 1 comment

#29 - 4bit weight quantization? 4bit activation quantization?

Issue - State: open - Opened by Thomas-MMJ almost 2 years ago - 1 comment

#28 - what is the transformers' version

Issue - State: closed - Opened by lippman1125 almost 2 years ago

#27 - How to implement this method combinded with decoder

Issue - State: open - Opened by lileilai almost 2 years ago - 2 comments

#26 - Support for LLAMA

Issue - State: closed - Opened by fmac2000 almost 2 years ago - 2 comments

#25 - Out of memory

Issue - State: open - Opened by lileilai almost 2 years ago - 1 comment

#24 - Is O1 and O2 version for smoothquant available?

Issue - State: open - Opened by Ther-nullptr about 2 years ago

#23 - Missing the activation scales of opt-125m

Issue - State: closed - Opened by Ther-nullptr about 2 years ago - 1 comment

#22 - What is the difference between `get_act_scales` and `get_static_decoder_layer_scales`

Issue - State: open - Opened by CaffreyR about 2 years ago - 1 comment

#21 - Post-LayerNorm support

Issue - State: open - Opened by minghaoBD about 2 years ago - 1 comment

#20 - mseznec/export weights for ft fixes

Pull Request - State: closed - Opened by mickaelseznec about 2 years ago

#19 - add option to export scaling factors for FT

Pull Request - State: closed - Opened by mickaelseznec about 2 years ago - 1 comment

#18 - Visualization tool

Issue - State: open - Opened by ArulselvanMadhavan about 2 years ago - 2 comments

#17 - Size mismatch

Issue - State: open - Opened by anujnayyar1 about 2 years ago - 1 comment

#16 - Bloom code

Issue - State: open - Opened by Toan-Do about 2 years ago - 2 comments

#15 - paper says smoothing all linear layers, but code seems to smooth only the qkv projection in attention and the first fc in ffn?

Issue - State: closed - Opened by chenho74 about 2 years ago - 5 comments

#14 - Test smoothquant accuracy for just fc2 layer

Issue - State: closed - Opened by erichan1 about 2 years ago - 7 comments

#13 - error encounctered when loading act_scales

Issue - State: closed - Opened by chenho74 about 2 years ago - 2 comments

#12 - different smoothquant levels

Issue - State: closed - Opened by erichan1 about 2 years ago - 3 comments

#11 - Input to ReLU is quantized to int8? An error in quantization_flow.png?

Issue - State: closed - Opened by chenho74 about 2 years ago - 2 comments

#10 - The Naive W8A8 Quantized model accuracy of medium size model (e.g opt-2.7b)

Issue - State: closed - Opened by LiuShixing about 2 years ago - 2 comments

#9 - Merge branch 'pr/mickaelseznec/3' into mickaelseznec-mseznec/fastertransformer-compat

Pull Request - State: closed - Opened by Guangxuan-Xiao about 2 years ago

#8 - Does it support freeze the quantized pretrained model then use prefix tuning ?

Issue - State: closed - Opened by LiuShixing about 2 years ago - 1 comment

#7 - Latency calculation for OPT 175B

Issue - State: closed - Opened by tangbinh about 2 years ago - 2 comments

GitHub / mit-han-lab/smoothquant issues and pull requests