vllm-project/llm-compressor issues and pull requests

#1648 - [Transform] QuIP Modifier

Pull Request - State: open - Opened by kylesayrs 15 days ago

#1646 - [Bugfix] Untie word embeddings

Pull Request - State: open - Opened by kylesayrs 15 days ago

#1645 - Alternate moe calib

Pull Request - State: closed - Opened by dsikka 15 days ago

#1637 - [Transform] Norm fusing utilities

Pull Request - State: closed - Opened by kylesayrs 21 days ago - 1 comment
Labels: ready

#1456 - [deepseek-v2-lite-int8] RuntimeError: Unsupported FusedMoe scheme: num_bits=8 type='int'

Issue - State: open - Opened by mZhenz 2 months ago - 2 comments
Labels: bug

#1455 - Fix: Improve `SmoothQuant` Support for Mixture of Experts (MoE) Models

Pull Request - State: open - Opened by rahul-tuli 2 months ago - 2 comments

#1454 - Disable kernels during calibration (and tracing)

Pull Request - State: open - Opened by kylesayrs 2 months ago - 1 comment
Labels: ready

#1453 - [GPTQ] Fix actorder resolution, add sentinel

Pull Request - State: open - Opened by kylesayrs 2 months ago - 1 comment
Labels: ready

#1452 - [Tracing] Fix Traceable Imports

Pull Request - State: closed - Opened by kylesayrs 2 months ago - 1 comment
Labels: ready

#1451 - AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length

Pull Request - State: open - Opened by brian-dellabetta 2 months ago - 1 comment
Labels: ready

#1450 - [Observer] Optimize mse observer

Pull Request - State: open - Opened by shanjiaz 2 months ago - 2 comments
Labels: ready

#1449 - [Tests] Use proper offloading utils in `test_compress_tensor_utils`

Pull Request - State: closed - Opened by kylesayrs 2 months ago - 1 comment
Labels: ready

#1448 - [Bugfix][Tracing] Fix qwen2_5_vl

Pull Request - State: closed - Opened by kylesayrs 2 months ago - 1 comment
Labels: ready

#1447 - bge-reranker-v2-m3 support

Issue - State: open - Opened by bisunny 2 months ago - 1 comment
Labels: enhancement

#1446 - Fix missing logs when calling oneshot

Pull Request - State: open - Opened by kelkelcheng 2 months ago - 3 comments
Labels: ready

#1445 - oneshot entrypoint update

Pull Request - State: open - Opened by ved1beta 2 months ago - 2 comments
Labels: ready

#1444 - AWQModifier fast resolve mappings, better logging

Pull Request - State: open - Opened by brian-dellabetta 2 months ago - 1 comment

#1443 - Update `oneshot` to use an explicit keyword argument instead of using `**kwargs`

Issue - State: open - Opened by dsikka 2 months ago
Labels: enhancement, good first issue

#1442 - Add Additional Model Mappings for `AWQ` and `SmoothQuant`

Issue - State: open - Opened by dsikka 2 months ago - 1 comment
Labels: enhancement, good first issue

#1441 - Remove `sparse_logs` folder

Issue - State: open - Opened by dsikka 2 months ago
Labels: bug, enhancement, good first issue

#1440 - AWQ Qwen and Phi mappings

Pull Request - State: open - Opened by brian-dellabetta 2 months ago - 1 comment
Labels: ready

#1439 - patch awq tests/readme after QuantizationMixin refactor

Pull Request - State: closed - Opened by brian-dellabetta 3 months ago - 1 comment
Labels: ready

#1438 - [Research] Llama4 AutoWrapper + Onloading

Pull Request - State: open - Opened by kylesayrs 3 months ago - 2 comments

#1437 - [NVFp4] Activation Support

Pull Request - State: open - Opened by dsikka 3 months ago

#1436 - Initial implementation for the docs site and setup for LLM Compressor

Pull Request - State: open - Opened by markurtz 3 months ago - 1 comment

#1435 - [WIP][AWQ] Support accumulation for reduced memory usage

Pull Request - State: open - Opened by kylesayrs 3 months ago

#1434 - Added more tests for Quantization24SparseW4A16

Pull Request - State: closed - Opened by shanjiaz 3 months ago - 1 comment
Labels: ready

#1433 - Add: deepseekv2 smoothquant mappings

Pull Request - State: closed - Opened by rahul-tuli 3 months ago - 1 comment
Labels: ready

#1432 - NotImplementedError: Cannot copy out of meta tensor; no data! when trying to run AWQ

Issue - State: closed - Opened by shaibal13 3 months ago - 3 comments

#1431 - [Logging] Support logging once

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1430 - How to quant a model which a layer module is only `nn.Parameter`?

Issue - State: open - Opened by shuxiaobo 3 months ago - 1 comment

#1429 - Remove RecipeArgs class & its references

Pull Request - State: closed - Opened by shanjiaz 3 months ago - 3 comments
Labels: ready

#1428 - [gemma3] Properly specifying which targets to ignore

Issue - State: open - Opened by Foreist 3 months ago - 1 comment
Labels: bug

#1427 - NotImplementedError: No compressed-tensors compatible scheme was found

Issue - State: closed - Opened by BigFaceBoy 3 months ago - 8 comments
Labels: bug

#1426 - AWQ QuantizationMixin + SequentialPipeline

Pull Request - State: closed - Opened by brian-dellabetta 3 months ago - 1 comment
Labels: ready

#1425 - [GPTQ] Change actorder default to "static"

Pull Request - State: open - Opened by kylesayrs 3 months ago - 3 comments

#1424 - [GPTQ] Add `actorder` option to modifier

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1423 - [Tracing] Reinstate ignore functionality

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1422 - Integrating HQQ (Half-Quadratic Quantization)?

Issue - State: closed - Opened by learning-chip 3 months ago - 1 comment
Labels: enhancement

#1421 - Question on unstable pruning result using SparseGPT method

Issue - State: open - Opened by zjnyly 3 months ago

#1420 - [Typo] overriden

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1419 - Use model compression pathways

Pull Request - State: open - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1418 - Quant Qwen2.5-VL with llmcompressor=0.5.1 & Transformers 4.51.3, then infer with vLLM 0.8.1 got error

Issue - State: open - Opened by YangYang-DLUT 3 months ago - 2 comments

#1417 - Add `pull_request` trigger to base tests workflow

Pull Request - State: closed - Opened by dbarbuzzi 3 months ago - 1 comment
Labels: ready

#1416 - Rename SparsityModifierMixin to SparsityModifierBase

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 3 comments
Labels: ready

#1415 - [AWQ] Gemma 3: ValueError: too many values to unpack (expected 4)

Issue - State: open - Opened by ignaceHelsen 3 months ago - 2 comments

#1414 - removing RecipeMetadata and references

Pull Request - State: closed - Opened by shanjiaz 3 months ago - 2 comments
Labels: ready

#1413 - Adding a readthedocs docs build for llm-compressor

Pull Request - State: open - Opened by aireilly 3 months ago - 6 comments
Labels: ready

#1412 - [Examples] Standardize AWQ example

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 3 comments
Labels: ready

#1411 - [WIP][Tracing] Code AutoWrapper

Pull Request - State: open - Opened by kylesayrs 3 months ago

#1410 - [Feature] Log/info/Save/Restore quantization steps

Issue - State: open - Opened by mratsim 3 months ago
Labels: enhancement

#1409 - [AWQ] Insane memory requirement: over 900GB for 32B model

Issue - State: closed - Opened by mratsim 3 months ago - 1 comment
Labels: bug

#1408 - Add new-features section

Pull Request - State: closed - Opened by rahul-tuli 3 months ago - 2 comments
Labels: ready

#1407 - validation check added

Pull Request - State: open - Opened by ved1beta 3 months ago - 5 comments

#1406 - AWQ Qwen3-235B-A22B and Qwen3-30B-A3B

Issue - State: open - Opened by ehartford 3 months ago - 12 comments
Labels: bug

#1405 - AWQ sanitize_kwargs minor cleanup

Pull Request - State: closed - Opened by brian-dellabetta 3 months ago - 1 comment
Labels: ready

#1404 - use "qwen_2_5_vl_example.py" quant Qwen2.5-VL-7B-Instruct, got error with "SAMPLE GENERATION"

Issue - State: closed - Opened by YangYang-DLUT 3 months ago - 1 comment

#1403 - Error when computing device_map for Mistral-small-3.1-24B-Instruct-2503

Issue - State: open - Opened by VAmblardPEReN 3 months ago
Labels: bug

#1402 - [VLM] Fix mllama targets

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1401 - Add warning for non-divisible group quantization

Pull Request - State: open - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1400 - [WIP][Testing] Add VL e2e tests

Pull Request - State: open - Opened by kylesayrs 3 months ago - 1 comment

#1399 - [VLM] Add Gemma3 Example

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 2 comments

#1398 - Consolidate build config

Pull Request - State: closed - Opened by dbarbuzzi 3 months ago - 1 comment
Labels: ready

#1397 - Exclude images from package

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1396 - Drop `flash_attn` skip for quantizing_moe example tests

Pull Request - State: closed - Opened by dbarbuzzi 3 months ago - 1 comment
Labels: ready

#1395 - awq -- hotfix to missing kwargs

Pull Request - State: closed - Opened by brian-dellabetta 3 months ago - 1 comment
Labels: ready

#1394 - For multimodal models, such as QwenVL2.5, is the SmoothQuantModifier necessary when performing W8A8 quantization?

Issue - State: open - Opened by weirdo2310 3 months ago - 3 comments

#1393 - For FP8 Fused MoE layers, only per-tensor scales for weights and activations are supported?

Issue - State: open - Opened by shuxiaobo 3 months ago - 1 comment
Labels: bug

#1392 - [Tracing] Trace with eager attention

Pull Request - State: open - Opened by kylesayrs 3 months ago - 1 comment

#1391 - [Lifecycle] Initialize only once, trigger on_start for each pipeline

Pull Request - State: closed - Opened by kylesayrs 3 months ago

#1390 - [Tracing] Autowrap methods by name

Pull Request - State: open - Opened by kylesayrs 3 months ago - 1 comment

#1389 - [Tracing] Skip non-ancestors of sequential targets

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 2 comments
Labels: ready

#1388 - [Tracing] Raise `_is_compiling_flag` while tracing

Pull Request - State: open - Opened by kylesayrs 3 months ago - 1 comment
Labels: ready

#1387 - [WIP][Tracing] Mistral3ForConditionalGeneration

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 1 comment

#1386 - AWQ: Clean up forward passes with kwargs using inspect.bind

Pull Request - State: closed - Opened by ved1beta 3 months ago - 2 comments

#1385 - AWQ -- Clean up forward passes with kwargs using `inspect.bind`

Issue - State: closed - Opened by brian-dellabetta 3 months ago - 1 comment
Labels: enhancement, good first issue

#1384 - bugfix AWQ with Llama models and python 3.9

Pull Request - State: closed - Opened by brian-dellabetta 3 months ago - 1 comment
Labels: ready

#1383 - Load the model to CPU but quantize using the GPU

Issue - State: open - Opened by sgsdxzy 3 months ago - 1 comment
Labels: enhancement

#1382 - Is there any way quant model on multi nodes?

Issue - State: open - Opened by shuxiaobo 3 months ago - 1 comment
Labels: enhancement

#1381 - Bump version; set ct version

Pull Request - State: closed - Opened by dsikka 3 months ago - 1 comment

#1380 - Update w4a16_actorder_weight.yaml lmeval config

Pull Request - State: closed - Opened by dbarbuzzi 3 months ago - 1 comment
Labels: ready

#1379 - [WIP] Recipe `model_dump` fixes

Pull Request - State: open - Opened by rahul-tuli 3 months ago

#1378 - Revert "fix: Make Recipe.model_dump() output compatible ....

Pull Request - State: closed - Opened by rahul-tuli 3 months ago - 1 comment
Labels: ready

#1377 - Add: documentation for enhanced `save_pretrained` parameters

Pull Request - State: closed - Opened by rahul-tuli 3 months ago - 1 comment

#1376 - Enhance save_pretrained

Pull Request - State: open - Opened by rahul-tuli 3 months ago - 1 comment

#1375 - [Tests] Fix test case; update structure

Pull Request - State: closed - Opened by dsikka 3 months ago - 2 comments
Labels: ready

#1374 - [WIP] Add AWQ Asym e2e test case

Pull Request - State: closed - Opened by dsikka 3 months ago - 1 comment
Labels: ready

#1373 - [Tracing] Support tracing of Gemma3 [#1248]

Pull Request - State: closed - Opened by kelkelcheng 3 months ago - 7 comments
Labels: ready

#1372 - AWQ resolved mappings -- ensure shapes align

Pull Request - State: closed - Opened by brian-dellabetta 3 months ago - 11 comments
Labels: ready

#1371 - [Tests] Disable silently failing kv cache test

Pull Request - State: closed - Opened by kylesayrs 3 months ago - 3 comments
Labels: ready

#1369 - OOM (host) when running AWQ

Issue - State: closed - Opened by zjnyly 3 months ago - 2 comments
Labels: bug

#1368 - How to run AWQ-W4Afp8 quantization?

Issue - State: open - Opened by wanzhenchn 3 months ago - 2 comments

#1363 - Update: transformers support to latest

Pull Request - State: closed - Opened by rahul-tuli 3 months ago - 2 comments

#1359 - [Experimental] Mistral-format FP8 quantization

Pull Request - State: open - Opened by mgoin 3 months ago - 1 comment

#1358 - Running vllm after `oneshot` causes rerun of `oneshot`

Issue - State: closed - Opened by brian-dellabetta 3 months ago - 3 comments
Labels: bug

#1355 - How can i quant a model using fp8 blockwise quant just like deepseekv3

Issue - State: closed - Opened by WhatGhost 4 months ago - 13 comments

#1351 - Implement `QuantizationMixin`

Pull Request - State: closed - Opened by kylesayrs 4 months ago - 4 comments
Labels: ready

#1350 - qat question?

Issue - State: closed - Opened by coolKeen 4 months ago - 2 comments

#1349 - [Gemma3] The decoded token_ids are all [0,0,...,] after GPTQ quantization

Issue - State: closed - Opened by Caleb66666 4 months ago - 10 comments

#1348 - Add torch device to list of offloadable types

Pull Request - State: closed - Opened by kylesayrs 4 months ago - 1 comment
Labels: ready
