GitHub / NVIDIA/TensorRT-LLM issues and pull requests
#5522 - [DRAFT] feat: transfer multimodal_data and refactor HyperCLOVAX & Qwen2/2.5-VL
Pull Request -
State: open - Opened by yechank-nvidia 30 days ago
#5518 - [don't review] Fp8 blockwise gemm autotune
Pull Request -
State: open - Opened by limin2021 about 1 month ago
#5513 - feature: unify new_tokens format sample state to trtllm sampler tokens format
Pull Request -
State: closed - Opened by netanel-haber about 1 month ago
- 9 comments
#5507 - chore: Mass integration of release/0.21
Pull Request -
State: open - Opened by dc3671 about 1 month ago
- 21 comments
#5505 - Cache transceiver support VSWA
Pull Request -
State: open - Opened by chuangz0 about 1 month ago
- 6 comments
#5503 - [fix]: Fix main test skip issue
Pull Request -
State: closed - Opened by yizhang-nv about 1 month ago
- 23 comments
#5500 - Qwen2.5-VL-3B pytorch backend with cuda graph error result
Issue -
State: open - Opened by specter2018 about 1 month ago
- 4 comments
Labels: bug, triaged
#5489 - [TRTLLM-1316] refactor: Remove unnecessary pipeline parallelism logic from postProcessRequest
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 34 comments
#5480 - Enable cuda graph as default testing.
Pull Request -
State: open - Opened by dominicshanshan about 1 month ago
- 17 comments
#5476 - feat: reduce unnecessary kernel generation
Pull Request -
State: open - Opened by tongyuantongyu about 1 month ago
- 32 comments
#5465 - [nvbug 5300551] test: increase block count in eviction test
Pull Request -
State: closed - Opened by zhengd-nv about 1 month ago
- 15 comments
#5439 - draft: migration from pybind11 to nanobind
Pull Request -
State: open - Opened by Linda-Stadter about 1 month ago
- 29 comments
#5437 - test: Reduce number of C++ test cases
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 40 comments
#5432 - perf: better heuristic for allreduce
Pull Request -
State: open - Opened by yilin-void about 1 month ago
- 28 comments
#5431 - [TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1)
Pull Request -
State: open - Opened by Superjomn about 1 month ago
- 61 comments
#5419 - Feat/apply autotuner to cute dsl fp8 gemm
Pull Request -
State: closed - Opened by limin2021 about 1 month ago
#5418 - Chore: split _build_model method for TorchLlm and TrtLlm
Pull Request -
State: open - Opened by QiJune about 1 month ago
- 2 comments
#5417 - [nvbug 5273941] fix: broken cyclic reference detect
Pull Request -
State: open - Opened by Superjomn about 1 month ago
#5416 - Fix test Pytorch model engine
Pull Request -
State: open - Opened by Tabrizian about 1 month ago
- 2 comments
#5415 - [DRAFT] fix: Enable num_return_sequences (`n`) support in PyTorch backend
Pull Request -
State: open - Opened by jaedeok-nvidia about 1 month ago
#5414 - add fmha test
Pull Request -
State: open - Opened by qsang-nv about 1 month ago
#5413 - Fix load balancing router bug
Pull Request -
State: open - Opened by Shunkangz about 1 month ago
#5412 - Make moe permute and final as custom op
Pull Request -
State: open - Opened by limin2021 about 1 month ago
- 1 comment
#5411 - fix: Fix static EPLB
Pull Request -
State: open - Opened by syuoni about 1 month ago
- 5 comments
#5410 - feat: Expose bias and FP8_MXFP4 MOE CUTLASS backend features to pytorch
Pull Request -
State: open - Opened by djns99 about 1 month ago
#5409 - chore: delete mamba hybrid, since it is now called NemotronH
Pull Request -
State: open - Opened by vegaluisjose about 1 month ago
- 2 comments
#5408 - Gibberish from Llama-3.3-70B-Instruct-FP8
Issue -
State: open - Opened by sarmiena about 1 month ago
- 5 comments
Labels: bug, triaged
#5407 - Draft: test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv about 1 month ago
#5406 - [DRAFT] refactor: PyExecutor uses a list-type for response handling
Pull Request -
State: open - Opened by jaedeok-nvidia about 1 month ago
- 4 comments
#5405 - Add unit test for routing kernels
Pull Request -
State: open - Opened by ChristinaZ about 1 month ago
- 3 comments
#5404 - [#5403][perf] Conditionally enable SWAP AB for speculative decoding
Pull Request -
State: open - Opened by zoheth about 1 month ago
- 10 comments
Labels: Community want to contribute
#5403 - [Perf] Conditionally enable SWAP AB for speculative decoding
Issue -
State: open - Opened by zoheth about 1 month ago
#5402 - When enable_kv_cache_reuse is enabled, how to configure caching only the prompt and not the encoder?
Issue -
State: open - Opened by w066650 about 1 month ago
#5401 - [DON'T MERGE] NGram V2 test
Pull Request -
State: open - Opened by wili-65535 about 1 month ago
- 4 comments
#5400 - test: [CI] remove closed bugs
Pull Request -
State: open - Opened by xinhe-nv about 1 month ago
- 2 comments
#5399 - remove libnuma conan dependency
Pull Request -
State: open - Opened by dongxuy04 about 1 month ago
- 3 comments
#5398 - Add sleep function for disagg gen-only benchmarking
Pull Request -
State: open - Opened by qiaoxj07 about 1 month ago
- 3 comments
#5397 - test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests
Pull Request -
State: closed - Opened by yizhang-nv about 1 month ago
- 66 comments
#5396 - [1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes
Pull Request -
State: open - Opened by chang-l about 1 month ago
- 6 comments
#5394 - [TRTLLM-6019] feat: Remove cutlass min latency code from AutoTuner.
Pull Request -
State: open - Opened by hyukn about 1 month ago
- 6 comments
#5393 - Broken OpenAI API
Issue -
State: open - Opened by Alcanderian about 1 month ago
Labels: bug
#5392 - Adapt a new model with a structure similar to LLaMA3.
Issue -
State: open - Opened by BaiStone2017 about 1 month ago
#5391 - Server stuck with high load
Issue -
State: open - Opened by k-l-lambda about 1 month ago
- 1 comment
Labels: bug
#5390 - RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Issue -
State: open - Opened by moguizhizi about 1 month ago
Labels: bug
#5389 - [TensorRT-LLM][ERROR] Assertion failed: Error occurred when running GEMM!
Issue -
State: open - Opened by momaek about 1 month ago
Labels: bug
#5388 - [test] speedup TRT accuracy tests
Pull Request -
State: closed - Opened by omera-nv about 1 month ago
- 3 comments
#5387 - [doc] update mtp documents
Pull Request -
State: closed - Opened by lfr-0531 about 1 month ago
- 4 comments
#5386 - invalid request_id in `MTPSpecMetadata`
Issue -
State: open - Opened by k-l-lambda about 1 month ago
Labels: bug
#5385 - feat: Remove not used padding_idx in models
Pull Request -
State: open - Opened by HuiGao-NV about 1 month ago
- 6 comments
#5384 - refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 15 comments
#5383 - Qwen3-0.6B-FP8 model errors in quickstart_advanced.py: rmsnorm data type Float fails to dispatch
Issue -
State: open - Opened by Sevix7766 about 1 month ago
#5382 - Detokenize option in /v1/completions request
Pull Request -
State: open - Opened by Wokzy about 1 month ago
Labels: Community want to contribute, Community Engagement
#5381 - How to Quantize Qwen2.5-VL-Instruct with TensorRT-LLM?
Issue -
State: open - Opened by buptmengjj about 1 month ago
#5380 - Is the following result normal? Why is prompt_token_ids so long?
Issue -
State: open - Opened by buptmengjj about 1 month ago
#5379 - Mixtral-8x7B-Instruct awq-w4a8 output shows duplicated Chinese text
Issue -
State: open - Opened by wanzhenchn about 1 month ago
Labels: bug
#5378 - Fix: missing clientId when serialize and deserialize response (cherry-pick #5231)
Pull Request -
State: closed - Opened by kaiyux about 1 month ago
- 3 comments
#5377 - add user-controllable backend configuration to trtllm-bench
Issue -
State: open - Opened by nzmora-nvidia about 1 month ago
#5376 - [TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve
Pull Request -
State: open - Opened by talorabr about 1 month ago
#5374 - feat: Misc Opt for large scale EP
Pull Request -
State: closed - Opened by dongxuy04 about 1 month ago
- 6 comments
#5371 - [TRTLLM-5838][fix] fix max batch size and max tokens in kv cache estimations for Nemotron-H
Pull Request -
State: open - Opened by tomeras91 about 1 month ago
- 24 comments
#5369 - fix: fix bug of qwen3 + eagle3 + finalize_moe_fusion
Pull Request -
State: open - Opened by byshiue about 1 month ago
- 5 comments
#5367 - Mxfp4 moe
Pull Request -
State: open - Opened by Tracin about 1 month ago
#5364 - feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18
Pull Request -
State: open - Opened by Wanli-Jiang about 1 month ago
- 8 comments
#5358 - Make moe permute and final as custom op
Pull Request -
State: closed - Opened by limin2021 about 1 month ago
- 1 comment
#5355 - fix: Fix skip by mpi size fixture
Pull Request -
State: closed - Opened by yizhang-nv about 1 month ago
- 12 comments
#5348 - test: Add LLGuidance test and refine guided decoding
Pull Request -
State: open - Opened by syuoni about 1 month ago
- 26 comments
#5346 - feat: add LLmArgs option to force using dynamic quantization
Pull Request -
State: open - Opened by achartier about 1 month ago
- 9 comments
#5343 - [fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation
Pull Request -
State: closed - Opened by HuiGao-NV about 1 month ago
- 13 comments
#5341 - [fix][test] parametrize deepseek eval
Pull Request -
State: open - Opened by omera-nv about 1 month ago
- 33 comments
#5340 - fix: pass the correct KV retention config for block eviction
Pull Request -
State: open - Opened by achartier about 1 month ago
- 6 comments
#5336 - [nvbugs/5323043] test: Fix triton_extensive test
Pull Request -
State: open - Opened by Tabrizian about 1 month ago
- 9 comments
#5333 - [TRTLLM-3442] feat: added beam search support to the PyTorch Workflow
Pull Request -
State: open - Opened by stnie about 1 month ago
- 27 comments
#5328 - [TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler
Pull Request -
State: open - Opened by dcampora about 1 month ago
- 18 comments
#5318 - perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf
Pull Request -
State: open - Opened by bobboli about 1 month ago
- 6 comments
#5316 - refactor: Speculative decoding buffers part 2
Pull Request -
State: closed - Opened by Funatiq about 1 month ago
- 33 comments
#5315 - refactor: manage cache indirection in decoder state
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 15 comments
#5312 - [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default
Pull Request -
State: closed - Opened by Superjomn about 1 month ago
- 16 comments
#5310 - How to run Qwen3 using triton-server + trtllm
Issue -
State: open - Opened by ezioliao about 1 month ago
- 4 comments
Labels: question, triaged, Triton Backend, Investigating
#5291 - [ModelLoad] Concurrent load model
Pull Request -
State: open - Opened by arekay about 1 month ago
- 3 comments
#5282 - Draft: add script to protect perf. DO-NOT-MERGE
Pull Request -
State: open - Opened by litaotju about 1 month ago
#5281 - ci: unwaive llmapi launch test
Pull Request -
State: open - Opened by Superjomn about 1 month ago
- 8 comments
#5270 - feat: Dynamically remove servers in PD
Pull Request -
State: open - Opened by Shunkangz about 1 month ago
- 12 comments
#5265 - Fix: fix build for sm120
Pull Request -
State: open - Opened by peaceh-nv about 1 month ago
- 21 comments
#5262 - draft: fix cudaStreamSynchronize when using relaxed acceptance
Pull Request -
State: open - Opened by yweng0828 about 1 month ago
#5261 - chore: remove torch_compile prefix for TorchCompileConfig field members
Pull Request -
State: open - Opened by nv-guomingz about 1 month ago
- 3 comments
#5260 - [Torchflow] Can't run torchflow on llamaforcausallm with fp32. rmsnorm(at::Tensor&, at::Tensor&, at::Tensor&, double, bool)::<lambda()> failed to dispatch data type Float
Issue -
State: open - Opened by michaelfeil about 1 month ago
Labels: bug
#5259 - Fix: https://nvbugs/5345720
Pull Request -
State: open - Opened by QiJune about 1 month ago
- 2 comments
#5258 - [enhance] Add the ability to write a request timeline.
Pull Request -
State: open - Opened by FrankD412 about 1 month ago
#5257 - [AutoDeploy] Enhance graph transformation test utils
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: AutoDeploy
#5256 - [AutoDeploy] Investigate Graph Visualization
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: AutoDeploy
#5255 - Migrate other transformations to use the torch._inductor pattern matcher
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: AutoDeploy
#5254 - Investigate TRTLLM runtime repetitive issue
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: bug, AutoDeploy
#5253 - fix: only set _mpi_session if world_size is > 1
Pull Request -
State: open - Opened by achartier about 1 month ago
- 4 comments
#5252 - chore: Waive CI failure.
Pull Request -
State: closed - Opened by SimengLiu-nv about 1 month ago
- 3 comments
#5251 - [chore] Remove BaseDraftTokenManager
Pull Request -
State: open - Opened by mikeiovine about 1 month ago
- 6 comments
#5250 - doc: update contributing md for internal developers
Pull Request -
State: open - Opened by nv-guomingz about 1 month ago
- 3 comments
Labels: Documentation
#5249 - Refactor the signature of AD graph transformations
Issue -
State: open - Opened by nzmora-nvidia about 1 month ago
Labels: AutoDeploy
#5248 - [infra] Make test_chunked_prefill faster
Pull Request -
State: closed - Opened by mikeiovine about 1 month ago
- 10 comments
#5247 - quantize.py doesn't recognize cnn_dailymail dataset if the dataset path doesn't include cnn_dailymail
Issue -
State: open - Opened by mwawrzos about 1 month ago
Labels: bug