NVIDIA/TensorRT-LLM issues and pull requests

#4111 - Draft: chore: Make GEMM config enums human readable for better logging

Pull Request - State: open - Opened by djns99 3 months ago

#4110 - fix: instruct torch to use nvtx3

Pull Request - State: open - Opened by tongyuantongyu 3 months ago - 6 comments

#4109 - [Infra] - Update code ownership rules

Pull Request - State: closed - Opened by chzblych 3 months ago - 3 comments

#4108 - doc: update release notes

Pull Request - State: closed - Opened by kaiyux 3 months ago - 3 comments

#4107 - [TRTLLM-5057][fix] Adding option to specify a set of token ids for multimodal tokens

Pull Request - State: closed - Opened by rakib-hasan 3 months ago - 9 comments

#4106 - Fix Pipeline Parallelism in Llama4

Pull Request - State: open - Opened by v-shobhit 3 months ago - 4 comments

#4105 - Add initial list of CODEOWNERS

Pull Request - State: closed - Opened by kevinch-nv 3 months ago - 5 comments

#4104 - [nvbug/5262268][fix] Fix trtllm-bench for llama 4

Pull Request - State: open - Opened by mikeiovine 3 months ago - 12 comments

#4102 - refactor: Copy sequence lengths once in decoder setup

Pull Request - State: open - Opened by Funatiq 3 months ago - 18 comments

#4101 - [https://nvbugspro.nvidia.com/bug/5238626] illegal memory address when running llama 4 with cuda graph enabled

Pull Request - State: open - Opened by PerkzZheng 3 months ago - 33 comments

#4100 - doc: TRTLLM-4797 Update perf-analysis.md

Pull Request - State: closed - Opened by kaiyux 3 months ago - 5 comments

#4099 - Install from docs not working

Issue - State: closed - Opened by darraghdog 3 months ago - 7 comments
Labels: bug, triaged, Installation

#4097 - enh: Update docker Makefile to use only the visible GPUs of machine

Pull Request - State: closed - Opened by venkywonka 3 months ago - 3 comments
Labels: Ease of Use

#4096 - refactor: Unify request order in TRT and PyTorch workflow

Pull Request - State: open - Opened by Funatiq 3 months ago - 18 comments

#4095 - [https://nvbugspro.nvidia.com/bug/5260676]test: skip fp8 quantization case for pre-ada

Pull Request - State: open - Opened by crazydemo 3 months ago - 3 comments

#4094 - fix: llmapi-launch add trtllm-bench test with engine building

Pull Request - State: closed - Opened by Superjomn 3 months ago - 4 comments

#4093 - [Qwen3] chore: fix bug of fused_moe on tp > 1

Pull Request - State: closed - Opened by byshiue 3 months ago - 3 comments

#4092 - [TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow

Pull Request - State: open - Opened by Funatiq 3 months ago - 13 comments

#4091 - fix: llmapi-launch add trtllm-bench test with engine building

Pull Request - State: open - Opened by Superjomn 3 months ago - 15 comments

#4090 - [TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior

Pull Request - State: open - Opened by syuoni 3 months ago - 12 comments

#4089 - [#4085][fix] Fix `apply_per_channel_scale` for extremely large input sequence length.

Pull Request - State: open - Opened by StudyingShao 3 months ago - 20 comments
Labels: bug

#4087 - chore: misc static analysis fixes and generating xqa source header at config time

Pull Request - State: open - Opened by hypdeb 3 months ago - 4 comments

#4086 - Cherry-pick trtllm-gen from feat/llama4 to main

Pull Request - State: open - Opened by chenfeiz0326 3 months ago - 26 comments

#4084 - [fix] Fix add_dummy_requests for spec decoding cases

Pull Request - State: closed - Opened by lfr-0531 3 months ago - 44 comments

#4083 - test: add qwen3 and disaggregated serving accuracy tests to qa test list

Pull Request - State: open - Opened by StanleySun639 3 months ago - 21 comments

#4082 - Integrate trtllm-gen kernel for the QKV gemm in llama4

Pull Request - State: open - Opened by eopXD 3 months ago - 1 comment

#4081 - chore: Clean up the legacy DeepseekAllreudceFusionOp.

Pull Request - State: open - Opened by hyukn 3 months ago - 20 comments

#4080 - feat: Fallback to NCCL for various patterns when input size is large.

Pull Request - State: open - Opened by hyukn 3 months ago - 18 comments

#4079 - fix: Enable test case disabled by nvbug 5245262

Pull Request - State: closed - Opened by HuiGao-NV 3 months ago - 12 comments

#4078 - refactor: Allow models to override apply_qk_norm.

Pull Request - State: open - Opened by yuxianq 3 months ago - 28 comments

#4077 - [https://nvbugspro.nvidia.com/bug/5244006, https://nvbugspro.nvidia.com/bug/5240350][test] Unwaive guided decoding tests

Pull Request - State: open - Opened by syuoni 3 months ago - 31 comments

#4070 - fix: Update log query regex in perf integration test to match trtllm-bench reporting

Pull Request - State: open - Opened by venkywonka 3 months ago - 9 comments

#4069 - [fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue

Pull Request - State: closed - Opened by mikeiovine 3 months ago - 23 comments

#4068 - fix: Set `trust_remote_code=True` when verifying config.json load

Pull Request - State: closed - Opened by venkywonka 3 months ago - 1 comment

#4067 - feat: Reduce branch overhead in groupRMSNorm kernels

Pull Request - State: closed - Opened by SimengLiu-nv 3 months ago - 10 comments

#4066 - feat: Support the Structural Tag in guided decoding

Pull Request - State: open - Opened by Ubospica 3 months ago - 15 comments
Labels: Community want to contribute, Community Engagement

#4065 - [feat/] enable attention DP in Llama4 maverick model - part 1

Pull Request - State: open - Opened by zihaok 3 months ago - 13 comments

#4064 - feat:[AutoDeploy] utilize torch._inductor.pattern_matcher to write pattern matcher

Pull Request - State: open - Opened by Fridah-nv 3 months ago - 3 comments
Labels: AutoDeploy

#4063 - [feat] trtllmGen MoE routing: added support for top groups and top K bounds

Pull Request - State: open - Opened by MatthiasKohl 3 months ago - 5 comments

#4061 - Refactor: Lookahead TRT workflow

Pull Request - State: open - Opened by wili-65535 3 months ago - 1 comment

#4057 - feat: adopt new logprob definition in PyTorch flow

Pull Request - State: closed - Opened by tongyuantongyu 3 months ago - 16 comments

#4053 - [TRTQA-2861][test]: add nemotron and llama4 cases into qa test

Pull Request - State: closed - Opened by crazydemo 3 months ago - 14 comments

#4047 - feat: Add heuristic for GroupRMSNorm kernel selection.

Pull Request - State: open - Opened by SimengLiu-nv 3 months ago - 4 comments

#4046 - test: [CI] remove closed bugs

Pull Request - State: closed - Opened by xinhe-nv 3 months ago - 15 comments

#4037 - Deepseek R1 and V3, FP4 quant, output quality issues at batch size > 2

Issue - State: open - Opened by pankajroark 3 months ago - 7 comments
Labels: bug, triaged

#4034 - feat: [nvbug/5261055][nvbug/5170160] non-invasive pipeline parallelism

Pull Request - State: open - Opened by yuxianq 3 months ago - 30 comments

#4030 - [DRAFT] Introducing multi-vocab token sampling for audio generation

Pull Request - State: open - Opened by vklimkov-nvidia 3 months ago - 3 comments

#4028 - feat:enable kvcache to be reused during request generation

Pull Request - State: open - Opened by narutolhy 3 months ago - 105 comments
Labels: triaged, Community want to contribute, Community Engagement

#4027 - Refactor: Restructure C++ tests for better modularisation of non-shared code

Pull Request - State: closed - Opened by DomBrown 3 months ago - 38 comments

#4020 - feat: Enable AutoDeploy to llm-eval example

Pull Request - State: open - Opened by meenchen 3 months ago - 4 comments
Labels: AutoDeploy

#4019 - feat: Add Slurm support and enable RTX Pro 6000 testing pipeline in CI

Pull Request - State: closed - Opened by yuanjingx87 3 months ago - 61 comments

#4016 - [Deepseek] Refactor Deepseek Decoder layer

Pull Request - State: closed - Opened by hlu1 3 months ago - 23 comments

#4011 - bench: TRTLLM-4936 Port benchmark_serving.py

Pull Request - State: closed - Opened by kaiyux 3 months ago - 6 comments

#3998 - [fix] Fix llama4 + eagle3

Pull Request - State: open - Opened by mikeiovine 3 months ago - 21 comments

#3993 - chore:update .gitignore for doc building task.

Pull Request - State: closed - Opened by nv-guomingz 3 months ago - 6 comments

#3992 - chore: enhance the cmake experience by ignoring the additional semicolon

Pull Request - State: closed - Opened by nv-guomingz 3 months ago - 17 comments

#3990 - chore: reduce size of the docker images

Pull Request - State: closed - Opened by MartinMarciniszyn 3 months ago - 14 comments

#3989 - fix:https://nvbugs/5246733

Pull Request - State: open - Opened by nv-guomingz 3 months ago - 2 comments

#3988 - fix: [nvbug/5241627] Fix AllReduce kernel hang issue when both tp and pp are enabled.

Pull Request - State: open - Opened by hyukn 3 months ago - 2 comments

#3986 - docs:update 0.19 docs

Pull Request - State: closed - Opened by nv-guomingz 3 months ago - 3 comments

#3985 - [TRTLLM-3925, https://nvbugs/5245262] [fix] Normalize LLM.generate API

Pull Request - State: closed - Opened by syuoni 3 months ago - 14 comments

#3984 - fix: Correctly sizes seqslotmanager considering pp.

Pull Request - State: open - Opened by dcampora 3 months ago - 6 comments

#3983 - feat: support to trace executor loop.

Pull Request - State: open - Opened by yuxianq 3 months ago - 6 comments

#3981 - infra: Add NIXL into the Dockerfile

Pull Request - State: closed - Opened by Shixiaowei02 3 months ago - 11 comments

#3980 - refactor: Move ModelSpec to core library

Pull Request - State: open - Opened by Funatiq 3 months ago - 14 comments

#3979 - Feat: Variable-Beam-Width-Search (VBWS) part4

Pull Request - State: open - Opened by wili-65535 3 months ago - 9 comments

#3978 - [fix] Enable pp tests

Pull Request - State: open - Opened by yizhang-nv 3 months ago

#3977 - Qserve-w4a8 Shows Lower Computational Efficiency on H20

Issue - State: open - Opened by StaryDing 3 months ago - 4 comments
Labels: not a bug

#3976 - doc: Update 0.19.0 release notes

Pull Request - State: open - Opened by kaiyux 3 months ago

#3975 - [https://nvbugspro.nvidia.com/bug/5247148][fix] Attention DP with overlap scheduler

Pull Request - State: open - Opened by syuoni 3 months ago - 14 comments

#3974 - feat: conditional disaggregation in disagg server

Pull Request - State: closed - Opened by zhengd-nv 3 months ago - 53 comments

#3973 - chore: update internal_cutlass_kernels.

Pull Request - State: closed - Opened by nv-guomingz 3 months ago - 9 comments

#3972 - fix[nvbug-5228840]: Add debug log memory information for memory allocation error

Pull Request - State: open - Opened by HuiGao-NV 3 months ago - 3 comments

#3971 - chore: update multi-gpu trigger file list

Pull Request - State: closed - Opened by QiJune 3 months ago - 6 comments

#3970 - fix: Add attention workspace memory check

Pull Request - State: open - Opened by hlu1 3 months ago - 3 comments

#3969 - Chore: 2025-04-29 CI allowlist update

Pull Request - State: open - Opened by tburt-nv 3 months ago

#3968 - [TRTLLM-4623][fix] sync internal cutlass kernel changes

Pull Request - State: closed - Opened by pamelap-nvidia 3 months ago - 3 comments

#3967 - [fix] Eagle-2 LLMAPI pybind argument fix.

Pull Request - State: open - Opened by jhaotingc 3 months ago - 19 comments

#3966 - SW Architecture Enhancements

Issue - State: open - Opened by mk-nvidia 3 months ago
Labels: roadmap, SW Architecture

#3964 - 1.0 Architecture

Issue - State: open - Opened by mk-nvidia 3 months ago
Labels: roadmap, SW Architecture

#3963 - Disaggregated Prefill & Decode serving optimizations

Issue - State: open - Opened by mk-nvidia 3 months ago
Labels: triaged, Performance, Investigating, roadmap

#3962 - MoE optimizations

Issue - State: open - Opened by mk-nvidia 3 months ago
Labels: triaged, Performance, Investigating, roadmap

#3961 - Support versioned github.io doc to make it easy to map code with the corresponding doc version

Issue - State: open - Opened by mk-nvidia 3 months ago
Labels: Documentation, triaged, Investigating, roadmap

#3960 - Re-organize the example directory into Feature level examples and Model level examples

Issue - State: open - Opened by mk-nvidia 3 months ago
Labels: Documentation, triaged, Investigating, roadmap

#3958 - Intra-1.x-version backward compatibility for selected APIs.

Issue - State: open - Opened by mk-nvidia 3 months ago
Labels: triaged, Investigating, roadmap

#3957 - [fix] Pad requests to maximum draft length in spec decode

Pull Request - State: closed - Opened by mikeiovine 3 months ago - 3 comments

#3955 - Plenty of regressions in trt-llm v0.20.0

Issue - State: open - Opened by michaelfeil 3 months ago - 1 comment
Labels: bug

#3954 - chore: remove release branch codeowners from main

Pull Request - State: closed - Opened by tburt-nv 3 months ago - 3 comments

#3953 - align decoder state with trtllm decoder

Pull Request - State: closed - Opened by netanel-haber 3 months ago

#3952 - [https://nvbugs/5123103][fix] Fix torch compile for DeepSeekV3

Pull Request - State: open - Opened by liji-nv 3 months ago - 43 comments

#3951 - [https://nvbugs/5238105] fix: ModelRunnerCpp num_return_sequences

Pull Request - State: open - Opened by Funatiq 3 months ago - 44 comments

#3950 - test: Add fp8kv to DS-v3-lite integration tests.

Pull Request - State: open - Opened by bobboli 3 months ago - 23 comments

#3949 - chore: bump version to 0.20.0rc2

Pull Request - State: closed - Opened by ZhanruiSunCh 3 months ago - 6 comments

#3948 - infra: Fix pipeline step error in post merge

Pull Request - State: open - Opened by ZhanruiSunCh 3 months ago - 6 comments
