NVIDIA/TensorRT-LLM issues and pull requests

#4309 - [AutoDeploy] Investigate cudagraph in torch.compile

Issue - State: open - Opened by lucaslie 3 months ago
Labels: triaged, AutoDeploy

#4303 - [perf] Reduce the workspace size of FP4 activation scales for MoE

Pull Request - State: closed - Opened by jinyangyuan-nvidia 3 months ago - 106 comments

#4298 - [CI] add some sanity check test cases for PyTorch backend

Pull Request - State: open - Opened by QiJune 3 months ago - 18 comments

#4296 - test: add llama_v4_scout_instruct and llama_v4_maverick_instruct into perf test

Pull Request - State: closed - Opened by ruodil 3 months ago - 3 comments

#4294 - infra: [TRTLLM-5072] Refactor docker build image groovy and support NGC images

Pull Request - State: open - Opened by ZhanruiSunCh 3 months ago - 60 comments

#4236 - Draft: add NVLM_D support

Pull Request - State: open - Opened by mwawrzos 3 months ago - 7 comments
Labels: triaged, Community want to contribute, waiting for feedback

#4232 - feat: W4A16 GEMM

Pull Request - State: open - Opened by danielafrimi 3 months ago - 121 comments
Labels: triaged, Community want to contribute

#4215 - tests: PyTorch multimodal using keyword match

Pull Request - State: open - Opened by amukkara 3 months ago

#4214 - opt: the performance for dist-agg streaming generation

Pull Request - State: open - Opened by Superjomn 3 months ago - 36 comments

#4213 - fix: Fix input_scale no attribute issue in BF16 mode

Pull Request - State: open - Opened by nvpohanh 3 months ago

#4212 - [Infra] Waive L0 test

Pull Request - State: closed - Opened by yiqingy0 3 months ago - 3 comments

#4211 - [https://nvbugspro.nvidia.com/bug/5270564][test] skip per-hopper for llama4

Pull Request - State: open - Opened by crazydemo 3 months ago - 3 comments

#4210 - [CI] update pytorch only file list

Pull Request - State: closed - Opened by QiJune 3 months ago - 3 comments

#4209 - doc:update linux installation md.

Pull Request - State: open - Opened by nv-guomingz 3 months ago - 1 comment

#4208 - fix: fix qwen3 rope to use xqa

Pull Request - State: open - Opened by dongjiyingdjy 3 months ago - 1 comment

#4207 - test: [CI] remove closed bugs

Pull Request - State: closed - Opened by xinhe-nv 3 months ago - 21 comments

#4206 - [CI] waive two multi-gpu test cases

Pull Request - State: closed - Opened by QiJune 3 months ago - 6 comments

#4205 - test: [CI] Add failed cases into waives.txt

Pull Request - State: closed - Opened by xinhe-nv 3 months ago - 7 comments

#4204 - [doc] fix: disaggregated examples

Pull Request - State: open - Opened by lkm2835 3 months ago

#4203 - test: [CI] Add failed cases into waives.txt

Pull Request - State: closed - Opened by xinhe-nv 3 months ago - 6 comments

#4202 - [draft] Refactor quant in linear

Pull Request - State: closed - Opened by HuiGao-NV 3 months ago - 34 comments

#4201 - Integrate trtllm-gen kernels for QKVGemm, FC13+swiGLU, and FC2 for Llama4

Pull Request - State: closed - Opened by eopXD 3 months ago

#4200 - chore: PR to fix the formatting errors

Pull Request - State: closed - Opened by mayani-nv 3 months ago - 3 comments

#4199 - [TRTLLM-5188] fix: [AutoDeploy] update output shape of prepare_fused_mha_metadata_fake

Pull Request - State: open - Opened by Fridah-nv 3 months ago - 11 comments

#4198 - Added tests for Llama3.1-70B-BF16 on SM120

Pull Request - State: open - Opened by farazkh80 3 months ago

#4196 - BF16 llama 4 broken on feat/llama4 branch

Issue - State: open - Opened by mikeiovine 3 months ago
Labels: bug

#4195 - Extend the Llama-Nemotron-Nano-8B perf-integration-tests

Pull Request - State: open - Opened by venkywonka 3 months ago

#4194 - Test main images CI result

Pull Request - State: open - Opened by ZhanruiSunCh 3 months ago - 5 comments

#4193 - Update 0.19

Pull Request - State: closed - Opened by kaiyux 3 months ago

#4191 - infra: [TRTLLM-325] Prepare for NGC release - multiplatform build

Pull Request - State: open - Opened by MartinMarciniszyn 3 months ago - 7 comments

#4190 - fix: Revert NIXL and ETCD from the main image

Pull Request - State: open - Opened by Shixiaowei02 3 months ago

#4189 - Cherry-pick commits from feat/llama4 to main

Pull Request - State: open - Opened by chenfeiz0326 3 months ago - 32 comments

#4188 - [bug/5247505] fix: CP accuracy on Blackwell

Pull Request - State: open - Opened by DylanChen-NV 3 months ago - 5 comments

#4187 - test: Remove CNN Dailymail tasks in favor of GSM8K

Pull Request - State: closed - Opened by syuoni 3 months ago - 3 comments

#4186 - test: amend regex match for perf throughput

Pull Request - State: closed - Opened by ruodil 3 months ago

#4185 - infra: open source fmha v2 kernels

Pull Request - State: open - Opened by qsang-nv 3 months ago - 6 comments

#4184 - fix: library path of nixl

Pull Request - State: closed - Opened by Shixiaowei02 3 months ago - 5 comments

#4183 - feat: Support for Mistral Small 3.1 24B VLM

Pull Request - State: open - Opened by brb-nv 3 months ago - 3 comments

#4182 - feat: Prefetch safetensors files before loading them

Pull Request - State: closed - Opened by nvpohanh 3 months ago

#4181 - ^gdr_copy

Pull Request - State: open - Opened by chuangz0 3 months ago - 3 comments

#4180 - add changes for fp8, nemotron-nas, API

Pull Request - State: open - Opened by shaharmor98 3 months ago - 21 comments

#4179 - feat: Improve perf of AllGather-Top1 after LMHead

Pull Request - State: closed - Opened by nvpohanh 3 months ago - 3 comments

#4176 - test: amend default pytorch extra-llm-api-config.yml in perf test

Pull Request - State: closed - Opened by ruodil 3 months ago - 3 comments

#4175 - [TRTQA-2802][fix]: add --host for mgmn serve examples script

Pull Request - State: closed - Opened by xinhe-nv 3 months ago - 3 comments

#4174 - Breaking change: perf: Enable scheduling overlap by default

Pull Request - State: open - Opened by kaiyux 3 months ago - 11 comments

#4173 - chore: Deprecate evaltool

Pull Request - State: closed - Opened by Tracin 3 months ago - 12 comments

#4171 - chore: Remove deprecated Python runtime benchmark

Pull Request - State: open - Opened by kaiyux 3 months ago - 6 comments

#4170 - exp: pull/4114

Pull Request - State: open - Opened by tongyuantongyu 3 months ago - 7 comments

#4167 - fix: draft target README and assertion for logits-based acceptance

Pull Request - State: closed - Opened by mayani-nv 3 months ago - 1 comment

#4166 - doc: Release V0.19 Perf Overview Update

Pull Request - State: closed - Opened by zbpatel 3 months ago - 8 comments

#4165 - test: [CI] Add failed cases into waives.txt

Pull Request - State: closed - Opened by xinhe-nv 3 months ago - 9 comments

#4163 - [feat] [AutoDeploy] Llama-4 Support

Pull Request - State: open - Opened by lucaslie 3 months ago

#4161 - [TRTLLM-5054][fix] Removing repeated loading of input processor

Pull Request - State: open - Opened by rakib-hasan 3 months ago - 2 comments

#4160 - fix: bump xgrammar

Pull Request - State: open - Opened by milesial 3 months ago - 2 comments

#4159 - [nvbugs/5268808][fix] Fix the potential out-of-range-access issue of allreduce workspace.

Pull Request - State: open - Opened by hyukn 3 months ago - 14 comments

#4158 - Add test case for kv memory estimation

Pull Request - State: closed - Opened by HuiGao-NV 3 months ago - 46 comments

#4157 - fix: alltoall padding for chunked MoE

Pull Request - State: open - Opened by dongxuy04 3 months ago - 2 comments

#4156 - [TRTLLM-5050][feat] Enable per-request stats with PyT backend

Pull Request - State: open - Opened by pcastonguay 3 months ago - 9 comments

#4155 - Feat: support exporting softmax statistics and update the kernel-selection heuristic

Pull Request - State: open - Opened by PerkzZheng 3 months ago - 16 comments

#4154 - Scaffolding support streaming output

Issue - State: open - Opened by WeiHaocheng 3 months ago
Labels: Scaffolding

#4153 - remove cache_transceiver_prealloc_size

Pull Request - State: closed - Opened by chuangz0 3 months ago - 14 comments

#4152 - infra: Move SBSA build stage to Blossom

Pull Request - State: open - Opened by ZhanruiSunCh 3 months ago - 8 comments

#4151 - [TRTLLM-4911] feat(scaffolding): make sampling_params only settable by controller

Pull Request - State: open - Opened by dc3671 3 months ago

#4150 - chore:update modelopt to 0.29

Pull Request - State: closed - Opened by nv-guomingz 3 months ago - 9 comments

#4149 - Ensure FDL is enabled for fc13 swiglu

Pull Request - State: open - Opened by eopXD 3 months ago

#4148 - [Infra] Waive L0 flaky test

Pull Request - State: closed - Opened by yiqingy0 3 months ago - 3 comments

#4147 - fix/ replace sanity test for nemotron h with a correctness test

Pull Request - State: open - Opened by omera-nv 3 months ago - 6 comments

#4146 - perf: Fuse gemm setup function for SM90/SM100 MOE plugin path

Pull Request - State: open - Opened by djns99 3 months ago - 3 comments

#4145 - [TRTLLM-5007][feat] Add multimodal hashing support (image hashing)

Pull Request - State: closed - Opened by chang-l 3 months ago - 68 comments

#4143 - Fix TP8 for NVFP4 kv duplication.

Pull Request - State: closed - Opened by Tracin 3 months ago - 3 comments

#4142 - enh: Enable option in trtllm-bench build subcommand to avoid loading weights

Pull Request - State: open - Opened by venkywonka 3 months ago

#4141 - [TRTLLM-5147][Qwen3] fix: fix bug of attention dp on qwen3_moe model

Pull Request - State: open - Opened by byshiue 3 months ago - 16 comments

#4140 - feat: Prefetch safetensors files before loading them

Pull Request - State: open - Opened by nvpohanh 3 months ago - 23 comments

#4139 - test: add llama_3.2_1B model and fix for test lora script issue

Pull Request - State: open - Opened by ruodil 3 months ago - 3 comments

#4137 - Cherry-pick: Use multi-threading to load MoE expert weights

Pull Request - State: closed - Opened by chenfeiz0326 3 months ago - 22 comments

#4136 - test: Waive test_llm cases

Pull Request - State: closed - Opened by syuoni 3 months ago - 8 comments

#4135 - fix: Fix MOE benchmark to rotate buffers to prevent L2 cache reuse

Pull Request - State: open - Opened by djns99 3 months ago - 6 comments

#4134 - feat: Add disagg accuracy testing for DeepSeek V3 Lite

Pull Request - State: open - Opened by Tabrizian 3 months ago - 9 comments

#4133 - Draft: feat: Add chunking to PyT heuristic for trtllm-bench.

Pull Request - State: open - Opened by FrankD412 3 months ago - 9 comments

#4132 - [feat] Enable chunked context for flashinfer

Pull Request - State: open - Opened by mikeiovine 3 months ago - 19 comments

#4131 - infra: WAR for Argument list too long of globalVars[CACHED_CHANGED_FILE_LIST]

Pull Request - State: closed - Opened by ZhanruiSunCh 3 months ago - 6 comments

#4130 - fix: change pp broadcast pattern for LPs

Pull Request - State: open - Opened by hchings 3 months ago - 9 comments

#4129 - docs:add torch flow supported model list.

Pull Request - State: open - Opened by nv-guomingz 3 months ago - 3 comments

#4128 - test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench)

Pull Request - State: open - Opened by venkywonka 3 months ago - 15 comments

#4127 - [Call for contributions]The development plan of large-scale EP support in TensorRT-LLM

Issue - State: open - Opened by juney-nvidia 3 months ago - 1 comment
Labels: Community Engagement

#4126 - [fix] Fix relaxed acceptance to support enabling it in context phase

Pull Request - State: open - Opened by lfr-0531 3 months ago - 14 comments

#4125 - Agent interface impl for NIXL

Pull Request - State: open - Opened by chuangz0 3 months ago

#4124 - test: Waive disagg accuracy test

Pull Request - State: closed - Opened by syuoni 3 months ago - 17 comments

#4123 - [feat] Support DeepSeek-R1 W4A8 on Hopper

Pull Request - State: open - Opened by Barry-Delaney 3 months ago - 30 comments

#4122 - [Infra] - Update code ownership rules for public APIs

Pull Request - State: open - Opened by chzblych 3 months ago - 6 comments

#4121 - Readability: "decoder"->"sampler"

Pull Request - State: open - Opened by netanel-haber 3 months ago - 22 comments

#4120 - docs:update 0.19 doc.

Pull Request - State: closed - Opened by nv-guomingz 3 months ago - 3 comments

#4119 - [fix] trtllm-gen mla kernel warnings

Pull Request - State: closed - Opened by zhhuang-nv 3 months ago - 21 comments

#4118 - Qwen2-0.5B Inference Freezes with TensorRT-LLM on RTX 5000

Issue - State: open - Opened by ashkanzarkhah 3 months ago

#4117 - Feat: support MTP for fmha_v2 based MLA kernels.

Pull Request - State: open - Opened by PerkzZheng 3 months ago - 12 comments

#4116 - fix: always cleanup process tree

Pull Request - State: open - Opened by tongyuantongyu 3 months ago - 31 comments