GitHub / NVIDIA/TensorRT-LLM issues and pull requests
#4309 - [AutoDeploy] Investigate cudagraph in torch.compile
Issue -
State: open - Opened by lucaslie 3 months ago
Labels: triaged, AutoDeploy
#4303 - [perf] Reduce the workspace size of FP4 activation scales for MoE
Pull Request -
State: closed - Opened by jinyangyuan-nvidia 3 months ago
- 106 comments
#4298 - [CI] add some sanity check test cases for PyTorch backend
Pull Request -
State: open - Opened by QiJune 3 months ago
- 18 comments
#4296 - test: add llama_v4_scout_instruct and llama_v4_maverick_instruct into perf test
Pull Request -
State: closed - Opened by ruodil 3 months ago
- 3 comments
#4294 - infra: [TRTLLM-5072] Refactor docker build image groovy and support NGC images
Pull Request -
State: open - Opened by ZhanruiSunCh 3 months ago
- 60 comments
#4236 - Draft: add NVLM_D support
Pull Request -
State: open - Opened by mwawrzos 3 months ago
- 7 comments
Labels: triaged, Community want to contribute, waiting for feedback
#4232 - feat: W4A16 GEMM
Pull Request -
State: open - Opened by danielafrimi 3 months ago
- 121 comments
Labels: triaged, Community want to contribute
#4215 - tests: PyTorch multimodal using keyword match
Pull Request -
State: open - Opened by amukkara 3 months ago
#4214 - opt: the performance for dist-agg streaming generation
Pull Request -
State: open - Opened by Superjomn 3 months ago
- 36 comments
#4213 - fix: Fix input_scale no attribute issue in BF16 mode
Pull Request -
State: open - Opened by nvpohanh 3 months ago
#4212 - [Infra] Waive L0 test
Pull Request -
State: closed - Opened by yiqingy0 3 months ago
- 3 comments
#4211 - [https://nvbugspro.nvidia.com/bug/5270564][test] skip per-hopper for llama4
Pull Request -
State: open - Opened by crazydemo 3 months ago
- 3 comments
#4210 - [CI] update pytorch only file list
Pull Request -
State: closed - Opened by QiJune 3 months ago
- 3 comments
#4209 - doc:update linux installation md.
Pull Request -
State: open - Opened by nv-guomingz 3 months ago
- 1 comment
#4208 - fix: fix qwen3 rope to use xqa
Pull Request -
State: open - Opened by dongjiyingdjy 3 months ago
- 1 comment
#4207 - test: [CI] remove closed bugs
Pull Request -
State: closed - Opened by xinhe-nv 3 months ago
- 21 comments
#4206 - [CI] waive two multi-gpu test cases
Pull Request -
State: closed - Opened by QiJune 3 months ago
- 6 comments
#4205 - test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv 3 months ago
- 7 comments
#4204 - [doc] fix: disaggregated examples
Pull Request -
State: open - Opened by lkm2835 3 months ago
#4203 - test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv 3 months ago
- 6 comments
#4202 - [draft] Refactor quant in linear
Pull Request -
State: closed - Opened by HuiGao-NV 3 months ago
- 34 comments
#4201 - Integrate trtllm-gen kernels for QKVGemm, FC13+swiGLU, and FC2 for Llama4
Pull Request -
State: closed - Opened by eopXD 3 months ago
#4200 - chore: PR to fix the formatting errors
Pull Request -
State: closed - Opened by mayani-nv 3 months ago
- 3 comments
#4199 - [TRTLLM-5188] fix: [AutoDeploy] update output shape of prepare_fused_mha_metadata_fake
Pull Request -
State: open - Opened by Fridah-nv 3 months ago
- 11 comments
#4198 - Added tests for Llama3.1-70B-BF16 on SM120
Pull Request -
State: open - Opened by farazkh80 3 months ago
#4196 - BF16 llama 4 broken on feat/llama4 branch
Issue -
State: open - Opened by mikeiovine 3 months ago
Labels: bug
#4195 - Extend the Llama-Nemotron-Nano-8B perf-integration-tests
Pull Request -
State: open - Opened by venkywonka 3 months ago
#4194 - Test main images CI result
Pull Request -
State: open - Opened by ZhanruiSunCh 3 months ago
- 5 comments
#4193 - Update 0.19
Pull Request -
State: closed - Opened by kaiyux 3 months ago
#4191 - infra: [TRTLLM-325] Prepare for NGC release - multiplatform build
Pull Request -
State: open - Opened by MartinMarciniszyn 3 months ago
- 7 comments
#4190 - fix: Revert NIXL and ETCD from the main image
Pull Request -
State: open - Opened by Shixiaowei02 3 months ago
#4189 - Cherry-pick commits from feat/llama4 to main
Pull Request -
State: open - Opened by chenfeiz0326 3 months ago
- 32 comments
#4188 - [bug/5247505] fix: CP accuracy on Blackwell
Pull Request -
State: open - Opened by DylanChen-NV 3 months ago
- 5 comments
#4187 - test: Remove CNN Dailymail tasks in favor of GSM8K
Pull Request -
State: closed - Opened by syuoni 3 months ago
- 3 comments
#4186 - test: amend regex match for perf throughput
Pull Request -
State: closed - Opened by ruodil 3 months ago
#4185 - infra: open source fmha v2 kernels
Pull Request -
State: open - Opened by qsang-nv 3 months ago
- 6 comments
#4184 - fix: library path of nixl
Pull Request -
State: closed - Opened by Shixiaowei02 3 months ago
- 5 comments
#4183 - feat: Support for Mistral Small 3.1 24B VLM
Pull Request -
State: open - Opened by brb-nv 3 months ago
- 3 comments
#4182 - feat: Prefetch safetensors files before loading them
Pull Request -
State: closed - Opened by nvpohanh 3 months ago
#4181 - ^gdr_copy
Pull Request -
State: open - Opened by chuangz0 3 months ago
- 3 comments
#4180 - add changes for fp8, nemotron-nas, API
Pull Request -
State: open - Opened by shaharmor98 3 months ago
- 21 comments
#4179 - feat: Improve perf of AllGather-Top1 after LMHead
Pull Request -
State: closed - Opened by nvpohanh 3 months ago
- 3 comments
#4176 - test: amend default pytorch extra-llm-api-config.yml in perf test
Pull Request -
State: closed - Opened by ruodil 3 months ago
- 3 comments
#4175 - [TRTQA-2802][fix]: add --host for mgmn serve examples script
Pull Request -
State: closed - Opened by xinhe-nv 3 months ago
- 3 comments
#4174 - Breaking change: perf: Enable scheduling overlap by default
Pull Request -
State: open - Opened by kaiyux 3 months ago
- 11 comments
#4173 - chore: Deprecate evaltool
Pull Request -
State: closed - Opened by Tracin 3 months ago
- 12 comments
#4171 - chore: Remove deprecated Python runtime benchmark
Pull Request -
State: open - Opened by kaiyux 3 months ago
- 6 comments
#4170 - exp: pull/4114
Pull Request -
State: open - Opened by tongyuantongyu 3 months ago
- 7 comments
#4167 - fix: draft target README and assertion for logits-based acceptance
Pull Request -
State: closed - Opened by mayani-nv 3 months ago
- 1 comment
#4166 - doc: Release V0.19 Perf Overview Update
Pull Request -
State: closed - Opened by zbpatel 3 months ago
- 8 comments
#4165 - test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv 3 months ago
- 9 comments
#4163 - [feat] [AutoDeploy] Llama-4 Support
Pull Request -
State: open - Opened by lucaslie 3 months ago
#4161 - [TRTLLM-5054][fix] Removing repeated loading of input processor
Pull Request -
State: open - Opened by rakib-hasan 3 months ago
- 2 comments
#4160 - fix: bump xgrammar
Pull Request -
State: open - Opened by milesial 3 months ago
- 2 comments
#4159 - [nvbugs/5268808][fix] Fix the potential out-of-range-access issue of allreduce workspace.
Pull Request -
State: open - Opened by hyukn 3 months ago
- 14 comments
#4158 - Add test case for kv memory estimation
Pull Request -
State: closed - Opened by HuiGao-NV 3 months ago
- 46 comments
#4157 - fix: alltoall padding for chunked MoE
Pull Request -
State: open - Opened by dongxuy04 3 months ago
- 2 comments
#4156 - [TRTLLM-5050][feat] Enable per-request stats with PyT backend
Pull Request -
State: open - Opened by pcastonguay 3 months ago
- 9 comments
#4155 - Feat: support exporting softmax statistics and update the kernel-selection heuristic
Pull Request -
State: open - Opened by PerkzZheng 3 months ago
- 16 comments
#4154 - Scaffolding support streaming output
Issue -
State: open - Opened by WeiHaocheng 3 months ago
Labels: Scaffolding
#4153 - remove cache_transceiver_prealloc_size
Pull Request -
State: closed - Opened by chuangz0 3 months ago
- 14 comments
#4152 - infra: Move SBSA build stage to Blossom
Pull Request -
State: open - Opened by ZhanruiSunCh 3 months ago
- 8 comments
#4151 - [TRTLLM-4911] feat(scaffolding): make sampling_params only settable by controller
Pull Request -
State: open - Opened by dc3671 3 months ago
#4150 - chore:update modelopt to 0.29
Pull Request -
State: closed - Opened by nv-guomingz 3 months ago
- 9 comments
#4149 - Ensure FDL is enabled for fc13 swiglu
Pull Request -
State: open - Opened by eopXD 3 months ago
#4148 - [Infra] Waive L0 flaky test
Pull Request -
State: closed - Opened by yiqingy0 3 months ago
- 3 comments
#4147 - fix/ replace sanity test for nemotron h with a correctness test
Pull Request -
State: open - Opened by omera-nv 3 months ago
- 6 comments
#4146 - perf: Fuse gemm setup function for SM90/SM100 MOE plugin path
Pull Request -
State: open - Opened by djns99 3 months ago
- 3 comments
#4145 - [TRTLLM-5007][feat] Add multimodal hashing support (image hashing)
Pull Request -
State: closed - Opened by chang-l 3 months ago
- 68 comments
#4143 - Fix TP8 for NVFP4 kv duplication.
Pull Request -
State: closed - Opened by Tracin 3 months ago
- 3 comments
#4142 - enh: Enable option in trtllm-bench build subcommand to avoid loading weights
Pull Request -
State: open - Opened by venkywonka 3 months ago
#4141 - [TRTLLM-5147][Qwen3] fix: fix bug of attention dp on qwen3_moe model
Pull Request -
State: open - Opened by byshiue 3 months ago
- 16 comments
#4140 - feat: Prefetch safetensors files before loading them
Pull Request -
State: open - Opened by nvpohanh 3 months ago
- 23 comments
#4139 - test: add llama_3.2_1B model and fix for test lora script issue
Pull Request -
State: open - Opened by ruodil 3 months ago
- 3 comments
#4137 - Cherry-pick: Use multi-threading to load MoE expert weights
Pull Request -
State: closed - Opened by chenfeiz0326 3 months ago
- 22 comments
#4136 - test: Waive test_llm cases
Pull Request -
State: closed - Opened by syuoni 3 months ago
- 8 comments
#4135 - fix: Fix MOE benchmark to rotate buffers to prevent L2 cache reuse
Pull Request -
State: open - Opened by djns99 3 months ago
- 6 comments
#4134 - feat: Add disagg accuracy testing for DeepSeek V3 Lite
Pull Request -
State: open - Opened by Tabrizian 3 months ago
- 9 comments
#4133 - Draft: feat: Add chunking to PyT heuristic for trtllm-bench.
Pull Request -
State: open - Opened by FrankD412 3 months ago
- 9 comments
#4132 - [feat] Enable chunked context for flashinfer
Pull Request -
State: open - Opened by mikeiovine 3 months ago
- 19 comments
#4131 - infra: WAR for Argument list too long of globalVars[CACHED_CHANGED_FILE_LIST]
Pull Request -
State: closed - Opened by ZhanruiSunCh 3 months ago
- 6 comments
#4130 - fix: change pp broadcast pattern for LPs
Pull Request -
State: open - Opened by hchings 3 months ago
- 9 comments
#4129 - docs:add torch flow supported model list.
Pull Request -
State: open - Opened by nv-guomingz 3 months ago
- 3 comments
#4128 - test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench)
Pull Request -
State: open - Opened by venkywonka 3 months ago
- 15 comments
#4127 - [Call for contributions]The development plan of large-scale EP support in TensorRT-LLM
Issue -
State: open - Opened by juney-nvidia 3 months ago
- 1 comment
Labels: Community Engagement
#4126 - [fix] Fix relaxed acceptance to support enabling it in context phase
Pull Request -
State: open - Opened by lfr-0531 3 months ago
- 14 comments
#4125 - Agent interface impl for NIXL
Pull Request -
State: open - Opened by chuangz0 3 months ago
#4124 - test: Waive disagg accuracy test
Pull Request -
State: closed - Opened by syuoni 3 months ago
- 17 comments
#4123 - [feat] Support DeepSeek-R1 W4A8 on Hopper
Pull Request -
State: open - Opened by Barry-Delaney 3 months ago
- 30 comments
#4122 - [Infra] - Update code ownership rules for public APIs
Pull Request -
State: open - Opened by chzblych 3 months ago
- 6 comments
#4121 - Readability: "decoder"->"sampler"
Pull Request -
State: open - Opened by netanel-haber 3 months ago
- 22 comments
#4120 - docs:update 0.19 doc.
Pull Request -
State: closed - Opened by nv-guomingz 3 months ago
- 3 comments
#4119 - [fix] trtllm-gen mla kernel warnings
Pull Request -
State: closed - Opened by zhhuang-nv 3 months ago
- 21 comments
#4118 - Qwen2-0.5B Inference Freezes with TensorRT-LLM on RTX 5000
Issue -
State: open - Opened by ashkanzarkhah 3 months ago
#4117 - Feat: support MTP for fmha_v2 based MLA kernels.
Pull Request -
State: open - Opened by PerkzZheng 3 months ago
- 12 comments
#4116 - fix: always cleanup process tree
Pull Request -
State: open - Opened by tongyuantongyu 3 months ago
- 31 comments
#4115 - feat: add router_gemm, fuse_a_gemm and PDL for DeepSeek-R1 min-latency mode
Pull Request -
State: open - Opened by yunruis 3 months ago
#4114 - infra: Down the gcc toolset version from 13 to 11
Pull Request -
State: open - Opened by ZhanruiSunCh 3 months ago
- 9 comments
#4113 - tests: https://nvbugs/5219534 remove failed tests from test list
Pull Request -
State: closed - Opened by xinhe-nv 3 months ago
- 12 comments
#4112 - fix: Fix incorrect conversion of Gen TPS/user
Pull Request -
State: open - Opened by FrankD412 3 months ago
- 20 comments