NVIDIA/TensorRT-LLM issues and pull requests

#4994 - ucxx only use ucp_feature_tag to avoid some issues on some platforms

Pull Request - State: open - Opened by chuangz0 about 2 months ago - 10 comments

#4993 - fix: Disaggregate serving with attention DP

Pull Request - State: open - Opened by VALLIS-NERIA about 2 months ago - 9 comments

#4990 - doc: Added documentation for enable_trtllm_sampler.

Pull Request - State: open - Opened by dcampora about 2 months ago

#4989 - tests: fix some typo and limitation on test cases

Pull Request - State: open - Opened by crazydemo about 2 months ago - 9 comments

#4986 - fix: Fix warmup phase batch size out of range.

Pull Request - State: closed - Opened by hyukn about 2 months ago - 22 comments

#4982 - feat: Allow discontiguous inputs to the group_rms_norm.

Pull Request - State: open - Opened by SimengLiu-nv about 2 months ago - 3 comments

#4975 - doc: Minor fixes and clarification

Pull Request - State: closed - Opened by kaiyux about 2 months ago

#4972 - fix: build_config in TorchLlmArgs and avoid invalid args

Pull Request - State: open - Opened by Superjomn about 2 months ago - 16 comments

#4971 - feat: Add non-streaming support for trtllm serve bench script & fixed prompt and output token length

Pull Request - State: open - Opened by yizhang-nv about 2 months ago - 33 comments

#4964 - [https://nvbugspro.nvidia.com/bug/5323820] Fix chunking equation for disabled case.

Pull Request - State: closed - Opened by FrankD412 about 2 months ago - 3 comments

#4963 - [fix] Fix incorrect get_spec_worker selection logic

Pull Request - State: open - Opened by nv-yilinf about 2 months ago - 9 comments

#4961 - Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs

Pull Request - State: open - Opened by moraxu about 2 months ago - 18 comments

#4955 - Add customized renormalized moe routing kernel for moe cutlass backend

Pull Request - State: closed - Opened by ChristinaZ about 2 months ago - 19 comments

#4954 - [nvbug 5325284][fix] Increase Nemotron-H warmup request robustness

Pull Request - State: open - Opened by tomeras91 about 2 months ago - 23 comments

#4951 - ci: [nvbugs/5280806] Unwaive unittests/_torch.

Pull Request - State: closed - Opened by yuxianq about 2 months ago - 27 comments

#4939 - infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build

Pull Request - State: open - Opened by ZhanruiSunCh about 2 months ago - 66 comments

#4936 - Raise shut down error for each request

Pull Request - State: open - Opened by Shunkangz about 2 months ago - 18 comments

#4930 - fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor.

Pull Request - State: open - Opened by yuxianq about 2 months ago - 6 comments

#4928 - chore: cleanup GDS Cmake interface

Pull Request - State: open - Opened by achartier about 2 months ago - 13 comments

#4924 - fix: trtllm-bench --dataset required=True

Pull Request - State: open - Opened by jasonqinzhou about 2 months ago - 5 comments

#4923 - Coalesce text diffs in streaming requests.

Pull Request - State: open - Opened by pathorn about 2 months ago - 3 comments
Labels: triaged, Community want to contribute, OpenAI API

#4918 - chore: Refactor apply_rope.

Pull Request - State: closed - Opened by bobboli about 2 months ago - 30 comments

#4917 - [Nvidia A10G + _torch flow]: No fused attention + OOM for 2048 context length

Issue - State: open - Opened by michaelfeil about 2 months ago - 2 comments
Labels: bug

#4915 - perf: Removing initializing ptuning buffers to zero

Pull Request - State: open - Opened by pcastonguay about 2 months ago - 28 comments

#4904 - [nvbug/5195657][fix] fix reset spec buffer and update mMaxAttentionWindowVec logic

Pull Request - State: open - Opened by yweng0828 about 2 months ago - 18 comments

#4903 - Fix support of system error

Pull Request - State: open - Opened by Shunkangz about 2 months ago

#4900 - chore: partition LLM class into TorchLLM and TrtLLM

Pull Request - State: open - Opened by Superjomn about 2 months ago - 31 comments

#4893 - chore: Refine weight prefetching.

Pull Request - State: closed - Opened by yuxianq about 2 months ago - 27 comments

#4885 - [Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11

Pull Request - State: open - Opened by EmmaQiaoCh about 2 months ago - 125 comments
Labels: Release Blocker

#4883 - [TRTLLM-5644][infra] Update the community action to more appropriate api

Pull Request - State: open - Opened by poweiw about 2 months ago - 1 comment

#4879 - [nvbug/5319281][fix] Stop drafting when we hit the draft model's max seq len

Pull Request - State: open - Opened by mikeiovine about 2 months ago - 5 comments

#4877 - [TRTLLM-5518] doc: Adding disaggregated serving section to models doc

Pull Request - State: closed - Opened by pcastonguay about 2 months ago - 9 comments

#4875 - ReDrafter support for Qwen

Pull Request - State: open - Opened by darraghdog about 2 months ago

#4872 - [TRTLLM-5589] feat: Integrate TRT-LLM Gen FP8 Batched GEMM with Pytorch workflow kernel autotuner

Pull Request - State: closed - Opened by DomBrown about 2 months ago - 23 comments

#4870 - TRTLLM-5219[feat] W4A8 AWQ Support

Pull Request - State: open - Opened by danielafrimi about 2 months ago

#4869 - [feat] Optimize KV Cache Reuse for MLA

Pull Request - State: open - Opened by zhhuang-nv about 2 months ago - 16 comments

#4867 - feat: Add w4a8_mxfp4_fp8 mode.

Pull Request - State: open - Opened by Tracin about 2 months ago - 22 comments

#4858 - [feat] Support XQA-based MLA on SM120

Pull Request - State: open - Opened by jinyangyuan-nvidia about 2 months ago - 3 comments

#4857 - Waive L0 test

Pull Request - State: closed - Opened by yiqingy0 about 2 months ago - 3 comments

#4856 - [feat] Support XQA-based MLA on SM120

Pull Request - State: closed - Opened by jinyangyuan-nvidia about 2 months ago

#4855 - Disable cyclic kv cache chunk

Pull Request - State: open - Opened by ming-wei about 2 months ago - 1 comment

#4854 - Draft: Generalize Checkpoint Loading Logic

Pull Request - State: closed - Opened by shaharmor98 about 2 months ago

#4853 - fix: fix cuda graph padding for spec decoding

Pull Request - State: open - Opened by lfr-0531 about 2 months ago - 5 comments

#4852 - Remove unnecessary duplicated tests from H100/H200 DGX pre/post-merge

Pull Request - State: open - Opened by litaotju about 2 months ago - 3 comments

#4851 - Replace memset with data initialization within kernels

Pull Request - State: open - Opened by ChristinaZ about 2 months ago - 3 comments

#4850 - [Infra] - Better utilize multi-GPU CI resources

Pull Request - State: closed - Opened by chzblych about 2 months ago - 3 comments

#4849 - shorten reqs in con:1 cases and add streaming cases, and add l2 perf …

Pull Request - State: closed - Opened by ruodil about 2 months ago

#4848 - enh: Enable trtllm-bench to run LoRA PyT flow

Pull Request - State: open - Opened by venkywonka about 2 months ago - 20 comments

#4846 - [Doc] Fix readme for disaggregated serving

Pull Request - State: open - Opened by arekay about 2 months ago - 5 comments

#4845 - [nvbug 5283506] fix: Fix spec decode triton test

Pull Request - State: closed - Opened by pcastonguay about 2 months ago - 27 comments

#4843 - [nvbug/5314469][feat] Include the executor's max batch size in CUDA g…

Pull Request - State: closed - Opened by mikeiovine about 2 months ago - 25 comments

#4842 - Add pre-merge Triton backend tests

Pull Request - State: closed - Opened by Tabrizian about 2 months ago - 3 comments

#4840 - [Infra] - Reduce the default Pytest timeout and test Ubuntu mirrors

Pull Request - State: closed - Opened by chzblych about 2 months ago - 3 comments

#4839 - Update code owner list

Pull Request - State: open - Opened by juney-nvidia about 2 months ago - 3 comments

#4838 - Draft: test: [CI] Add failed cases into waives.txt

Pull Request - State: closed - Opened by xinhe-nv about 2 months ago

#4837 - Chore: refine prepare inputs method of model engine

Pull Request - State: open - Opened by QiJune 2 months ago - 10 comments

#4836 - feat: TRTLLM Sampler log probs support

Pull Request - State: open - Opened by dcampora 2 months ago - 5 comments

#4835 - Fix: NVBug 5302895

Pull Request - State: open - Opened by Shixiaowei02 2 months ago - 5 comments

#4834 - Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4536

Pull Request - State: closed - Opened by lfr-0531 2 months ago - 3 comments

#4833 - Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4379

Pull Request - State: closed - Opened by lfr-0531 2 months ago - 3 comments

#4831 - test

Pull Request - State: open - Opened by yunruis 2 months ago - 2 comments

#4830 - fix: [nvbugs/5298600] fix illegal memory access on mrope_position_deltas

Pull Request - State: closed - Opened by yechank-nvidia 2 months ago - 3 comments

#4828 - feat: port MakeDecodingBatchInputOutput to python in TRTLLMSampler

Pull Request - State: open - Opened by dcampora 2 months ago - 36 comments

#4827 - Fix trtllm-bench iter_stats and cuda_graph_batch_sizes errors.

Pull Request - State: open - Opened by qiaoxj07 2 months ago - 8 comments

#4826 - chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM

Pull Request - State: open - Opened by rosenrodt 2 months ago - 11 comments

#4825 - Title: KeyError: 'gemma3' error in GemmaConfig.from_hugging_face when converting Gemma 3 model

Issue - State: open - Opened by bebilli 2 months ago - 1 comment
Labels: bug, triaged

#4822 - [TRTLLM-4923][feat] Paged mamba cache

Pull Request - State: open - Opened by tomeras91 2 months ago - 4 comments

#4819 - [TRTLLM-4987][feat] Support generation logits in TRTLLMSampler

Pull Request - State: open - Opened by amitz-nv 2 months ago - 3 comments

#4818 - feat: large-scale EP(part 6: Online EP load balancer integration for GB200 nvfp4)

Pull Request - State: open - Opened by dongxuy04 2 months ago - 42 comments

#4816 - Driver crash during warmup of DeepSeek-R1-FP4

Issue - State: open - Opened by pathorn 2 months ago - 1 comment
Labels: bug

#4815 - The output of Gemma 3 4B for TensorRT and Transformers is not the same, even when using float32

Issue - State: open - Opened by Alireza3242 2 months ago - 2 comments
Labels: bug, triaged

#4814 - [Arch] Freeze model_config

Pull Request - State: open - Opened by hlu1 2 months ago - 18 comments

#4812 - [enhancement] Add beam width to low latency.

Pull Request - State: open - Opened by FrankD412 2 months ago - 3 comments

#4809 - [fix] Fix llama 4 long context on Hopper

Pull Request - State: open - Opened by mikeiovine 2 months ago - 14 comments

#4807 - [nvbug/5280806][fix] Fix 2 model spec decode flow

Pull Request - State: open - Opened by mikeiovine 2 months ago - 10 comments

#4804 - [fix] Do not reuse dummy request KVCache

Pull Request - State: open - Opened by liji-nv 2 months ago - 29 comments

#4802 - [TRTLLM-4983] feat: enable overlap scheduler between draft forwards

Pull Request - State: open - Opened by lfr-0531 2 months ago - 3 comments

#4801 - test: remove invalid triton integration test cases

Pull Request - State: closed - Opened by StanleySun639 2 months ago - 3 comments

#4800 - how do the Qwen3-14B convert TensorRT engine？

Issue - State: closed - Opened by w066650 2 months ago - 3 comments

#4799 - feat: add HyperCLOVAX-SEED-Vision support in refactored way

Pull Request - State: open - Opened by yechank-nvidia 2 months ago

#4798 - fix [nvbug5256044]: bench hang due to llmapi ipc

Pull Request - State: closed - Opened by Superjomn 2 months ago - 12 comments

#4796 - test: shorten reqs in con:1 cases and add streaming cases, add l2 perf test

Pull Request - State: closed - Opened by ruodil 2 months ago

#4795 - Waive l0 tests

Pull Request - State: closed - Opened by yiqingy0 2 months ago - 3 comments

#4794 - upgrade cutlass to 4.0

Pull Request - State: closed - Opened by yunruis 2 months ago - 3 comments

#4792 - feat: large-scale EP(part 7: DeepEP integration)

Pull Request - State: open - Opened by yuantailing 2 months ago - 16 comments

#4790 - [Architecture] Refactor FusedMoE

Pull Request - State: closed - Opened by hlu1 2 months ago - 27 comments

#4787 - Feature support: eagle multimodal inputs

Issue - State: open - Opened by liyi-xia 2 months ago - 2 comments
Labels: feature request

#4784 - fix: remove the accuracy assert on run_majority_vote_aime24.py #5340

Pull Request - State: closed - Opened by WeiHaocheng 2 months ago - 3 comments

#4781 - fix: correct the order of llm request state

Pull Request - State: open - Opened by zhengd-nv 2 months ago - 9 comments

#4778 - [feat] Implement model-agnostic one-engine eagle3

Pull Request - State: open - Opened by nv-yilinf 2 months ago - 8 comments

#4773 - [feat] Enable CUDA graphs for EAGLE3 two model implementation prefill stage

Pull Request - State: closed - Opened by mikeiovine 2 months ago - 4 comments

#4771 - [feat] Multi node support via Slurm

Pull Request - State: open - Opened by yuanjingx87 2 months ago - 57 comments
Labels: Release Blocker

#4768 - [DO NOT MERGE] Debug perf

Pull Request - State: open - Opened by kaiyux 2 months ago

#4765 - feat: add heuristics for checkpoint files prefetching.

Pull Request - State: closed - Opened by yuxianq 2 months ago - 13 comments

#4764 - infra: upload imageTag info to artifactory and add ngc_staging to save ngc image

Pull Request - State: open - Opened by ZhanruiSunCh 2 months ago - 49 comments
