GitHub / NVIDIA/TensorRT-LLM issues and pull requests
#4994 - ucxx only use ucp_feature_tag to avoid some issues on some platforms
Pull Request -
State: open - Opened by chuangz0 about 2 months ago
- 10 comments
#4993 - fix: Disaggregate serving with attention DP
Pull Request -
State: open - Opened by VALLIS-NERIA about 2 months ago
- 9 comments
#4990 - doc: Added documentation for enable_trtllm_sampler.
Pull Request -
State: open - Opened by dcampora about 2 months ago
#4989 - tests: fix some typo and limitation on test cases
Pull Request -
State: open - Opened by crazydemo about 2 months ago
- 9 comments
#4986 - fix: Fix warmup phase batch size out of range.
Pull Request -
State: closed - Opened by hyukn about 2 months ago
- 22 comments
#4982 - feat: Allow discontiguous inputs to the group_rms_norm.
Pull Request -
State: open - Opened by SimengLiu-nv about 2 months ago
- 3 comments
#4975 - doc: Minor fixes and clarification
Pull Request -
State: closed - Opened by kaiyux about 2 months ago
#4972 - fix: build_config in TorchLlmArgs and avoid invalid args
Pull Request -
State: open - Opened by Superjomn about 2 months ago
- 16 comments
#4971 - feat: Add non-streaming support for trtllm serve bench script & fixed prompt and output token length
Pull Request -
State: open - Opened by yizhang-nv about 2 months ago
- 33 comments
#4964 - [https://nvbugspro.nvidia.com/bug/5323820] Fix chunking equation for disabled case.
Pull Request -
State: closed - Opened by FrankD412 about 2 months ago
- 3 comments
#4963 - [fix] Fix incorrect get_spec_worker selection logic
Pull Request -
State: open - Opened by nv-yilinf about 2 months ago
- 9 comments
#4961 - Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs
Pull Request -
State: open - Opened by moraxu about 2 months ago
- 18 comments
#4955 - Add customized renormalized moe routing kernel for moe cutlass backend
Pull Request -
State: closed - Opened by ChristinaZ about 2 months ago
- 19 comments
#4954 - [nvbug 5325284][fix] Increase Nemotron-H warmup request robustness
Pull Request -
State: open - Opened by tomeras91 about 2 months ago
- 23 comments
#4951 - ci: [nvbugs/5280806] Unwaive unittests/_torch.
Pull Request -
State: closed - Opened by yuxianq about 2 months ago
- 27 comments
#4939 - infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build
Pull Request -
State: open - Opened by ZhanruiSunCh about 2 months ago
- 66 comments
#4936 - Raise shutdown error for each request
Pull Request -
State: open - Opened by Shunkangz about 2 months ago
- 18 comments
#4930 - fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor.
Pull Request -
State: open - Opened by yuxianq about 2 months ago
- 6 comments
#4928 - chore: cleanup GDS Cmake interface
Pull Request -
State: open - Opened by achartier about 2 months ago
- 13 comments
#4924 - fix: trtllm-bench --dataset required=True
Pull Request -
State: open - Opened by jasonqinzhou about 2 months ago
- 5 comments
#4923 - Coalesce text diffs in streaming requests.
Pull Request -
State: open - Opened by pathorn about 2 months ago
- 3 comments
Labels: triaged, Community want to contribute, OpenAI API
#4918 - chore: Refactor apply_rope.
Pull Request -
State: closed - Opened by bobboli about 2 months ago
- 30 comments
#4917 - [Nvidia A10G + _torch flow]: No fused attention + OOM for 2048 context length
Issue -
State: open - Opened by michaelfeil about 2 months ago
- 2 comments
Labels: bug
#4915 - perf: Remove zero-initialization of ptuning buffers
Pull Request -
State: open - Opened by pcastonguay about 2 months ago
- 28 comments
#4904 - [nvbug/5195657][fix] fix reset spec buffer and update mMaxAttentionWindowVec logic
Pull Request -
State: open - Opened by yweng0828 about 2 months ago
- 18 comments
#4903 - Fix support of system error
Pull Request -
State: open - Opened by Shunkangz about 2 months ago
#4900 - chore: partition LLM class into TorchLLM and TrtLLM
Pull Request -
State: open - Opened by Superjomn about 2 months ago
- 31 comments
#4893 - chore: Refine weight prefetching.
Pull Request -
State: closed - Opened by yuxianq about 2 months ago
- 27 comments
#4885 - [Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11
Pull Request -
State: open - Opened by EmmaQiaoCh about 2 months ago
- 125 comments
Labels: Release Blocker
#4883 - [TRTLLM-5644][infra] Update the community action to more appropriate api
Pull Request -
State: open - Opened by poweiw about 2 months ago
- 1 comment
#4879 - [nvbug/5319281][fix] Stop drafting when we hit the draft model's max seq len
Pull Request -
State: open - Opened by mikeiovine about 2 months ago
- 5 comments
#4877 - [TRTLLM-5518] doc: Adding disaggregated serving section to models doc
Pull Request -
State: closed - Opened by pcastonguay about 2 months ago
- 9 comments
#4875 - ReDrafter support for Qwen
Pull Request -
State: open - Opened by darraghdog about 2 months ago
#4872 - [TRTLLM-5589] feat: Integrate TRT-LLM Gen FP8 Batched GEMM with Pytorch workflow kernel autotuner
Pull Request -
State: closed - Opened by DomBrown about 2 months ago
- 23 comments
#4870 - TRTLLM-5219[feat] W4A8 AWQ Support
Pull Request -
State: open - Opened by danielafrimi about 2 months ago
#4869 - [feat] Optimize KV Cache Reuse for MLA
Pull Request -
State: open - Opened by zhhuang-nv about 2 months ago
- 16 comments
#4867 - feat: Add w4a8_mxfp4_fp8 mode.
Pull Request -
State: open - Opened by Tracin about 2 months ago
- 22 comments
#4858 - [feat] Support XQA-based MLA on SM120
Pull Request -
State: open - Opened by jinyangyuan-nvidia about 2 months ago
- 3 comments
#4857 - Waive L0 test
Pull Request -
State: closed - Opened by yiqingy0 about 2 months ago
- 3 comments
#4856 - [feat] Support XQA-based MLA on SM120
Pull Request -
State: closed - Opened by jinyangyuan-nvidia about 2 months ago
#4855 - Disable cyclic kv cache chunk
Pull Request -
State: open - Opened by ming-wei about 2 months ago
- 1 comment
#4854 - Draft: Generalize Checkpoint Loading Logic
Pull Request -
State: closed - Opened by shaharmor98 about 2 months ago
#4853 - fix: fix cuda graph padding for spec decoding
Pull Request -
State: open - Opened by lfr-0531 about 2 months ago
- 5 comments
#4852 - Remove unnecessary duplicated tests from H100/H200 DGX pre/post-merge
Pull Request -
State: open - Opened by litaotju about 2 months ago
- 3 comments
#4851 - Replace memset with data initialization within kernels
Pull Request -
State: open - Opened by ChristinaZ about 2 months ago
- 3 comments
#4850 - [Infra] - Better utilize multi-GPU CI resources
Pull Request -
State: closed - Opened by chzblych about 2 months ago
- 3 comments
#4849 - shorten reqs in con:1 cases and add streaming cases, and add l2 perf …
Pull Request -
State: closed - Opened by ruodil about 2 months ago
#4848 - enh: Enable trtllm-bench to run LoRA PyT flow
Pull Request -
State: open - Opened by venkywonka about 2 months ago
- 20 comments
#4846 - [Doc] Fix readme for disaggregated serving
Pull Request -
State: open - Opened by arekay about 2 months ago
- 5 comments
#4845 - [nvbug 5283506] fix: Fix spec decode triton test
Pull Request -
State: closed - Opened by pcastonguay about 2 months ago
- 27 comments
#4843 - [nvbug/5314469][feat] Include the executor's max batch size in CUDA g…
Pull Request -
State: closed - Opened by mikeiovine about 2 months ago
- 25 comments
#4842 - Add pre-merge Triton backend tests
Pull Request -
State: closed - Opened by Tabrizian about 2 months ago
- 3 comments
#4840 - [Infra] - Reduce the default Pytest timeout and test Ubuntu mirrors
Pull Request -
State: closed - Opened by chzblych about 2 months ago
- 3 comments
#4839 - Update code owner list
Pull Request -
State: open - Opened by juney-nvidia about 2 months ago
- 3 comments
#4838 - Draft: test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv about 2 months ago
#4837 - Chore: refine prepare inputs method of model engine
Pull Request -
State: open - Opened by QiJune 2 months ago
- 10 comments
#4836 - feat: TRTLLM Sampler log probs support
Pull Request -
State: open - Opened by dcampora 2 months ago
- 5 comments
#4835 - Fix: NVBug 5302895
Pull Request -
State: open - Opened by Shixiaowei02 2 months ago
- 5 comments
#4834 - Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4536
Pull Request -
State: closed - Opened by lfr-0531 2 months ago
- 3 comments
#4833 - Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4379
Pull Request -
State: closed - Opened by lfr-0531 2 months ago
- 3 comments
#4831 - test
Pull Request -
State: open - Opened by yunruis 2 months ago
- 2 comments
#4830 - fix: [nvbugs/5298600] fix illegal memory access on mrope_position_deltas
Pull Request -
State: closed - Opened by yechank-nvidia 2 months ago
- 3 comments
#4828 - feat: port MakeDecodingBatchInputOutput to python in TRTLLMSampler
Pull Request -
State: open - Opened by dcampora 2 months ago
- 36 comments
#4827 - Fix trtllm-bench iter_stats and cuda_graph_batch_sizes errors.
Pull Request -
State: open - Opened by qiaoxj07 2 months ago
- 8 comments
#4826 - chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM
Pull Request -
State: open - Opened by rosenrodt 2 months ago
- 11 comments
#4825 - Title: KeyError: 'gemma3' error in GemmaConfig.from_hugging_face when converting Gemma 3 model
Issue -
State: open - Opened by bebilli 2 months ago
- 1 comment
Labels: bug, triaged
#4822 - [TRTLLM-4923][feat] Paged mamba cache
Pull Request -
State: open - Opened by tomeras91 2 months ago
- 4 comments
#4819 - [TRTLLM-4987][feat] Support generation logits in TRTLLMSampler
Pull Request -
State: open - Opened by amitz-nv 2 months ago
- 3 comments
#4818 - feat: large-scale EP(part 6: Online EP load balancer integration for GB200 nvfp4)
Pull Request -
State: open - Opened by dongxuy04 2 months ago
- 42 comments
#4816 - Driver crash during warmup of DeepSeek-R1-FP4
Issue -
State: open - Opened by pathorn 2 months ago
- 1 comment
Labels: bug
#4815 - The output of Gemma 3 4B for TensorRT and Transformers is not the same, even when using float32
Issue -
State: open - Opened by Alireza3242 2 months ago
- 2 comments
Labels: bug, triaged
#4814 - [Arch] Freeze model_config
Pull Request -
State: open - Opened by hlu1 2 months ago
- 18 comments
#4812 - [enhancement] Add beam width to low latency.
Pull Request -
State: open - Opened by FrankD412 2 months ago
- 3 comments
#4809 - [fix] Fix llama 4 long context on Hopper
Pull Request -
State: open - Opened by mikeiovine 2 months ago
- 14 comments
#4807 - [nvbug/5280806][fix] Fix 2 model spec decode flow
Pull Request -
State: open - Opened by mikeiovine 2 months ago
- 10 comments
#4804 - [fix] Do not reuse dummy request KVCache
Pull Request -
State: open - Opened by liji-nv 2 months ago
- 29 comments
#4802 - [TRTLLM-4983] feat: enable overlap scheduler between draft forwards
Pull Request -
State: open - Opened by lfr-0531 2 months ago
- 3 comments
#4801 - test: remove invalid triton integration test cases
Pull Request -
State: closed - Opened by StanleySun639 2 months ago
- 3 comments
#4800 - How to convert Qwen3-14B to a TensorRT engine?
Issue -
State: closed - Opened by w066650 2 months ago
- 3 comments
#4799 - feat: add HyperCLOVAX-SEED-Vision support in refactored way
Pull Request -
State: open - Opened by yechank-nvidia 2 months ago
#4798 - fix [nvbug5256044]: bench hang due to llmapi ipc
Pull Request -
State: closed - Opened by Superjomn 2 months ago
- 12 comments
#4796 - test: shorten reqs in con:1 cases and add streaming cases, add l2 perf test
Pull Request -
State: closed - Opened by ruodil 2 months ago
#4795 - Waive l0 tests
Pull Request -
State: closed - Opened by yiqingy0 2 months ago
- 3 comments
#4794 - upgrade cutlass to 4.0
Pull Request -
State: closed - Opened by yunruis 2 months ago
- 3 comments
#4792 - feat: large-scale EP(part 7: DeepEP integration)
Pull Request -
State: open - Opened by yuantailing 2 months ago
- 16 comments
#4790 - [Architecture] Refactor FusedMoE
Pull Request -
State: closed - Opened by hlu1 2 months ago
- 27 comments
#4787 - Feature support: eagle multimodal inputs
Issue -
State: open - Opened by liyi-xia 2 months ago
- 2 comments
Labels: feature request
#4784 - fix: remove the accuracy assert on run_majority_vote_aime24.py #5340
Pull Request -
State: closed - Opened by WeiHaocheng 2 months ago
- 3 comments
#4781 - fix: correct the order of llm request state
Pull Request -
State: open - Opened by zhengd-nv 2 months ago
- 9 comments
#4778 - [feat] Implement model-agnostic one-engine eagle3
Pull Request -
State: open - Opened by nv-yilinf 2 months ago
- 8 comments
#4773 - [feat] Enable CUDA graphs for EAGLE3 two model implementation prefill stage
Pull Request -
State: closed - Opened by mikeiovine 2 months ago
- 4 comments
#4771 - [feat] Multi node support via Slurm
Pull Request -
State: open - Opened by yuanjingx87 2 months ago
- 57 comments
Labels: Release Blocker
#4768 - [DO NOT MERGE] Debug perf
Pull Request -
State: open - Opened by kaiyux 2 months ago
#4765 - feat: add heuristics for checkpoint files prefetching.
Pull Request -
State: closed - Opened by yuxianq 2 months ago
- 13 comments
#4764 - infra: upload imageTag info to artifactory and add ngc_staging to save ngc image
Pull Request -
State: open - Opened by ZhanruiSunCh 2 months ago
- 49 comments
#4763 - chore: remove request_error ipc in LLM.submit
Pull Request -
State: open - Opened by Superjomn 2 months ago
- 8 comments
#4762 - fix: refactor and fix mtp vanilla
Pull Request -
State: closed - Opened by lfr-0531 2 months ago
- 79 comments
#4757 - Refactor test timeout for individual long case
Pull Request -
State: open - Opened by EmmaQiaoCh 2 months ago
- 110 comments
#4756 - [TRTLLM-3927] [feat] Finalize + Allreduce + add + rmsnorm fusion
Pull Request -
State: open - Opened by zongfeijing 2 months ago
- 10 comments
#4754 - tests: Update gb200 test case
Pull Request -
State: open - Opened by yizhang-nv 2 months ago
- 32 comments