NVIDIA/TensorRT-LLM issues and pull requests

#6094 - [fix] Update jenkins container images

Pull Request - State: open - Opened by ixlmar 9 days ago

#6091 - Deploying qwen2.5-1.5B VRAM in 3090 exploded

Issue - State: open - Opened by Ulthunzy 9 days ago

#6090 - ucx establish connection with zmq

Pull Request - State: open - Opened by chuangz0 9 days ago

#6089 - Fix FP8 blockwise scaling GEMM support on Blackwell

Pull Request - State: closed - Opened by yuxianq 9 days ago

#6088 - Cherry Pick: PR #6076

Pull Request - State: closed - Opened by ZhanruiSunCh 9 days ago - 3 comments

#6087 - optimize: ADP schedule optimization

Pull Request - State: open - Opened by yunruis 9 days ago

#6085 - [TRTLLM-6444][doc] Add some UCX trouble shooting docs

Pull Request - State: open - Opened by reasonsolo 9 days ago

#6083 - [Whisper] add whisper support

Pull Request - State: open - Opened by wu6u3tw 9 days ago

#6082 - [None] - Waive L0 tests

Pull Request - State: open - Opened by yiqingy0 9 days ago

#6081 - Doc: Cherry pick https://github.com/NVIDIA/TensorRT-LLM/pull/5864 and https://github.com/NVIDIA/TensorRT-LLM/pull/5810 to release/0.21

Pull Request - State: open - Opened by jiahanc 9 days ago

#6080 - feat: Add support for benchmarking individual gemms in MOE benchmark

Pull Request - State: open - Opened by djns99 9 days ago

#6077 - [Draft] Inter-request kv cache manager support for HSTU

Pull Request - State: open - Opened by geoffreyQiu 9 days ago

#6076 - [fix] Fix Triton build

Pull Request - State: open - Opened by Tabrizian 9 days ago

#6075 - fix TMA error with GEMM+AR on TP=2

Pull Request - State: open - Opened by xavier-nvidia 9 days ago

#6074 - No onboarding of blocks that are outside of attention window

Pull Request - State: open - Opened by thorjohnsen 9 days ago

#6073 - [fix] Correct handling of NVFP4 block scaling factors in preprocessing for MoE

Pull Request - State: open - Opened by shengliangxu 9 days ago

#6072 - Add documentation for eagle3+disagg+dynamo

Pull Request - State: open - Opened by Tabrizian 9 days ago

#6069 - Correct handling of NVFP4 block scaling factors in preprocessing for MoE

Pull Request - State: open - Opened by shengliangxu 9 days ago

#6066 - How to run cross attention with different number of q and kv tokens

Issue - State: open - Opened by zhumakhan 9 days ago

#6065 - [Fix][Chore][Qwen3] fix bug of using fp4 on sm120

Pull Request - State: open - Opened by byshiue 9 days ago

#6064 - feat: Remove padding in attention DP.

Pull Request - State: open - Opened by bobboli 9 days ago

#6062 - fix: Add $HOME/.local/bin to PATH when running docker in local user mode

Pull Request - State: closed - Opened by MartinMarciniszyn 9 days ago - 3 comments

#6061 - optimize: ADP schedule optimization

Pull Request - State: open - Opened by yunruis 9 days ago

#6060 - feat: support Vora(Vision as LoRA) model in TensorRT-LLM library

Pull Request - State: open - Opened by effortprogrammer 10 days ago

#6059 - [feat] : Add FP8 context MLA support for SM120

Pull Request - State: open - Opened by peaceh-nv 10 days ago

#6058 - chroe: upgrade modelopt to 0.33

Pull Request - State: open - Opened by nv-guomingz 10 days ago

#6057 - chore: Update required CUDAToolkit version to 12.9 in CMakeLists.txt

Pull Request - State: open - Opened by Funatiq 10 days ago

#6055 - refactor: Enhanced handling of decoder requests and logits within the batch manager

Pull Request - State: open - Opened by Funatiq 10 days ago

#6053 - [fix] Move NCCL group in all-gather and reduce-scatter OPs outside the outer loop

Pull Request - State: closed - Opened by jinyangyuan-nvidia 10 days ago - 9 comments

#6051 - [nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill

Pull Request - State: closed - Opened by crazydemo 10 days ago - 6 comments

#6049 - add release notes for 0.21 release

Pull Request - State: closed - Opened by QiJune 10 days ago - 6 comments

#6048 - Fix: pad DeepEP fp4 recv tensors if empty

Pull Request - State: closed - Opened by yuantailing 10 days ago - 3 comments

#6006 - chore: Cleanup disable_fp4_allgather.

Pull Request - State: closed - Opened by bobboli 11 days ago - 12 comments

#6001 - [nvbug/5387226] chore: add propogation for trust_remote_code to AutoConfig

Pull Request - State: closed - Opened by Superjomn 11 days ago - 15 comments

#5982 - BlockManager copy constructor fix

Pull Request - State: closed - Opened by tshmilnvidia 12 days ago - 10 comments
Labels: Community want to contribute

#5962 - fix: Unable to load phi4-model with tp_size>1

Pull Request - State: closed - Opened by Wanli-Jiang 14 days ago - 10 comments

#5903 - feat: Add deepseek-lite tests for RTX pro 6000

Pull Request - State: closed - Opened by peaceh-nv 15 days ago - 13 comments

#5634 - [fix] Update to properly set cuda graphs in trtllm-bench overrides.

Pull Request - State: open - Opened by FrankD412 24 days ago - 2 comments

#5633 - How to implement structured JSON output using TensorRT-LLM in an isolated environment?

Issue - State: open - Opened by PhamGiaMinh 24 days ago

#5632 - [fix] (benchmark) Correct file creation error in text_dataset_dump for paths without a directory

Pull Request - State: open - Opened by SuperGoodGame 24 days ago

#5631 - [Infra] - Add some timeout and unwaive a test which dev fixed

Pull Request - State: open - Opened by EmmaQiaoCh 24 days ago - 2 comments

#5630 - [Github] Action to Auto-assign PR Reviewers (that respects CODEOWNERS.md and overrides)

Pull Request - State: open - Opened by venkywonka 24 days ago

#5629 - chores: [TRTLLM-6072] 1.0 LLMAPI doc updates

Pull Request - State: closed - Opened by hchings 24 days ago - 3 comments

#5628 - [Test] Update transformers to 4.53.0

Pull Request - State: open - Opened by hlu1 24 days ago - 9 comments

#5627 - [feat] Implement pytorch sampler for MTP

Pull Request - State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement

#5626 - [test] [mock] touch various files and check if correctly auto-assigns reviewers

Pull Request - State: open - Opened by venkywonka 24 days ago

#5625 - [https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking calculation.

Pull Request - State: open - Opened by FrankD412 24 days ago - 3 comments

#5624 - [https://nvbugs/5318059][test] Unwaive test

Pull Request - State: open - Opened by pamelap-nvidia 24 days ago - 3 comments

#5623 - [fix] Ignore context for presence and frequency penalties. Matches calculation in vllm.

Pull Request - State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement

#5622 - [feat] Compatibility with other LLM engines: Support negative seed and top_k=-1

Pull Request - State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement

#5621 - [TRTLLM-3576][fix] Raise exception when exceeds input tokens and clamp max tokens

Pull Request - State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement

#5620 - [fix] fix log probs and add for mtp and completion requests

Pull Request - State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement

#5619 - fix: draft tokens `TorchSampler` fast path

Pull Request - State: closed - Opened by netanel-haber 24 days ago - 3 comments

#5618 - AutoDeploy graph capture seems to fail when invoked with large batch sizes.

Issue - State: open - Opened by suyoggupta 24 days ago

#5617 - refactor: Clean up DecodingInput and DecodingOutput

Pull Request - State: open - Opened by Funatiq 24 days ago - 3 comments

#5616 - [TRTLLM-5826][feat] Support pytorch LoRA adapter eviction

Pull Request - State: open - Opened by amitz-nv 24 days ago

#5615 - [TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow

Pull Request - State: open - Opened by DylanChen-NV 25 days ago

#5614 - Chunked Experts

Pull Request - State: open - Opened by zongfeijing 25 days ago

#5613 - [Draft] feat: fuse w4a8 moe pre-quant scale on Hopper

Pull Request - State: open - Opened by xiaoweiw-nv 25 days ago - 3 comments

#5611 - feat: use session abstraction in data transceiver and cache formatter

Pull Request - State: open - Opened by zhengd-nv 25 days ago - 8 comments

#5610 - chore: enhance yaml loading arbitrary options in LlmArgs

Pull Request - State: open - Opened by Superjomn 25 days ago - 3 comments

#5609 - fix:https://nvbugs/5362398

Pull Request - State: closed - Opened by nv-guomingz 25 days ago - 9 comments

#5608 - fix [nvbug5351244]: test_mpi_session submit sync/async

Pull Request - State: closed - Opened by Superjomn 25 days ago - 3 comments

#5607 - ci: add failing test

Pull Request - State: open - Opened by Funatiq 25 days ago - 9 comments

#5606 - fix: [https://nvbugspro.nvidia.com/bug/5345215] Unwaive for bug 5345215.

Pull Request - State: open - Opened by bobboli 25 days ago - 9 comments

#5605 - [TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions (#5490)

Pull Request - State: closed - Opened by ixlmar 25 days ago - 6 comments

#5604 - [ci] move eagle1 and medusa tests to post-merge

Pull Request - State: closed - Opened by omera-nv 25 days ago - 3 comments

#5603 - [fix][ci] missing class names in post-merge test reports

Pull Request - State: closed - Opened by omera-nv 25 days ago - 21 comments

#5595 - chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs

Pull Request - State: closed - Opened by Superjomn 25 days ago - 6 comments

#5589 - [TRTLLM-5930] doc:refactor doc structure for 1.0 release

Pull Request - State: open - Opened by nv-guomingz 25 days ago

#5585 - chore: remove cuda_graph_ prefix from cuda_graph_config filed members.

Pull Request - State: closed - Opened by nv-guomingz 25 days ago - 12 comments

#5582 - test: [CI] Add failed cases into waives.txt

Pull Request - State: open - Opened by xinhe-nv 25 days ago - 5 comments

#5580 - feat: KV events for sliding window attention

Pull Request - State: open - Opened by jthomson04 25 days ago - 6 comments

#5577 - refactor: Remove IGptDecoderBatched interface

Pull Request - State: open - Opened by Funatiq 25 days ago - 15 comments

#5576 - refactor: Improve lookahead decoding interfaces

Pull Request - State: open - Opened by Funatiq 25 days ago - 6 comments

#5574 - perf: Use tokenizers API to optimize incremental detokenization perf

Pull Request - State: open - Opened by kaiyux 26 days ago - 12 comments

#5572 - test: [CI] remove closed bugs

Pull Request - State: closed - Opened by xinhe-nv 26 days ago - 6 comments

#5570 - [TRTLLM-5331] large-scale EP: perf - Replace allgaher with AllToAllPrepare

Pull Request - State: closed - Opened by WeiHaocheng 26 days ago - 3 comments

#5569 - test: [CI] Add failed cases into waives.txt

Pull Request - State: closed - Opened by xinhe-nv 26 days ago - 3 comments

#5564 - Investigate Gemma3 1B discrepancy

Pull Request - State: open - Opened by brb-nv 27 days ago

#5563 - Fix GEMM+AR fusion on blackwell

Pull Request - State: open - Opened by xavier-nvidia 27 days ago - 5 comments

#5562 - test: Deprecate gpt_model_type "v1" static batching from triton_backe…

Pull Request - State: closed - Opened by mc-nv 27 days ago - 12 comments
Labels: Community want to contribute

#5562 - test: Deprecate gpt_model_type "v1" static batching from triton_backe…

Pull Request - State: open - Opened by mc-nv 27 days ago - 3 comments

#5561 - Implement --served_model_name and improve command line parsing

Pull Request - State: open - Opened by pathorn 27 days ago

#5560 - [TRTLLM-4926][feat] Reimplement metrics endpoint with stats about requests

Pull Request - State: open - Opened by pathorn 27 days ago

#5559 - [fix] Use decorator for request cancelation and handle CancelledError

Pull Request - State: open - Opened by pathorn 27 days ago

#5558 - [nvbug/5337601][fix] Fix disagg + speculative decoding

Pull Request - State: open - Opened by Tabrizian 27 days ago

#5557 - refactor: [TRTLLM-6150] Refactor moe permute and finalize op by removing duplicated code

Pull Request - State: closed - Opened by limin2021 27 days ago - 9 comments

#5555 - [AutoDeploy] Enhance checkpoint loading pipeline

Issue - State: open - Opened by Fridah-nv 27 days ago
Labels: AutoDeploy

#5554 - [TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend

Pull Request - State: closed - Opened by xuanzic 27 days ago - 15 comments

#5552 - [feat] Support MXFP4 x BF16 Grouped GEMM in FusedMoE Pytorch Module

Pull Request - State: open - Opened by jinyangyuan-nvidia 27 days ago - 14 comments

#5551 - feat: Improve dev container tagging

Pull Request - State: open - Opened by ixlmar 28 days ago - 31 comments

#5549 - tests: add test_chunked_prefill for llama4

Pull Request - State: open - Opened by xinhe-nv 28 days ago

#5544 - rcca: test default kv_cache_reuse option for pytorch multimodal

Pull Request - State: closed - Opened by StanleySun639 28 days ago - 16 comments

#5535 - feat: Add support for MXFP8xMXFP4 in pytorch

Pull Request - State: open - Opened by djns99 28 days ago

#5534 - Refactor: move DeepEP from Docker images to wheel building

Pull Request - State: open - Opened by yuantailing 28 days ago - 17 comments

#5530 - [enh] [GH/CI] [WIP] [TEST] Auto-assign PR reviewers using module-owners information randomly

Pull Request - State: open - Opened by venkywonka 28 days ago

#5529 - feat(models): Mistral3.1 VLM pytorch backend support

Pull Request - State: open - Opened by 2ez4bz 28 days ago

#5527 - [nvbugs/5302040] feat. Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100)

Pull Request - State: open - Opened by wu6u3tw 28 days ago - 10 comments

#5524 - [TRTLLM-5366][feat]Add support for sm121

Pull Request - State: open - Opened by pamelap-nvidia 28 days ago - 22 comments

GitHub / NVIDIA/TensorRT-LLM issues and pull requests