GitHub / NVIDIA/TensorRT-LLM issues and pull requests
#6094 - [fix] Update jenkins container images
Pull Request -
State: open - Opened by ixlmar 9 days ago
#6091 - Deploying qwen2.5-1.5B on a 3090: VRAM usage exploded
Issue -
State: open - Opened by Ulthunzy 9 days ago
#6090 - ucx establish connection with zmq
Pull Request -
State: open - Opened by chuangz0 9 days ago
#6089 - Fix FP8 blockwise scaling GEMM support on Blackwell
Pull Request -
State: closed - Opened by yuxianq 9 days ago
#6088 - Cherry Pick: PR #6076
Pull Request -
State: closed - Opened by ZhanruiSunCh 9 days ago
- 3 comments
#6087 - optimize: ADP schedule optimization
Pull Request -
State: open - Opened by yunruis 9 days ago
#6085 - [TRTLLM-6444][doc] Add some UCX troubleshooting docs
Pull Request -
State: open - Opened by reasonsolo 9 days ago
#6083 - [Whisper] add whisper support
Pull Request -
State: open - Opened by wu6u3tw 9 days ago
#6082 - [None] - Waive L0 tests
Pull Request -
State: open - Opened by yiqingy0 9 days ago
#6081 - Doc: Cherry pick https://github.com/NVIDIA/TensorRT-LLM/pull/5864 and https://github.com/NVIDIA/TensorRT-LLM/pull/5810 to release/0.21
Pull Request -
State: open - Opened by jiahanc 9 days ago
#6080 - feat: Add support for benchmarking individual gemms in MOE benchmark
Pull Request -
State: open - Opened by djns99 9 days ago
#6077 - [Draft] Inter-request kv cache manager support for HSTU
Pull Request -
State: open - Opened by geoffreyQiu 9 days ago
#6076 - [fix] Fix Triton build
Pull Request -
State: open - Opened by Tabrizian 9 days ago
#6075 - fix TMA error with GEMM+AR on TP=2
Pull Request -
State: open - Opened by xavier-nvidia 9 days ago
#6074 - No onboarding of blocks that are outside of attention window
Pull Request -
State: open - Opened by thorjohnsen 9 days ago
#6073 - [fix] Correct handling of NVFP4 block scaling factors in preprocessing for MoE
Pull Request -
State: open - Opened by shengliangxu 9 days ago
#6072 - Add documentation for eagle3+disagg+dynamo
Pull Request -
State: open - Opened by Tabrizian 9 days ago
#6069 - Correct handling of NVFP4 block scaling factors in preprocessing for MoE
Pull Request -
State: open - Opened by shengliangxu 9 days ago
#6066 - How to run cross attention with different number of q and kv tokens
Issue -
State: open - Opened by zhumakhan 9 days ago
#6065 - [Fix][Chore][Qwen3] fix bug of using fp4 on sm120
Pull Request -
State: open - Opened by byshiue 9 days ago
#6064 - feat: Remove padding in attention DP.
Pull Request -
State: open - Opened by bobboli 9 days ago
#6062 - fix: Add $HOME/.local/bin to PATH when running docker in local user mode
Pull Request -
State: closed - Opened by MartinMarciniszyn 9 days ago
- 3 comments
#6061 - optimize: ADP schedule optimization
Pull Request -
State: open - Opened by yunruis 9 days ago
#6060 - feat: support Vora(Vision as LoRA) model in TensorRT-LLM library
Pull Request -
State: open - Opened by effortprogrammer 10 days ago
#6059 - [feat] : Add FP8 context MLA support for SM120
Pull Request -
State: open - Opened by peaceh-nv 10 days ago
#6058 - chore: upgrade modelopt to 0.33
Pull Request -
State: open - Opened by nv-guomingz 10 days ago
#6057 - chore: Update required CUDAToolkit version to 12.9 in CMakeLists.txt
Pull Request -
State: open - Opened by Funatiq 10 days ago
#6055 - refactor: Enhanced handling of decoder requests and logits within the batch manager
Pull Request -
State: open - Opened by Funatiq 10 days ago
#6053 - [fix] Move NCCL group in all-gather and reduce-scatter OPs outside the outer loop
Pull Request -
State: closed - Opened by jinyangyuan-nvidia 10 days ago
- 9 comments
#6051 - [nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill
Pull Request -
State: closed - Opened by crazydemo 10 days ago
- 6 comments
#6049 - add release notes for 0.21 release
Pull Request -
State: closed - Opened by QiJune 10 days ago
- 6 comments
#6048 - Fix: pad DeepEP fp4 recv tensors if empty
Pull Request -
State: closed - Opened by yuantailing 10 days ago
- 3 comments
#6006 - chore: Cleanup disable_fp4_allgather.
Pull Request -
State: closed - Opened by bobboli 11 days ago
- 12 comments
#6001 - [nvbug/5387226] chore: add propagation for trust_remote_code to AutoConfig
Pull Request -
State: closed - Opened by Superjomn 11 days ago
- 15 comments
#5982 - BlockManager copy constructor fix
Pull Request -
State: closed - Opened by tshmilnvidia 12 days ago
- 10 comments
Labels: Community want to contribute
#5962 - fix: Unable to load phi4-model with tp_size>1
Pull Request -
State: closed - Opened by Wanli-Jiang 14 days ago
- 10 comments
#5903 - feat: Add deepseek-lite tests for RTX pro 6000
Pull Request -
State: closed - Opened by peaceh-nv 15 days ago
- 13 comments
#5634 - [fix] Update to properly set cuda graphs in trtllm-bench overrides.
Pull Request -
State: open - Opened by FrankD412 24 days ago
- 2 comments
#5633 - How to implement structured JSON output using TensorRT-LLM in an isolated environment?
Issue -
State: open - Opened by PhamGiaMinh 24 days ago
#5632 - [fix] (benchmark) Correct file creation error in text_dataset_dump for paths without a directory
Pull Request -
State: open - Opened by SuperGoodGame 24 days ago
#5631 - [Infra] - Add some timeout and unwaive a test which dev fixed
Pull Request -
State: open - Opened by EmmaQiaoCh 24 days ago
- 2 comments
#5630 - [Github] Action to Auto-assign PR Reviewers (that respects CODEOWNERS.md and overrides)
Pull Request -
State: open - Opened by venkywonka 24 days ago
#5629 - chores: [TRTLLM-6072] 1.0 LLMAPI doc updates
Pull Request -
State: closed - Opened by hchings 24 days ago
- 3 comments
#5628 - [Test] Update transformers to 4.53.0
Pull Request -
State: open - Opened by hlu1 24 days ago
- 9 comments
#5627 - [feat] Implement pytorch sampler for MTP
Pull Request -
State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement
#5626 - [test] [mock] touch various files and check if correctly auto-assigns reviewers
Pull Request -
State: open - Opened by venkywonka 24 days ago
#5625 - [https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking calculation.
Pull Request -
State: open - Opened by FrankD412 24 days ago
- 3 comments
#5624 - [https://nvbugs/5318059][test] Unwaive test
Pull Request -
State: open - Opened by pamelap-nvidia 24 days ago
- 3 comments
#5623 - [fix] Ignore context for presence and frequency penalties. Matches calculation in vllm.
Pull Request -
State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement
#5622 - [feat] Compatibility with other LLM engines: Support negative seed and top_k=-1
Pull Request -
State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement
#5621 - [TRTLLM-3576][fix] Raise exception when exceeds input tokens and clamp max tokens
Pull Request -
State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement
#5620 - [fix] fix log probs and add for mtp and completion requests
Pull Request -
State: open - Opened by pathorn 24 days ago
Labels: Community want to contribute, Community Engagement
#5619 - fix: draft tokens `TorchSampler` fast path
Pull Request -
State: closed - Opened by netanel-haber 24 days ago
- 3 comments
#5618 - AutoDeploy graph capture seems to fail when invoked with large batch sizes.
Issue -
State: open - Opened by suyoggupta 24 days ago
#5617 - refactor: Clean up DecodingInput and DecodingOutput
Pull Request -
State: open - Opened by Funatiq 24 days ago
- 3 comments
#5616 - [TRTLLM-5826][feat] Support pytorch LoRA adapter eviction
Pull Request -
State: open - Opened by amitz-nv 24 days ago
#5615 - [TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow
Pull Request -
State: open - Opened by DylanChen-NV 25 days ago
#5614 - Chunked Experts
Pull Request -
State: open - Opened by zongfeijing 25 days ago
#5613 - [Draft] feat: fuse w4a8 moe pre-quant scale on Hopper
Pull Request -
State: open - Opened by xiaoweiw-nv 25 days ago
- 3 comments
#5611 - feat: use session abstraction in data transceiver and cache formatter
Pull Request -
State: open - Opened by zhengd-nv 25 days ago
- 8 comments
#5610 - chore: enhance yaml loading arbitrary options in LlmArgs
Pull Request -
State: open - Opened by Superjomn 25 days ago
- 3 comments
#5609 - fix: https://nvbugs/5362398
Pull Request -
State: closed - Opened by nv-guomingz 25 days ago
- 9 comments
#5608 - fix [nvbug5351244]: test_mpi_session submit sync/async
Pull Request -
State: closed - Opened by Superjomn 25 days ago
- 3 comments
#5607 - ci: add failing test
Pull Request -
State: open - Opened by Funatiq 25 days ago
- 9 comments
#5606 - fix: [https://nvbugspro.nvidia.com/bug/5345215] Unwaive for bug 5345215.
Pull Request -
State: open - Opened by bobboli 25 days ago
- 9 comments
#5605 - [TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions (#5490)
Pull Request -
State: closed - Opened by ixlmar 25 days ago
- 6 comments
#5604 - [ci] move eagle1 and medusa tests to post-merge
Pull Request -
State: closed - Opened by omera-nv 25 days ago
- 3 comments
#5603 - [fix][ci] missing class names in post-merge test reports
Pull Request -
State: closed - Opened by omera-nv 25 days ago
- 21 comments
#5595 - chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs
Pull Request -
State: closed - Opened by Superjomn 25 days ago
- 6 comments
#5589 - [TRTLLM-5930] doc:refactor doc structure for 1.0 release
Pull Request -
State: open - Opened by nv-guomingz 25 days ago
#5585 - chore: remove cuda_graph_ prefix from cuda_graph_config field members.
Pull Request -
State: closed - Opened by nv-guomingz 25 days ago
- 12 comments
#5582 - test: [CI] Add failed cases into waives.txt
Pull Request -
State: open - Opened by xinhe-nv 25 days ago
- 5 comments
#5580 - feat: KV events for sliding window attention
Pull Request -
State: open - Opened by jthomson04 25 days ago
- 6 comments
#5577 - refactor: Remove IGptDecoderBatched interface
Pull Request -
State: open - Opened by Funatiq 25 days ago
- 15 comments
#5576 - refactor: Improve lookahead decoding interfaces
Pull Request -
State: open - Opened by Funatiq 25 days ago
- 6 comments
#5574 - perf: Use tokenizers API to optimize incremental detokenization perf
Pull Request -
State: open - Opened by kaiyux 26 days ago
- 12 comments
#5572 - test: [CI] remove closed bugs
Pull Request -
State: closed - Opened by xinhe-nv 26 days ago
- 6 comments
#5570 - [TRTLLM-5331] large-scale EP: perf - Replace allgather with AllToAllPrepare
Pull Request -
State: closed - Opened by WeiHaocheng 26 days ago
- 3 comments
#5569 - test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv 26 days ago
- 3 comments
#5564 - Investigate Gemma3 1B discrepancy
Pull Request -
State: open - Opened by brb-nv 27 days ago
#5563 - Fix GEMM+AR fusion on blackwell
Pull Request -
State: open - Opened by xavier-nvidia 27 days ago
- 5 comments
#5562 - test: Deprecate gpt_model_type "v1" static batching from triton_backe…
Pull Request -
State: closed - Opened by mc-nv 27 days ago
- 12 comments
Labels: Community want to contribute
#5562 - test: Deprecate gpt_model_type "v1" static batching from triton_backe…
Pull Request -
State: open - Opened by mc-nv 27 days ago
- 3 comments
#5561 - Implement --served_model_name and improve command line parsing
Pull Request -
State: open - Opened by pathorn 27 days ago
#5560 - [TRTLLM-4926][feat] Reimplement metrics endpoint with stats about requests
Pull Request -
State: open - Opened by pathorn 27 days ago
#5559 - [fix] Use decorator for request cancelation and handle CancelledError
Pull Request -
State: open - Opened by pathorn 27 days ago
#5558 - [nvbug/5337601][fix] Fix disagg + speculative decoding
Pull Request -
State: open - Opened by Tabrizian 27 days ago
#5557 - refactor: [TRTLLM-6150] Refactor moe permute and finalize op by removing duplicated code
Pull Request -
State: closed - Opened by limin2021 27 days ago
- 9 comments
#5555 - [AutoDeploy] Enhance checkpoint loading pipeline
Issue -
State: open - Opened by Fridah-nv 27 days ago
Labels: AutoDeploy
#5554 - [TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend
Pull Request -
State: closed - Opened by xuanzic 27 days ago
- 15 comments
#5552 - [feat] Support MXFP4 x BF16 Grouped GEMM in FusedMoE Pytorch Module
Pull Request -
State: open - Opened by jinyangyuan-nvidia 27 days ago
- 14 comments
#5551 - feat: Improve dev container tagging
Pull Request -
State: open - Opened by ixlmar 28 days ago
- 31 comments
#5549 - tests: add test_chunked_prefill for llama4
Pull Request -
State: open - Opened by xinhe-nv 28 days ago
#5544 - rcca: test default kv_cache_reuse option for pytorch multimodal
Pull Request -
State: closed - Opened by StanleySun639 28 days ago
- 16 comments
#5535 - feat: Add support for MXFP8xMXFP4 in pytorch
Pull Request -
State: open - Opened by djns99 28 days ago
#5534 - Refactor: move DeepEP from Docker images to wheel building
Pull Request -
State: open - Opened by yuantailing 28 days ago
- 17 comments
#5530 - [enh] [GH/CI] [WIP] [TEST] Auto-assign PR reviewers using module-owners information randomly
Pull Request -
State: open - Opened by venkywonka 28 days ago
#5529 - feat(models): Mistral3.1 VLM pytorch backend support
Pull Request -
State: open - Opened by 2ez4bz 28 days ago
#5527 - [nvbugs/5302040] feat. Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100)
Pull Request -
State: open - Opened by wu6u3tw 28 days ago
- 10 comments
#5524 - [TRTLLM-5366][feat]Add support for sm121
Pull Request -
State: open - Opened by pamelap-nvidia 28 days ago
- 22 comments