GitHub / NVIDIA/TensorRT-LLM issues and pull requests

#4753 - tests: [TRTQA-2905] improve timeout report for qa test cases

Pull Request - State: closed - Opened by crazydemo about 2 months ago - 3 comments

#4749 - feat: cache reuse support (selective cache transfer) in mla cache formatter

Pull Request - State: open - Opened by zhengd-nv about 2 months ago - 8 comments

#4738 - fix: make BaseLlmArgs fail on unsupported args

Pull Request - State: open - Opened by ixlmar about 2 months ago - 19 comments

#4737 - [feat] Enable NVFP4 output for TRTLLM attention kernels

Pull Request - State: closed - Opened by Tom-Zheng about 2 months ago - 17 comments

#4735 - [nvbugs/5303555] ci: unwaive test_fp8_block_scales_cuda_graph_padding

Pull Request - State: closed - Opened by Funatiq about 2 months ago - 21 comments

#4727 - fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances

Pull Request - State: open - Opened by Superjomn about 2 months ago - 18 comments

#4723 - test CI

Pull Request - State: closed - Opened by qsang-nv about 2 months ago - 12 comments

#4700 - refactor: Separate DecoderState from GptDecoderBatched

Pull Request - State: closed - Opened by Funatiq 2 months ago - 12 comments

#4694 - use cu for fmha_v2

Pull Request - State: open - Opened by qsang-nv 2 months ago - 27 comments

#4693 - [https://nvbugspro.nvidia.com/bug/5300080] Fix the bug of setting attention_chunk_size and enable chunked-attention in the generation-phase by default

Pull Request - State: open - Opened by PerkzZheng 2 months ago - 30 comments

#4692 - Refactor the first token response in PD

Pull Request - State: open - Opened by Shunkangz 2 months ago - 15 comments

#4690 - fix: handle OOMs during KV cache estimation

Pull Request - State: open - Opened by ixlmar 2 months ago - 14 comments

#4685 - tests: fix 5273697

Pull Request - State: open - Opened by xinhe-nv 2 months ago

#4667 - Solve underallocation in VSWA+/VGQA

Pull Request - State: open - Opened by netanel-haber 2 months ago - 70 comments

#4664 - Chore: only pad one dummy request for attention dp scenario

Pull Request - State: open - Opened by QiJune 2 months ago - 2 comments

#4663 - Add files into scan ignoreList

Pull Request - State: open - Opened by yiqingy0 2 months ago

#4662 - [NVBUG 5301980] Fix fp4 gemm padding.

Pull Request - State: open - Opened by Tracin 2 months ago - 2 comments

#4661 - fix fmha v2 tests

Pull Request - State: open - Opened by qsang-nv 2 months ago - 3 comments

#4660 - Use runtime total gpu memory to calculate kv cache memory and log more memory information

Pull Request - State: open - Opened by HuiGao-NV 2 months ago - 3 comments

#4659 - fix: fmha_v2 compilation

Pull Request - State: open - Opened by PerkzZheng 2 months ago - 1 comment

#4658 - [https://nvbugs/5294983][fix] unwaive TestDeepSeekV3Lite::test_fp8_block_scales_4gpus

Pull Request - State: open - Opened by zhhuang-nv 2 months ago - 6 comments

#4657 - [5289904] chore: Unwaive test for Qwen model.

Pull Request - State: open - Opened by hyukn 2 months ago - 1 comment

#4656 - infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image)

Pull Request - State: open - Opened by ZhanruiSunCh 2 months ago - 103 comments

#4655 - [fix] Fix SamplingParams check on n and best_of

Pull Request - State: open - Opened by syuoni 2 months ago - 5 comments

#4654 - [TRTLLM-5327] - Fix guardwords scan step

Pull Request - State: closed - Opened by yiqingy0 2 months ago - 3 comments

#4653 - Support for Devstral with pytorch backend

Issue - State: open - Opened by ankitmaurya001 2 months ago

#4652 - User/nvpohanh/pull out to min latency py

Pull Request - State: open - Opened by nvpohanh 2 months ago

#4651 - feat: chunked prefill for MLA (Blackwell)

Pull Request - State: open - Opened by jmydurant 2 months ago - 38 comments

#4650 - fix nvbug 5302895

Pull Request - State: open - Opened by chuangz0 2 months ago - 2 comments

#4649 - Chore: introduce RequestQueueItem class instead of using tuple

Pull Request - State: closed - Opened by QiJune 2 months ago - 3 comments

#4648 - Fix handle cancel request for attentionDP

Pull Request - State: open - Opened by Shunkangz 2 months ago - 3 comments

#4647 - [fix] Unwaive torch compile tests

Pull Request - State: open - Opened by liji-nv 2 months ago - 3 comments

#4646 - fix disagg config params

Pull Request - State: open - Opened by chuangz0 2 months ago - 4 comments

#4645 - Waive L0 tests

Pull Request - State: closed - Opened by yiqingy0 2 months ago - 3 comments

#4644 - tests: unwaive deepseek case

Pull Request - State: open - Opened by crazydemo 2 months ago

#4643 - feat: update DeepSeek FP8 TRT-LLM Gen cubins

Pull Request - State: open - Opened by nekorobov 2 months ago - 27 comments

#4635 - Use backend to replace macro to control enablement of MNNVL all reduce

Pull Request - State: open - Opened by HuiGao-NV 2 months ago - 77 comments

#4630 - [TRTLLM-4971]: Use safe deserialization in ParallelConfig

Pull Request - State: open - Opened by yibinl-nvidia 2 months ago - 47 comments

#4623 - [TRTLLM-1658][feat] Enable multiple response in trtllm-serve for TRT backend

Pull Request - State: open - Opened by LinPoly 2 months ago - 8 comments

#4621 - [nvbugs/5274894] fix: Sort requests for functional correctness and performance (adapted from #4608)

Pull Request - State: closed - Opened by Funatiq 2 months ago - 18 comments

#4617 - [nvbugs/5301492] ci: waive test_workers_kv_cache_aware_router

Pull Request - State: closed - Opened by Funatiq 2 months ago - 6 comments

#4615 - feat: large-scale EP(part 4: Static EP load balancer integration)

Pull Request - State: closed - Opened by syuoni 2 months ago - 11 comments

#4614 - Chore: refine shutdown signal of PyExecutor

Pull Request - State: closed - Opened by QiJune 2 months ago - 9 comments

#4613 - fix: test trtllm-bench mgmn

Pull Request - State: open - Opened by Superjomn 2 months ago - 5 comments

#4612 - Update fmha v2 and switch to cu

Pull Request - State: open - Opened by qsang-nv 2 months ago - 15 comments

#4611 - feat: Integration of Fused QKNorm+RoPE.

Pull Request - State: open - Opened by bobboli 2 months ago - 6 comments

#4609 - Waive L0 test

Pull Request - State: closed - Opened by yiqingy0 2 months ago - 3 comments

#4607 - chore: sort llm request state enums in chronological order

Pull Request - State: closed - Opened by zhengd-nv 2 months ago - 6 comments

#4605 - [test] Unwaive testcases

Pull Request - State: open - Opened by zongfeijing 2 months ago - 12 comments

#4603 - chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs

Pull Request - State: closed - Opened by Superjomn 2 months ago - 32 comments

#4602 - [TRTLLM-5327] - Add scan stage

Pull Request - State: closed - Opened by yiqingy0 2 months ago - 8 comments

#4600 - fix: build_config in TorchLlmArgs and avoid invalid args

Pull Request - State: open - Opened by Superjomn 2 months ago - 6 comments

#4597 - fix: random fail of cache router test

Pull Request - State: open - Opened by zhengd-nv 2 months ago - 21 comments

#4594 - [TRTLLM-4647][fix] Fix the no fusion allreduce hanging

Pull Request - State: open - Opened by timlee0212 2 months ago - 8 comments

#4593 - [AutoDeploy] Arch2: Model Support: VLM, Long-Context, and Linear Attention

Issue - State: open - Opened by sugunav14 2 months ago
Labels: AutoDeploy

#4583 - [fix] Make add_special_tokens to false for completions API

Pull Request - State: open - Opened by Pernekhan 2 months ago - 6 comments
Labels: triaged, Community want to contribute, OpenAI API

#4581 - chore: introduce KvCacheCreator

Pull Request - State: open - Opened by ixlmar 2 months ago - 44 comments

#4577 - Release 0.20 to main

Pull Request - State: open - Opened by amirkl94 2 months ago - 24 comments

#4569 - [TRTLLM-5000][feat] NGrams V2

Pull Request - State: open - Opened by wili-65535 2 months ago - 10 comments

#4561 - fix: fix dsr1 min lat cga ar rate drop(0.2)

Pull Request - State: open - Opened by yunruis 2 months ago - 6 comments

#4560 - Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL

Pull Request - State: open - Opened by yunruis 2 months ago - 8 comments

#4558 - [TRTLLM-3456] Speculation: Draft Target in new FW

Pull Request - State: closed - Opened by IzzyPutterman 2 months ago - 41 comments

#4557 - Cherry-pick feat/llama4's updates

Pull Request - State: closed - Opened by nvpohanh 2 months ago

#4552 - [Infra]Remove some old keyword

Pull Request - State: open - Opened by EmmaQiaoCh 2 months ago - 11 comments

#4550 - fix: llmapi-launch add add trtllm-bench test with engine building (#4…

Pull Request - State: open - Opened by Superjomn 2 months ago - 6 comments

#4549 - test: [CI] Add failed cases into waives.txt

Pull Request - State: closed - Opened by xinhe-nv 2 months ago - 3 comments

#4543 - chore: update transformers version to 4.52.1

Pull Request - State: open - Opened by nv-guomingz 2 months ago - 15 comments

#4541 - fix: Move cv2 import to load_video function

Pull Request - State: closed - Opened by Funatiq 2 months ago - 6 comments

#4536 - [https://nvbugs/5271281][fix] fix a pd+mtp accuracy issue

Pull Request - State: closed - Opened by lfr-0531 2 months ago - 34 comments

#4533 - [DON'T MERGE] NGram V2 draft

Pull Request - State: closed - Opened by wili-65535 2 months ago - 7 comments

#4532 - fix: max_num_sequences calculation with overlap scheduling

Pull Request - State: closed - Opened by Funatiq 2 months ago - 35 comments

#4529 - fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu

Pull Request - State: open - Opened by Superjomn 2 months ago - 9 comments

#4525 - feat: better build_wheel.py venv handling

Pull Request - State: closed - Opened by tongyuantongyu 2 months ago - 15 comments

#4522 - test: conditional disagg and cache aware balancing for deepseek v3

Pull Request - State: open - Opened by zhengd-nv 2 months ago - 78 comments

#4514 - feat: Skip sampler for intermediate pp stages.

Pull Request - State: closed - Opened by yuxianq 2 months ago - 27 comments

#4510 - test: rcca https://nvbugs/5223130

Pull Request - State: open - Opened by xinhe-nv 2 months ago - 10 comments

#4498 - fix: Handle additional model outputs based on pipeline parallel rank

Pull Request - State: closed - Opened by Funatiq 2 months ago - 9 comments

#4497 - feat: forward exceptions to Python and catch OOMs

Pull Request - State: open - Opened by ixlmar 2 months ago - 29 comments

#4494 - [TRTLLM-4783][feat] Mamba2 kernel updates for Nemotron-H

Pull Request - State: open - Opened by tomeras91 2 months ago - 26 comments

#4492 - fix: Remove duplicate tokenization in generation server

Pull Request - State: closed - Opened by Shunkangz 2 months ago - 13 comments

#4480 - [TEST] Pytest skip for Llama4 tests fix

Pull Request - State: open - Opened by EmmaQiaoCh 2 months ago - 5 comments

#4467 - [feat] Piecewise cuda graph support for MLA

Pull Request - State: open - Opened by liji-nv 2 months ago - 15 comments

#4466 - feat: Enhance AutoTuner inference path and code readability

Pull Request - State: open - Opened by hyukn 2 months ago - 50 comments

#4464 - draft: KV Cache GPUDirect Storage

Pull Request - State: open - Opened by achartier 2 months ago - 36 comments

#4458 - No module named 'tensorrt_llm.bindings

Issue - State: open - Opened by Shegun93 2 months ago - 3 comments
Labels: triaged, Installation, Investigating

#4454 - [Infra] - Multi-GPU testing support with Slurm

Pull Request - State: open - Opened by yuanjingx87 2 months ago - 72 comments

#4450 - refactor: Update decoder buffer and logits management

Pull Request - State: open - Opened by Funatiq 2 months ago - 55 comments

#4436 - Feature flux

Pull Request - State: open - Opened by forrestl111 2 months ago - 1 comment

#4406 - [AutoDeploy] Refactor AutoDeploy torch custom op to attach `auto_deploy` prefix to the op namespace

Issue - State: open - Opened by suyoggupta 2 months ago
Labels: AutoDeploy

#4401 - feature: make trtllmsampler new_tokens format the universal format

Pull Request - State: closed - Opened by netanel-haber 2 months ago - 84 comments

#4398 - refactor: DisaggExecutorTest

Pull Request - State: closed - Opened by Funatiq 2 months ago - 22 comments

#4383 - feat: add support for florence2

Pull Request - State: open - Opened by ducviet00 2 months ago - 4 comments
Labels: triaged, feature request, Community want to contribute, new model, Generic Runtime, Community Engagement

#4379 - fix: fix accuracy and illegal memory access issues when using mtp + attention dp

Pull Request - State: open - Opened by lfr-0531 2 months ago - 20 comments

#4367 - [AutoDeploy] Example Transformation in new configuration system

Issue - State: closed - Opened by lucaslie 2 months ago
Labels: AutoDeploy

#4364 - [AutoDeploy] Support overlap scheduler

Issue - State: closed - Opened by lucaslie 2 months ago - 2 comments
Labels: bug, triaged, AutoDeploy

#4332 - Change the method to calculate kv memory size in tests

Pull Request - State: closed - Opened by HuiGao-NV 2 months ago - 14 comments

#4331 - [Draft] [TRTLLM-5227]: Remove HMAC code

Pull Request - State: open - Opened by yibinl-nvidia 2 months ago - 21 comments

#4327 - [AutoDeploy] Arch1: Configurable transformation pipeline

Issue - State: closed - Opened by lucaslie 2 months ago
Labels: AutoDeploy

#4318 - [AutoDeploy] Arch0: AutoSharder v2

Issue - State: open - Opened by lucaslie 2 months ago
Labels: triaged, AutoDeploy

#4312 - [AutoDeploy] Advanced MLA Support

Issue - State: open - Opened by lucaslie 2 months ago
Labels: triaged, AutoDeploy