GitHub / NVIDIA/TensorRT-LLM issues and pull requests
#4753 - tests: [TRTQA-2905] improve timeout report for qa test cases
Pull Request -
State: closed - Opened by crazydemo about 2 months ago
- 3 comments
#4749 - feat: cache reuse support (selective cache transfer) in mla cache formatter
Pull Request -
State: open - Opened by zhengd-nv about 2 months ago
- 8 comments
#4738 - fix: make BaseLlmArgs fail on unsupported args
Pull Request -
State: open - Opened by ixlmar about 2 months ago
- 19 comments
#4737 - [feat] Enable NVFP4 output for TRTLLM attention kernels
Pull Request -
State: closed - Opened by Tom-Zheng about 2 months ago
- 17 comments
#4735 - [nvbugs/5303555] ci: unwaive test_fp8_block_scales_cuda_graph_padding
Pull Request -
State: closed - Opened by Funatiq about 2 months ago
- 21 comments
#4727 - fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances
Pull Request -
State: open - Opened by Superjomn about 2 months ago
- 18 comments
#4723 - test CI
Pull Request -
State: closed - Opened by qsang-nv about 2 months ago
- 12 comments
#4700 - refactor: Separate DecoderState from GptDecoderBatched
Pull Request -
State: closed - Opened by Funatiq 2 months ago
- 12 comments
#4694 - use cu for fmha_v2
Pull Request -
State: open - Opened by qsang-nv 2 months ago
- 27 comments
#4693 - [https://nvbugspro.nvidia.com/bug/5300080] Fix the bug of setting attention_chunk_size and enable chunked-attention in the generation-phase by default
Pull Request -
State: open - Opened by PerkzZheng 2 months ago
- 30 comments
#4692 - Refactor the first token response in PD
Pull Request -
State: open - Opened by Shunkangz 2 months ago
- 15 comments
#4690 - fix: handle OOMs during KV cache estimation
Pull Request -
State: open - Opened by ixlmar 2 months ago
- 14 comments
#4685 - tests: fix 5273697
Pull Request -
State: open - Opened by xinhe-nv 2 months ago
#4667 - Solve underallocation in VSWA+/VGQA
Pull Request -
State: open - Opened by netanel-haber 2 months ago
- 70 comments
#4664 - Chore: only pad one dummy request for attention dp scenario
Pull Request -
State: open - Opened by QiJune 2 months ago
- 2 comments
#4663 - Add files into scan ignoreList
Pull Request -
State: open - Opened by yiqingy0 2 months ago
#4662 - [NVBUG 5301980] Fix fp4 gemm padding.
Pull Request -
State: open - Opened by Tracin 2 months ago
- 2 comments
#4661 - fix fmha v2 tests
Pull Request -
State: open - Opened by qsang-nv 2 months ago
- 3 comments
#4660 - Use runtime total gpu memory to calculate kv cache memory and log more memory information
Pull Request -
State: open - Opened by HuiGao-NV 2 months ago
- 3 comments
#4659 - fix: fmha_v2 compilation
Pull Request -
State: open - Opened by PerkzZheng 2 months ago
- 1 comment
#4658 - [https://nvbugs/5294983][fix] unwaive TestDeepSeekV3Lite::test_fp8_block_scales_4gpus
Pull Request -
State: open - Opened by zhhuang-nv 2 months ago
- 6 comments
#4657 - [5289904] chore: Unwaive test for Qwen model.
Pull Request -
State: open - Opened by hyukn 2 months ago
- 1 comment
#4656 - infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image)
Pull Request -
State: open - Opened by ZhanruiSunCh 2 months ago
- 103 comments
#4655 - [fix] Fix SamplingParams check on n and best_of
Pull Request -
State: open - Opened by syuoni 2 months ago
- 5 comments
#4654 - [TRTLLM-5327] - Fix guardwords scan step
Pull Request -
State: closed - Opened by yiqingy0 2 months ago
- 3 comments
#4653 - Support for Devstral with pytorch backend
Issue -
State: open - Opened by ankitmaurya001 2 months ago
#4652 - User/nvpohanh/pull out to min latency py
Pull Request -
State: open - Opened by nvpohanh 2 months ago
#4651 - feat: chunked prefill for MLA (Blackwell)
Pull Request -
State: open - Opened by jmydurant 2 months ago
- 38 comments
#4650 - fix nvbug 5302895
Pull Request -
State: open - Opened by chuangz0 2 months ago
- 2 comments
#4649 - Chore: introduce RequestQueueItem class instead of using tuple
Pull Request -
State: closed - Opened by QiJune 2 months ago
- 3 comments
#4648 - Fix handle cancel request for attentionDP
Pull Request -
State: open - Opened by Shunkangz 2 months ago
- 3 comments
#4647 - [fix] Unwaive torch compile tests
Pull Request -
State: open - Opened by liji-nv 2 months ago
- 3 comments
#4646 - fix disagg config params
Pull Request -
State: open - Opened by chuangz0 2 months ago
- 4 comments
#4645 - Waive L0 tests
Pull Request -
State: closed - Opened by yiqingy0 2 months ago
- 3 comments
#4644 - tests: unwaive deepseek case
Pull Request -
State: open - Opened by crazydemo 2 months ago
#4643 - feat: update DeepSeek FP8 TRT-LLM Gen cubins
Pull Request -
State: open - Opened by nekorobov 2 months ago
- 27 comments
#4635 - Use backend to replace macro to control enablement of MNNVL all reduce
Pull Request -
State: open - Opened by HuiGao-NV 2 months ago
- 77 comments
#4630 - [TRTLLM-4971]: Use safe deserialization in ParallelConfig
Pull Request -
State: open - Opened by yibinl-nvidia 2 months ago
- 47 comments
#4623 - [TRTLLM-1658][feat] Enable multiple response in trtllm-serve for TRT backend
Pull Request -
State: open - Opened by LinPoly 2 months ago
- 8 comments
#4621 - [nvbugs/5274894] fix: Sort requests for functional correctness and performance (adapted from #4608)
Pull Request -
State: closed - Opened by Funatiq 2 months ago
- 18 comments
#4617 - [nvbugs/5301492] ci: waive test_workers_kv_cache_aware_router
Pull Request -
State: closed - Opened by Funatiq 2 months ago
- 6 comments
#4615 - feat: large-scale EP(part 4: Static EP load balancer integration)
Pull Request -
State: closed - Opened by syuoni 2 months ago
- 11 comments
#4614 - Chore: refine shutdown signal of PyExecutor
Pull Request -
State: closed - Opened by QiJune 2 months ago
- 9 comments
#4613 - fix: test trtllm-bench mgmn
Pull Request -
State: open - Opened by Superjomn 2 months ago
- 5 comments
#4612 - Update fmha v2 and switch to cu
Pull Request -
State: open - Opened by qsang-nv 2 months ago
- 15 comments
#4611 - feat: Integration of Fused QKNorm+RoPE.
Pull Request -
State: open - Opened by bobboli 2 months ago
- 6 comments
#4609 - Waive L0 test
Pull Request -
State: closed - Opened by yiqingy0 2 months ago
- 3 comments
#4607 - chore: sort llm request state enums in chronological order
Pull Request -
State: closed - Opened by zhengd-nv 2 months ago
- 6 comments
#4605 - [test] Unwaive testcases
Pull Request -
State: open - Opened by zongfeijing 2 months ago
- 12 comments
#4603 - chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs
Pull Request -
State: closed - Opened by Superjomn 2 months ago
- 32 comments
#4602 - [TRTLLM-5327] - Add scan stage
Pull Request -
State: closed - Opened by yiqingy0 2 months ago
- 8 comments
#4600 - fix: build_config in TorchLlmArgs and avoid invalid args
Pull Request -
State: open - Opened by Superjomn 2 months ago
- 6 comments
#4597 - fix: random fail of cache router test
Pull Request -
State: open - Opened by zhengd-nv 2 months ago
- 21 comments
#4594 - [TRTLLM-4647][fix] Fix the no fusion allreduce hanging
Pull Request -
State: open - Opened by timlee0212 2 months ago
- 8 comments
#4593 - [AutoDeploy] Arch2: Model Support: VLM, Long-Context, and Linear Attention
Issue -
State: open - Opened by sugunav14 2 months ago
Labels: AutoDeploy
#4583 - [fix] Make add_special_tokens to false for completions API
Pull Request -
State: open - Opened by Pernekhan 2 months ago
- 6 comments
Labels: triaged, Community want to contribute, OpenAI API
#4581 - chore: introduce KvCacheCreator
Pull Request -
State: open - Opened by ixlmar 2 months ago
- 44 comments
#4577 - Release 0.20 to main
Pull Request -
State: open - Opened by amirkl94 2 months ago
- 24 comments
#4569 - [TRTLLM-5000][feat] NGrams V2
Pull Request -
State: open - Opened by wili-65535 2 months ago
- 10 comments
#4561 - fix: fix dsr1 min lat cga ar rate drop(0.2)
Pull Request -
State: open - Opened by yunruis 2 months ago
- 6 comments
#4560 - Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL
Pull Request -
State: open - Opened by yunruis 2 months ago
- 8 comments
#4558 - [TRTLLM-3456] Speculation: Draft Target in new FW
Pull Request -
State: closed - Opened by IzzyPutterman 2 months ago
- 41 comments
#4557 - Cherry-pick feat/llama4's updates
Pull Request -
State: closed - Opened by nvpohanh 2 months ago
#4552 - [Infra]Remove some old keyword
Pull Request -
State: open - Opened by EmmaQiaoCh 2 months ago
- 11 comments
#4550 - fix: llmapi-launch add add trtllm-bench test with engine building (#4…
Pull Request -
State: open - Opened by Superjomn 2 months ago
- 6 comments
#4549 - test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv 2 months ago
- 3 comments
#4543 - chore: update transformers version to 4.52.1
Pull Request -
State: open - Opened by nv-guomingz 2 months ago
- 15 comments
#4541 - fix: Move cv2 import to load_video function
Pull Request -
State: closed - Opened by Funatiq 2 months ago
- 6 comments
#4536 - [https://nvbugs/5271281][fix] fix a pd+mtp accuracy issue
Pull Request -
State: closed - Opened by lfr-0531 2 months ago
- 34 comments
#4533 - [DON'T MERGE] NGram V2 draft
Pull Request -
State: closed - Opened by wili-65535 2 months ago
- 7 comments
#4532 - fix: max_num_sequences calculation with overlap scheduling
Pull Request -
State: closed - Opened by Funatiq 2 months ago
- 35 comments
#4529 - fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu
Pull Request -
State: open - Opened by Superjomn 2 months ago
- 9 comments
#4525 - feat: better build_wheel.py venv handling
Pull Request -
State: closed - Opened by tongyuantongyu 2 months ago
- 15 comments
#4522 - test: conditional disagg and cache aware balancing for deepseek v3
Pull Request -
State: open - Opened by zhengd-nv 2 months ago
- 78 comments
#4514 - feat: Skip sampler for intermediate pp stages.
Pull Request -
State: closed - Opened by yuxianq 2 months ago
- 27 comments
#4510 - test: rcca https://nvbugs/5223130
Pull Request -
State: open - Opened by xinhe-nv 2 months ago
- 10 comments
#4498 - fix: Handle additional model outputs based on pipeline parallel rank
Pull Request -
State: closed - Opened by Funatiq 2 months ago
- 9 comments
#4497 - feat: forward exceptions to Python and catch OOMs
Pull Request -
State: open - Opened by ixlmar 2 months ago
- 29 comments
#4494 - [TRTLLM-4783][feat] Mamba2 kernel updates for Nemotron-H
Pull Request -
State: open - Opened by tomeras91 2 months ago
- 26 comments
#4492 - fix: Remove duplicate tokenization in generation server
Pull Request -
State: closed - Opened by Shunkangz 2 months ago
- 13 comments
#4480 - [TEST] Pytest skip for Llama4 tests fix
Pull Request -
State: open - Opened by EmmaQiaoCh 2 months ago
- 5 comments
#4467 - [feat] Piecewise cuda graph support for MLA
Pull Request -
State: open - Opened by liji-nv 2 months ago
- 15 comments
#4466 - feat: Enhance AutoTuner inference path and code readability
Pull Request -
State: open - Opened by hyukn 2 months ago
- 50 comments
#4464 - draft: KV Cache GPUDirect Storage
Pull Request -
State: open - Opened by achartier 2 months ago
- 36 comments
#4458 - No module named 'tensorrt_llm.bindings'
Issue -
State: open - Opened by Shegun93 2 months ago
- 3 comments
Labels: triaged, Installation, Investigating
#4454 - [Infra] - Multi-GPU testing support with Slurm
Pull Request -
State: open - Opened by yuanjingx87 2 months ago
- 72 comments
#4450 - refactor: Update decoder buffer and logits management
Pull Request -
State: open - Opened by Funatiq 2 months ago
- 55 comments
#4436 - Feature flux
Pull Request -
State: open - Opened by forrestl111 2 months ago
- 1 comment
#4406 - [AutoDeploy] Refactor AutoDeploy torch custom op to attach `auto_deploy` prefix to the op namespace
Issue -
State: open - Opened by suyoggupta 2 months ago
Labels: AutoDeploy
#4401 - feature: make trtllmsampler new_tokens format the universal format
Pull Request -
State: closed - Opened by netanel-haber 2 months ago
- 84 comments
#4398 - refactor: DisaggExecutorTest
Pull Request -
State: closed - Opened by Funatiq 2 months ago
- 22 comments
#4383 - feat: add support for florence2
Pull Request -
State: open - Opened by ducviet00 2 months ago
- 4 comments
Labels: triaged, feature request, Community want to contribute, new model, Generic Runtime, Community Engagement
#4379 - fix: fix accuracy and illegal memory access issues when using mtp + attention dp
Pull Request -
State: open - Opened by lfr-0531 2 months ago
- 20 comments
#4367 - [AutoDeploy] Example Transformation in new configuration system
Issue -
State: closed - Opened by lucaslie 2 months ago
Labels: AutoDeploy
#4364 - [AutoDeploy] Support overlap scheduler
Issue -
State: closed - Opened by lucaslie 2 months ago
- 2 comments
Labels: bug, triaged, AutoDeploy
#4332 - Change the method to calculate kv memory size in tests
Pull Request -
State: closed - Opened by HuiGao-NV 2 months ago
- 14 comments
#4331 - [Draft] [TRTLLM-5227]: Remove HMAC code
Pull Request -
State: open - Opened by yibinl-nvidia 2 months ago
- 21 comments
#4327 - [AutoDeploy] Arch1: Configurable transformation pipeline
Issue -
State: closed - Opened by lucaslie 2 months ago
Labels: AutoDeploy
#4318 - [AutoDeploy] Arch0: AutoSharder v2
Issue -
State: open - Opened by lucaslie 2 months ago
Labels: triaged, AutoDeploy
#4312 - [AutoDeploy] Advanced MLA Support
Issue -
State: open - Opened by lucaslie 2 months ago
Labels: triaged, AutoDeploy