GitHub / NVIDIA/TensorRT-LLM issues and pull requests
#5522 - [DRAFT] feat: transfer multimodal_data and refactor HyperCLOVAX & Qwen2/2.5-VL
Pull Request -
State: open - Opened by yechank-nvidia 30 days ago
#5518 - [don't review] Fp8 blockwise gemm autotune
Pull Request -
State: open - Opened by limin2021 about 1 month ago
#5513 - feature: unify new_tokens format sample state to trtllm sampler tokens format
Pull Request -
State: closed - Opened by netanel-haber about 1 month ago
- 9 comments
#5507 - chore: Mass integration of release/0.21
Pull Request -
State: open - Opened by dc3671 about 1 month ago
- 21 comments
#5505 - Cache transceiver support VSWA
Pull Request -
State: open - Opened by chuangz0 about 1 month ago
- 6 comments
#5503 - [fix]: Fix main test skip issue
Pull Request -
State: closed - Opened by yizhang-nv about 1 month ago
- 23 comments
#5500 - Qwen2.5-VL-3B pytorch backend with cuda graph error result
Issue -
State: open - Opened by specter2018 about 1 month ago
- 4 comments
Labels: bug, triaged
#5489 - [TRTLLM-1316] refactor: Remove unnecessary pipeline parallelism logic from postProcessRequest
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 34 comments
#5480 - Enable cuda graph as default testing.
Pull Request -
State: open - Opened by dominicshanshan about 1 month ago
- 17 comments
#5476 - feat: reduce unnecessary kernel generation
Pull Request -
State: open - Opened by tongyuantongyu about 1 month ago
- 32 comments
#5465 - [nvbug 5300551] test: increase block count in eviction test
Pull Request -
State: closed - Opened by zhengd-nv about 1 month ago
- 15 comments
#5439 - draft: migration from pybind11 to nanobind
Pull Request -
State: open - Opened by Linda-Stadter about 1 month ago
- 29 comments
#5437 - test: Reduce number of C++ test cases
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 40 comments
#5432 - perf: better heuristic for allreduce
Pull Request -
State: open - Opened by yilin-void about 1 month ago
- 28 comments
#5431 - [TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1)
Pull Request -
State: open - Opened by Superjomn about 1 month ago
- 61 comments
#5419 - Feat/apply autotuner to cute dsl fp8 gemm
Pull Request -
State: closed - Opened by limin2021 about 1 month ago
#5418 - Chore: split _build_model method for TorchLlm and TrtLlm
Pull Request -
State: open - Opened by QiJune about 1 month ago
- 2 comments
#5417 - [nvbug 5273941] fix: broken cyclic reference detect
Pull Request -
State: open - Opened by Superjomn about 1 month ago
#5416 - Fix test Pytorch model engine
Pull Request -
State: open - Opened by Tabrizian about 1 month ago
- 2 comments
#5415 - [DRAFT] fix: Enable num_return_sequences (`n`) support in PyTorch backend
Pull Request -
State: open - Opened by jaedeok-nvidia about 1 month ago
#5414 - add fmha test
Pull Request -
State: open - Opened by qsang-nv about 1 month ago
#5413 - Fix load balancing router bug
Pull Request -
State: open - Opened by Shunkangz about 1 month ago
#5412 - Make moe permute and final as custom op
Pull Request -
State: open - Opened by limin2021 about 1 month ago
- 1 comment
#5411 - fix: Fix static EPLB
Pull Request -
State: open - Opened by syuoni about 1 month ago
- 5 comments
#5410 - feat: Expose bias and FP8_MXFP4 MOE CUTLASS backend features to pytorch
Pull Request -
State: open - Opened by djns99 about 1 month ago
#5409 - chore: delete mamba hybrid, since it is now called NemotronH
Pull Request -
State: open - Opened by vegaluisjose about 1 month ago
- 2 comments
#5408 - Gibberish from Llama-3.3-70B-Instruct-FP8
Issue -
State: open - Opened by sarmiena about 1 month ago
- 5 comments
Labels: bug, triaged
#5407 - Draft: test: [CI] Add failed cases into waives.txt
Pull Request -
State: closed - Opened by xinhe-nv about 1 month ago
#5406 - [DRAFT] refactor: PyExecutor uses a list-type for response handling
Pull Request -
State: open - Opened by jaedeok-nvidia about 1 month ago
- 4 comments
#5405 - Add unit test for routing kernels
Pull Request -
State: open - Opened by ChristinaZ about 1 month ago
- 3 comments
#5404 - [#5403][perf] Conditionally enable SWAP AB for speculative decoding
Pull Request -
State: open - Opened by zoheth about 1 month ago
- 10 comments
Labels: Community want to contribute
#5403 - [Perf] Conditionally enable SWAP AB for speculative decoding
Issue -
State: open - Opened by zoheth about 1 month ago
#5402 - When enable_kv_cache_reuse is enabled, how to configure caching only the prompt and not the encoder?
Issue -
State: open - Opened by w066650 about 1 month ago
#5401 - [DON'T MERGE] NGram V2 test
Pull Request -
State: open - Opened by wili-65535 about 1 month ago
- 4 comments
#5400 - test: [CI] remove closed bugs
Pull Request -
State: open - Opened by xinhe-nv about 1 month ago
- 2 comments
#5399 - remove libnuma conan dependency
Pull Request -
State: open - Opened by dongxuy04 about 1 month ago
- 3 comments
#5398 - Add sleep function for disagg gen-only benchmarking
Pull Request -
State: open - Opened by qiaoxj07 about 1 month ago
- 3 comments
#5397 - test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests
Pull Request -
State: closed - Opened by yizhang-nv about 1 month ago
- 66 comments
#5396 - [1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes
Pull Request -
State: open - Opened by chang-l about 1 month ago
- 6 comments
#5394 - [TRTLLM-6019] feat: Remove cutlass min latency code from AutoTuner.
Pull Request -
State: open - Opened by hyukn about 1 month ago
- 6 comments
#5393 - Broken OpenAI API
Issue -
State: open - Opened by Alcanderian about 1 month ago
Labels: bug
#5392 - Adapt a new model with a structure similar to LLaMA3.
Issue -
State: open - Opened by BaiStone2017 about 1 month ago
#5391 - Server stuck with high load
Issue -
State: open - Opened by k-l-lambda about 1 month ago
- 1 comment
Labels: bug
#5390 - RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Issue -
State: open - Opened by moguizhizi about 1 month ago
Labels: bug
#5389 - [TensorRT-LLM][ERROR] Assertion failed: Error occurred when running GEMM!
Issue -
State: open - Opened by momaek about 1 month ago
Labels: bug
#5388 - [test] speedup TRT accuracy tests
Pull Request -
State: closed - Opened by omera-nv about 1 month ago
- 3 comments
#5387 - [doc] update mtp documents
Pull Request -
State: closed - Opened by lfr-0531 about 1 month ago
- 4 comments
#5386 - invalid request_id in `MTPSpecMetadata`
Issue -
State: open - Opened by k-l-lambda about 1 month ago
Labels: bug
#5385 - feat: Remove not used padding_idx in models
Pull Request -
State: open - Opened by HuiGao-NV about 1 month ago
- 6 comments
#5384 - refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 15 comments
#5383 - Qwen3-0.6B-FP8 model errors in quickstart_advanced.py: rmsnorm data type Float fails to dispatch
Issue -
State: open - Opened by Sevix7766 about 1 month ago
#5382 - Detokenize option in /v1/completions request
Pull Request -
State: open - Opened by Wokzy about 1 month ago
Labels: Community want to contribute, Community Engagement
#5381 - How to Quantize Qwen2.5-VL-Instruct with TensorRT-LLM?
Issue -
State: open - Opened by buptmengjj about 1 month ago
#5380 - Is the following result normal? Why is prompt_token_ids so long?
Issue -
State: open - Opened by buptmengjj about 1 month ago
#5379 - Mixtral-8x7B-Instruct awq-w4a8 output shows duplicated Chinese text
Issue -
State: open - Opened by wanzhenchn about 1 month ago
Labels: bug
#5378 - Fix: missing clientId when serialize and deserialize response (cherry-pick #5231)
Pull Request -
State: closed - Opened by kaiyux about 1 month ago
- 3 comments
#5377 - add user-controllable backend configuration to trtllm-bench
Issue -
State: open - Opened by nzmora-nvidia about 1 month ago
#5376 - [TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve
Pull Request -
State: open - Opened by talorabr about 1 month ago
#5374 - feat: Misc Opt for large scale EP
Pull Request -
State: closed - Opened by dongxuy04 about 1 month ago
- 6 comments
#5371 - [TRTLLM-5838][fix] fix max batch size and max tokens in kv cache estimations for Nemotron-H
Pull Request -
State: open - Opened by tomeras91 about 1 month ago
- 24 comments
#5369 - fix: fix bug of qwen3 + eagle3 + finalize_moe_fusion
Pull Request -
State: open - Opened by byshiue about 1 month ago
- 5 comments
#5367 - Mxfp4 moe
Pull Request -
State: open - Opened by Tracin about 1 month ago
#5364 - feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18
Pull Request -
State: open - Opened by Wanli-Jiang about 1 month ago
- 8 comments
#5358 - Make moe permute and final as custom op
Pull Request -
State: closed - Opened by limin2021 about 1 month ago
- 1 comment
#5355 - fix: Fix skip by mpi size fixture
Pull Request -
State: closed - Opened by yizhang-nv about 1 month ago
- 12 comments
#5348 - test: Add LLGuidance test and refine guided decoding
Pull Request -
State: open - Opened by syuoni about 1 month ago
- 26 comments
#5346 - feat: add LLmArgs option to force using dynamic quantization
Pull Request -
State: open - Opened by achartier about 1 month ago
- 9 comments
#5343 - [fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation
Pull Request -
State: closed - Opened by HuiGao-NV about 1 month ago
- 13 comments
#5341 - [fix][test] parametrize deepseek eval
Pull Request -
State: open - Opened by omera-nv about 1 month ago
- 33 comments
#5340 - fix: pass the correct KV retention config for block eviction
Pull Request -
State: open - Opened by achartier about 1 month ago
- 6 comments
#5336 - [nvbugs/5323043] test: Fix triton_extensive test
Pull Request -
State: open - Opened by Tabrizian about 1 month ago
- 9 comments
#5333 - [TRTLLM-3442] feat: added beam search support to the PyTorch Workflow
Pull Request -
State: open - Opened by stnie about 1 month ago
- 27 comments
#5328 - [TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler
Pull Request -
State: open - Opened by dcampora about 1 month ago
- 18 comments
#5318 - perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf
Pull Request -
State: open - Opened by bobboli about 1 month ago
- 6 comments
#5316 - refactor: Speculative decoding buffers part 2
Pull Request -
State: closed - Opened by Funatiq about 1 month ago
- 33 comments
#5315 - refactor: manage cache indirection in decoder state
Pull Request -
State: open - Opened by Funatiq about 1 month ago
- 15 comments
#5312 - [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default
Pull Request -
State: closed - Opened by Superjomn about 1 month ago
- 16 comments
#5310 - How to run Qwen3 using triton-server + trtllm
Issue -
State: open - Opened by ezioliao about 1 month ago
- 4 comments
Labels: question, triaged, Triton Backend, Investigating
#5291 - [ModelLoad] Concurrent load model
Pull Request -
State: open - Opened by arekay about 1 month ago
- 3 comments
#5282 - Draft: add script to protect perf. DO-NOT-MERGE
Pull Request -
State: open - Opened by litaotju about 1 month ago
#5281 - ci: unwaive llmapi launch test
Pull Request -
State: open - Opened by Superjomn about 1 month ago
- 8 comments
#5270 - feat: Dynamically remove servers in PD
Pull Request -
State: open - Opened by Shunkangz about 1 month ago
- 12 comments
#5265 - Fix: fix build for sm120
Pull Request -
State: open - Opened by peaceh-nv about 1 month ago
- 21 comments
#5262 - draft: fix cudaStreamSynchronize when using relaxed acceptance
Pull Request -
State: open - Opened by yweng0828 about 1 month ago
#5261 - chore: remove torch_compile prefix for TorchCompileConfig field members
Pull Request -
State: open - Opened by nv-guomingz about 1 month ago
- 3 comments
#5260 - [Torchflow] Can't run torchflow on llamaforcausallm with fp32. rmsnorm(at::Tensor&, at::Tensor&, at::Tensor&, double, bool)::<lambda()> failed to dispatch data type Float
Issue -
State: open - Opened by michaelfeil about 1 month ago
Labels: bug
#5259 - Fix: https://nvbugs/5345720
Pull Request -
State: open - Opened by QiJune about 1 month ago
- 2 comments
#5258 - [enhance] Add the ability to write a request timeline.
Pull Request -
State: open - Opened by FrankD412 about 1 month ago
#5257 - [AutoDeploy] Enhance graph transformation test utils
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: AutoDeploy
#5256 - [AutoDeploy] Investigate Graph Visualization
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: AutoDeploy
#5255 - Migrate other transformations to use the torch._inductor pattern matcher
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: AutoDeploy
#5254 - Investigate TRTLLM runtime repetitive issue
Issue -
State: open - Opened by Fridah-nv about 1 month ago
Labels: bug, AutoDeploy
#5253 - fix: only set _mpi_session if world_size is > 1
Pull Request -
State: open - Opened by achartier about 1 month ago
- 4 comments
#5252 - chore: Waive CI failure.
Pull Request -
State: closed - Opened by SimengLiu-nv about 1 month ago
- 3 comments
#5251 - [chore] Remove BaseDraftTokenManager
Pull Request -
State: open - Opened by mikeiovine about 1 month ago
- 6 comments
#5250 - doc: update contributing md for internal developers
Pull Request -
State: open - Opened by nv-guomingz about 1 month ago
- 3 comments
Labels: Documentation
#5249 - Refactor the signature of AD graph transformations
Issue -
State: open - Opened by nzmora-nvidia about 1 month ago
Labels: AutoDeploy
#5248 - [infra] Make test_chunked_prefill faster
Pull Request -
State: closed - Opened by mikeiovine about 1 month ago
- 10 comments
#5247 - quantize.py doesn't recognize cnn_dailymail dataset if the dataset path doesn't include cnn_dailymail
Issue -
State: open - Opened by mwawrzos about 1 month ago
Labels: bug