Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
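Because the listing below is backed by an API, the same metadata can be fetched programmatically. A minimal sketch in Python follows; the endpoint path, query parameters, and response field names are assumptions about the service's JSON API layout, not confirmed by this page.

    # Minimal sketch: fetch issue/PR metadata for NVIDIA/TensorRT-LLM from the
    # ecosyste.ms issues service. The endpoint path and the response fields
    # (number, title, state, pull_request, labels) are assumptions for illustration.
    import requests

    BASE = "https://issues.ecosyste.ms/api/v1"
    repo = "NVIDIA%2FTensorRT-LLM"  # owner/name, URL-encoded (assumed key format)

    resp = requests.get(
        f"{BASE}/hosts/GitHub/repositories/{repo}/issues",
        params={"per_page": 50},  # assumed pagination parameter
        timeout=30,
    )
    resp.raise_for_status()

    for item in resp.json():  # assumed: response body is a JSON array of records
        kind = "Pull Request" if item.get("pull_request") else "Issue"
        labels = ", ".join(item.get("labels") or [])  # assumed: labels is a list of strings
        print(f"#{item['number']} [{kind}] {item['state']}: {item['title']} ({labels})")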
GitHub / NVIDIA/TensorRT-LLM: issues and pull requests
#2436 - Update TensorRT-LLM
Pull Request - State: closed - Opened by kaiyux 2 days ago
#2435 - tritonserver is 40x slower than `TensorRT-LLM/examples/run.py`
Issue - State: open - Opened by ShuaiShao93 3 days ago
Labels: bug
#2434 - Error in data types: using model with lora
Issue - State: open - Opened by Alireza3242 3 days ago
Labels: bug
#2433 - Is there any restriction on the weight dimension value?
Issue - State: closed - Opened by zhink 3 days ago
#2432 - integrating support for structured decoding library outlines
Issue - State: open - Opened by kumar-devesh 4 days ago
Labels: triaged, feature request
#2431 - Adding lora for quantized models
Issue - State: closed - Opened by Alireza3242 5 days ago
#2430 - trtllm-build ignores `--model_cls_file` and `--model_cls_name`
Issue - State: open - Opened by abhishekudupa 6 days ago
Labels: bug, triaged
#2429 - trt_build for Llama 3.1 70B fp8 fails with CUDA error
Issue - State: open - Opened by chrisreese-if 6 days ago - 1 comment
Labels: bug, triaged
#2428 - trt_build for Llama 3.1 70B w4a8 fails with CUDA error
Issue - State: open - Opened by chrisreese-if 6 days ago - 1 comment
Labels: bug, triaged, quantization
#2427 - why Dit does not support pp_size > 1
Issue - State: open - Opened by algorithmconquer 6 days ago - 1 comment
Labels: question, triaged
#2426 - [TensorRT-LLM][INFO] Initializing MPI with thread mode 3
Issue - State: open - Opened by Rumeysakeskin 6 days ago - 3 comments
Labels: triaged, installation
#2425 - Small Typo
Issue - State: open - Opened by MARD1NO 6 days ago - 1 comment
Labels: documentation, triaged
#2424 - [Question] Document/examples to enable draft model speculative decoding using c++ executor API
Issue - State: open - Opened by ynwang007 7 days ago - 1 comment
Labels: question, triaged
#2423 - [Question] Can I build the tritonserver, tensorrtllm_backend and tensorrtllm and then use these build files across servers?
Issue - State: open - Opened by chrisreese-if 7 days ago
Labels: question, triaged
#2422 - attempt to run benchmark with batch_size>=512 and input_output_len=1024,128 result in tensor volume exceeds 2147483647 error
Issue - State: open - Opened by dmonakhov 7 days ago - 2 comments
Labels: triaged, waiting for feedback
#2421 - support FLUX?
Issue - State: open - Opened by algorithmconquer 7 days ago - 8 comments
Labels: question, triaged
#2420 - qwen 2-1.5B model build error
Issue - State: open - Opened by rexmxw02 8 days ago - 3 comments
Labels: bug, duplicate, triaged
#2419 - Assertion failed: Must set crossKvCacheFraction for encoder-decoder model
Issue - State: open - Opened by Saeedmatt3r 8 days ago - 2 comments
Labels: bug, triaged
#2418 - add the missing files
Pull Request - State: closed - Opened by nv-guomingz 8 days ago
#2417 - CUDA runtime error in cudaMemcpyAsync when enabling kv cache reuse with prompt table and TP > 1.
Issue - State: open - Opened by jxchenus 9 days ago - 8 comments
Labels: bug, triaged, Investigating
#2416 - ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
Issue - State: open - Opened by DeekshithaDPrakash 9 days ago - 1 comment
Labels: triaged, installation, waiting for feedback
#2415 - Request for Colbert Model
Issue - State: open - Opened by FernandoDorado 9 days ago - 6 comments
Labels: question, triaged, new model
#2413 - Update TensorRT-LLM
Pull Request - State: closed - Opened by kaiyux 9 days ago
#2412 - Exporting Finetuned Llama models to TensorRT-LLM
Issue - State: open - Opened by DeekshithaDPrakash 10 days ago - 1 comment
Labels: question, triaged, waiting for feedback
#2411 - Consistent Output with Same Prompts
Issue - State: open - Opened by ZhenboYan 10 days ago - 1 comment
Labels: question, triaged
#2410 - update llm api reference page.
Pull Request - State: closed - Opened by nv-guomingz 10 days ago
Labels: documentation, triaged
#2409 - fix documents issues
Pull Request - State: closed - Opened by Shixiaowei02 10 days ago
#2408 - Question: How do `enable_context_fmha` and `use_paged_context_fmha` work?
Issue - State: open - Opened by dontloo 12 days ago - 6 comments
Labels: question, triaged
#2407 - run.py --run_profiling respects stop token and is unsuitable for performance comparisons
Issue - State: open - Opened by aikitoria 13 days ago - 1 comment
Labels: question, triaged, waiting for feedback
#2406 - logprobs always 0.000
Issue - State: open - Opened by mmoskal 13 days ago - 3 comments
Labels: bug, triaged, Investigating
#2405 - `import tensorrt_llm` prints out `[TensorRT-LLM][INFO] Initializing MPI with thread mode 3` and gets stuck there
Issue - State: closed - Opened by mrakgr 13 days ago - 7 comments
Labels: bug
#2404 - Update gh-pages
Pull Request - State: closed - Opened by Shixiaowei02 13 days ago
#2402 - Segmentation fault (11) on 1022dev+TRT 10.4.0
Issue - State: open - Opened by aliencaocao 13 days ago - 4 comments
Labels: bug, triaged, waiting for feedback
#2401 - Update TensorRT-LLM v0.14.0
Pull Request - State: closed - Opened by kaiyux 13 days ago
#2400 - Error Code 9: API Usage Error (Target GPU SM 70 is not supported by this TensorRT release.)
Issue - State: closed - Opened by aliencaocao 14 days ago - 7 comments
Labels: question, triaged
#2399 - Error when running llava on v0.13.0
Issue - State: closed - Opened by zhangts20 14 days ago - 2 comments
Labels: bug, triaged, Investigating
#2398 - T5 out of memory
Issue - State: open - Opened by ydm-amazon 14 days ago - 10 comments
Labels: bug, triaged
#2397 - th::optional -> std::optional
Pull Request - State: open - Opened by r-barnes 14 days ago
Labels: triaged
#2396 - How to rewrite this kernel without referencing the implementation of cutlass
Issue - State: closed - Opened by zhink 14 days ago - 5 comments
Labels: question, triaged
#2395 - Why is the performance worse than release 0.12.0 when I run the benchmark of release 0.13.0
Issue - State: open - Opened by rexmxw02 14 days ago - 9 comments
Labels: triaged, performance issue, waiting for feedback
#2394 - add support internvl2
Pull Request - State: open - Opened by Jeremy-J-J 14 days ago - 5 comments
Labels: triaged, feature request, waiting for feedback
#2393 - illegal memory access with mpirun and cpp example
Issue - State: closed - Opened by mmoskal 15 days ago - 3 comments
Labels: bug, triaged, waiting for feedback
#2392 - Qwen2-72B w4a8 empty output
Issue - State: open - Opened by lishicheng1996 15 days ago - 4 comments
Labels: bug, triaged, quantization
#2391 - Update the latest news
Pull Request - State: closed - Opened by kaiyux 16 days ago
#2389 - Update TensorRT-LLM
Pull Request - State: closed - Opened by kaiyux 16 days ago
#2388 - Qwen2-1.5B-Instruct convert_checkpoint.py failed
Issue - State: open - Opened by 1994 16 days ago - 2 comments
Labels: bug, triaged, waiting for feedback
#2387 - How to use Medusa to support encoder decoder model?
Issue - State: open - Opened by TianzhongSong 16 days ago - 1 comment
Labels: question, triaged, feature request
#2386 - Error in benchmarks/python/all_reduce.py
Issue - State: closed - Opened by wpybtw 16 days ago - 2 comments
Labels: bug, triaged
#2385 - Flash attention issue while converting checkpoint
Issue - State: open - Opened by Aaryanverma 17 days ago - 1 comment
Labels: triaged, installation, waiting for feedback
#2384 - attention mechanism toggle added
Pull Request - State: open - Opened by Aaryanverma 17 days ago - 1 comment
Labels: triaged, waiting for feedback, functionality issue
#2383 - What is the difference between stop_words_list and end_id
Issue - State: open - Opened by tonylek 18 days ago - 3 comments
Labels: question, triaged
#2382 - fix load_model_on_cpu on qwen/convert_checkpoint.py
Pull Request - State: open - Opened by lkm2835 18 days ago
Labels: triaged, feature request
#2381 - CUDA Out of Memory Error when Running Nemotron-51B with TensorRT-LLM on 4xA100
Issue - State: open - Opened by ShivamSphn 19 days ago - 1 comment
Labels: Investigating
#2380 - Error while importing tensorrt_llm
Issue - State: open - Opened by Aaryanverma 19 days ago - 1 comment
Labels: question, triaged, installation
#2379 - build bert: build does not load model
Issue - State: closed - Opened by Alireza3242 19 days ago - 3 comments
Labels: bug, triaged, build
#2378 - network: fix broken onnx export
Pull Request - State: open - Opened by ishandhanani 20 days ago - 1 comment
Labels: bug, duplicate, triaged, Merged
#2377 - FP8 Conversion failure when using Mixtral 8x7B with use_fp8_rowwise
Issue - State: closed - Opened by ValeGian 20 days ago - 8 comments
Labels: bug, triaged, build
#2376 - ModelRunner cannot start engine with "multi-rank nemo LoRA" checkpoints
Issue - State: open - Opened by jolyons123 20 days ago - 1 comment
Labels: bug, triaged, build
#2375 - ModuleNotFoundError: No module named 'tensorrt_bindings'
Issue - State: closed - Opened by whoo9112 21 days ago
#2374 - TPOT=0 without In-flight Batching in benckmark
Issue - State: open - Opened by mltloveyy 21 days ago
Labels: question, triaged, performance issue, benchmark
#2373 - Bug in build bert
Issue - State: closed - Opened by Alireza3242 21 days ago - 1 comment
Labels: bug, triaged, build
#2372 - XQA kernel works slower with fp8 kv than with fp16 kv on H100
Issue - State: open - Opened by ttim 21 days ago - 2 comments
Labels: question, triaged, performance issue
#2371 - How to integrate Multi-LoRA Setup at Inference with NVIDIA Triton / TensorRT-LLM? I built the engine...
Issue - State: open - Opened by JoJoLev 21 days ago - 9 comments
Labels: question, triaged, build
#2370 - Fix errors when using smoothquant to quantize Qwen2 model
Pull Request - State: open - Opened by Missmiaom 21 days ago - 1 comment
Labels: triaged, quantization
#2369 - UnsupportedOperatorError: ONNX export failed on an operator with unrecognized namespace flash_attn::_flash_attn_forward. If you are trying to export a custom operator, make sure you registered it with the right domain and version.
Issue - State: closed - Opened by scuizhibin 21 days ago - 2 comments
Labels: triaged, Investigating
#2367 - return_log_probs slow down generation
Issue - State: open - Opened by Desmond819 22 days ago - 3 comments
Labels: bug, performance issue, Investigating
#2366 - Allow for LoRA modules with different rank dimensions when using HF format
Pull Request - State: closed - Opened by AlessioNetti 22 days ago - 2 comments
#2365 - fast-forward tokens in logits post processor
Issue - State: open - Opened by mmoskal 23 days ago - 2 comments
Labels: triaged, feature request, runtime
#2363 - Update TensorRT-LLM
Pull Request - State: closed - Opened by kaiyux 23 days ago
#2362 - Inconsistent Results Between Python Runtime and Python-Binding-C++ When Running TRT-LLM Multimodel
Issue - State: open - Opened by Oldpan 23 days ago - 1 comment
Labels: bug, triaged, runtime
#2361 - c++ inference example
Issue - State: open - Opened by scuizhibin 23 days ago - 1 comment
Labels: question, runtime
#2360 - Error when run 'sudo make -C docker release_build'
Issue - State: open - Opened by SouthWest7 24 days ago - 1 comment
Labels: question, build, Investigating
#2359 - How could i set ptuning prompt embedding table by c++ api?
Issue - State: closed - Opened by zhaocc1106 24 days ago - 1 comment
Labels: question
#2358 - Encountered an error in forwardAsync function: [TensorRT-LLM][ERROR] CUDA runtime error in cudaMemcpyAsync(dst, src.data(), src.getSizeInBytes(), cudaMemcpyDefault, mStream->get()): invalid argument
Issue - State: open - Opened by zhaocc1106 26 days ago - 9 comments
Labels: bug, triaged
#2357 - openai_server error
Issue - State: open - Opened by imilli 26 days ago - 1 comment
Labels: question, triaged, runtime
#2356 - convert_checkpoint report error
Issue - State: open - Opened by imilli 26 days ago - 1 comment
Labels: bug, triaged, build
#2355 - Build and run nvidia/Llama-3_1-Nemotron-51B-Instruct on a single A100 80Gb
Issue - State: open - Opened by edesalve 26 days ago
Labels: question, triaged, quantization
#2354 - test_cpp.py
Issue - State: open - Opened by weizhi-wang 27 days ago - 3 comments
Labels: Investigating, waiting for feedback
#2353 - qwen, tensorrt-llm=0.12.0
Issue - State: open - Opened by yanglongbiao 27 days ago - 1 comment
Labels: question, runtime
#2352 - Passing gpt_variant to model conversion
Pull Request - State: closed - Opened by tonylek 27 days ago - 2 comments
Labels: triaged, build
#2351 - [Question] Int8 Gemm's perf degraded in real models.
Issue - State: open - Opened by foreverlms 27 days ago
Labels: question, triaged, quantization
#2350 - free_gpu_memory_fraction not working for examples/apps/openai_server.py
Issue - State: closed - Opened by anaivebird 28 days ago - 2 comments
Labels: bug, triaged
#2349 - Is there any extra demo for in-flight batch strategy?
Issue - State: closed - Opened by Noblezhong 28 days ago - 2 comments
Labels: question, triaged, runtime
#2348 - unknown flag: --trt_root
Issue - State: open - Opened by Gu0725 28 days ago - 3 comments
Labels: triaged, build, Investigating
#2347 - trtllm-bench "No module named 'tensorrt_llm.bench.datamodels'" in v0.13.0
Issue - State: open - Opened by activezhao 28 days ago - 2 comments
Labels: bug, triaged, benchmark
#2346 - _SyncQueue class attributeError:
Issue - State: open - Opened by vonodiripsa 29 days ago - 3 comments
Labels: bug, triaged
#2345 - Status of TensorRT-LLM Eagle Implementation
Issue - State: closed - Opened by avianion 29 days ago - 1 comment
Labels: question, triaged, not a bug
#2344 - When I used convert_checkpoint.py to convert Gemma hf format, It print killed
Issue - State: open - Opened by imilli 29 days ago
Labels: question, triaged
#2343 - Specify Llama 3.x information in example readme
Pull Request - State: closed - Opened by laikhtewari 29 days ago - 1 comment
#2342 - LLM in TTS
Issue - State: open - Opened by CallmeZhangChenchen 29 days ago - 2 comments
Labels: question, triaged
#2341 - Where is **MedusaDecodingLayer** be executed ?
Issue - State: closed - Opened by RichardWooSJTU 29 days ago
#2340 - Support for DeepseekV2ForCausalLM
Issue - State: open - Opened by tgandrew 30 days ago - 4 comments
Labels: triaged, feature request, new model
#2339 - checkpoint conversion script (/llama/convert_checkpoint.py) for Llama-3.2-3B-Instruct is failing with the following error
Issue - State: open - Opened by GaneshDoosa 30 days ago - 1 comment
Labels: bug, triaged
#2338 - Whisper Encoder issues with Executor API
Issue - State: open - Opened by MahmoudAshraf97 about 1 month ago - 6 comments
Labels: question, triaged, runtime
#2337 - hang up using mpirun -n 2
Issue - State: open - Opened by Hukongtao about 1 month ago - 3 comments
Labels: bug, triaged, installation, Investigating
#2336 - support qwen2.5 models
Issue - State: open - Opened by wxsms about 1 month ago - 3 comments
Labels: triaged, feature request, new model
#2335 - gpu memory leak when max_tokens = 1 and gather_all_token_logits
Issue - State: open - Opened by anaivebird about 1 month ago - 9 comments
Labels: bug, triaged, Investigating
#2333 - Update TensorRT-LLM
Pull Request - State: closed - Opened by kaiyux about 1 month ago
#2332 - [json.exception.out_of_range.403] key 'builder_config' not found with v0.13.0
Issue - State: closed - Opened by activezhao about 1 month ago
Labels: bug