Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/TensorRT-LLM issues and pull requests

#2436 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux 1 day ago

#2434 - Error in data types: using model with lora

Issue - State: open - Opened by Alireza3242 1 day ago
Labels: bug

#2432 - integrating support for structured decoding library outlines

Issue - State: open - Opened by kumar-devesh 2 days ago
Labels: triaged, feature request

#2431 - Adding lora for quantized models

Issue - State: closed - Opened by Alireza3242 4 days ago

#2430 - trtllm-build ignores `--model_cls_file` and `--model_cls_name`

Issue - State: open - Opened by abhishekudupa 4 days ago
Labels: bug, triaged

#2429 - trt_build for Llama 3.1 70B fp8 fails with CUDA error

Issue - State: open - Opened by chrisreese-if 4 days ago - 1 comment
Labels: bug, triaged

#2428 - trt_build for Llama 3.1 70B w4a8 fails with CUDA error

Issue - State: open - Opened by chrisreese-if 4 days ago - 1 comment
Labels: bug, triaged, quantization

#2427 - why Dit does not support pp_size > 1

Issue - State: open - Opened by algorithmconquer 5 days ago - 1 comment
Labels: question, triaged

#2426 - [TensorRT-LLM][INFO] Initializing MPI with thread mode 3

Issue - State: open - Opened by Rumeysakeskin 5 days ago - 3 comments
Labels: triaged, installation

#2425 - Small Typo

Issue - State: open - Opened by MARD1NO 5 days ago - 1 comment
Labels: documentation, triaged

#2424 - [Question] Document/examples to enable draft model speculative decoding using c++ executor API

Issue - State: open - Opened by ynwang007 5 days ago - 1 comment
Labels: question, triaged

#2421 - support FLUX?

Issue - State: open - Opened by algorithmconquer 6 days ago - 8 comments
Labels: question, triaged

#2420 - qwen 2-1.5B model build error

Issue - State: open - Opened by rexmxw02 7 days ago - 3 comments
Labels: bug, duplicate, triaged

#2419 - Assertion failed: Must set crossKvCacheFraction for encoder-decoder model

Issue - State: open - Opened by Saeedmatt3r 7 days ago - 2 comments
Labels: bug, triaged

#2418 - add the missing files

Pull Request - State: closed - Opened by nv-guomingz 7 days ago

#2417 - CUDA runtime error in cudaMemcpyAsync when enabling kv cache reuse with prompt table and TP > 1.

Issue - State: open - Opened by jxchenus 7 days ago - 8 comments
Labels: bug, triaged, Investigating

#2416 - ModuleNotFoundError: No module named 'tensorrt_llm.bindings'

Issue - State: open - Opened by DeekshithaDPrakash 7 days ago - 1 comment
Labels: triaged, installation, waiting for feedback

#2415 - Request for Colbert Model

Issue - State: open - Opened by FernandoDorado 8 days ago - 6 comments
Labels: question, triaged, new model

#2413 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux 8 days ago

#2412 - Exporting Finetuned Llama models to TensorRT-LLM

Issue - State: open - Opened by DeekshithaDPrakash 8 days ago - 1 comment
Labels: question, triaged, waiting for feedback

#2411 - Consistent Output with Same Prompts

Issue - State: open - Opened by ZhenboYan 8 days ago - 1 comment
Labels: question, triaged

#2410 - update llm api reference page.

Pull Request - State: closed - Opened by nv-guomingz 9 days ago
Labels: documentation, triaged

#2409 - fix documents issues

Pull Request - State: closed - Opened by Shixiaowei02 9 days ago

#2408 - Question: How do `enable_context_fmha` and `use_paged_context_fmha` work?

Issue - State: open - Opened by dontloo 10 days ago - 6 comments
Labels: question, triaged

#2407 - run.py --run_profiling respects stop token and is unsuitable for performance comparisons

Issue - State: open - Opened by aikitoria 11 days ago - 1 comment
Labels: question, triaged, waiting for feedback

#2406 - logprobs always 0.000

Issue - State: open - Opened by mmoskal 12 days ago - 3 comments
Labels: bug, triaged, Investigating

#2404 - Update gh-pages

Pull Request - State: closed - Opened by Shixiaowei02 12 days ago

#2402 - Segmentation fault (11) on 1022dev+TRT 10.4.0

Issue - State: open - Opened by aliencaocao 12 days ago - 4 comments
Labels: bug, triaged, waiting for feedback

#2401 - Update TensorRT-LLM v0.14.0

Pull Request - State: closed - Opened by kaiyux 12 days ago

#2400 - Error Code 9: API Usage Error (Target GPU SM 70 is not supported by this TensorRT release.)

Issue - State: closed - Opened by aliencaocao 12 days ago - 7 comments
Labels: question, triaged

#2399 - Error when running llava on v0.13.0

Issue - State: closed - Opened by zhangts20 12 days ago - 2 comments
Labels: bug, triaged, Investigating

#2398 - T5 out of memory

Issue - State: open - Opened by ydm-amazon 13 days ago - 10 comments
Labels: bug, triaged

#2397 - th::optional -> std::optional

Pull Request - State: open - Opened by r-barnes 13 days ago
Labels: triaged

#2396 - How to rewrite this kernel without referencing the implementation of cutlass

Issue - State: closed - Opened by zhink 13 days ago - 5 comments
Labels: question, triaged

#2395 - Why is the performance worse than release 0.12.0 when I run the benchmark of release 0.13.0

Issue - State: open - Opened by rexmxw02 13 days ago - 9 comments
Labels: triaged, performance issue, waiting for feedback

#2394 - add support internvl2

Pull Request - State: open - Opened by Jeremy-J-J 13 days ago - 5 comments
Labels: triaged, feature request, waiting for feedback

#2393 - illegal memory access with mpirun and cpp example

Issue - State: closed - Opened by mmoskal 13 days ago - 3 comments
Labels: bug, triaged, waiting for feedback

#2392 - Qwen2-72B w4a8 empty output

Issue - State: open - Opened by lishicheng1996 14 days ago - 4 comments
Labels: bug, triaged, quantization

#2391 - Update the latest news

Pull Request - State: closed - Opened by kaiyux 15 days ago

#2389 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux 15 days ago

#2388 - Qwen2-1.5B-Instruct convert_checkpoint.py failed

Issue - State: open - Opened by 1994 15 days ago - 2 comments
Labels: bug, triaged, waiting for feedback

#2387 - How to use Medusa to support encoder decoder model?

Issue - State: open - Opened by TianzhongSong 15 days ago - 1 comment
Labels: question, triaged, feature request

#2386 - Error in benchmarks/python/all_reduce.py

Issue - State: closed - Opened by wpybtw 15 days ago - 2 comments
Labels: bug, triaged

#2385 - Flash attention issue while converting checkpoint

Issue - State: open - Opened by Aaryanverma 16 days ago - 1 comment
Labels: triaged, installation, waiting for feedback

#2384 - attention mechanism toggle added

Pull Request - State: open - Opened by Aaryanverma 16 days ago - 1 comment
Labels: triaged, waiting for feedback, functionality issue

#2383 - What is the difference between stop_words_list and end_id

Issue - State: open - Opened by tonylek 17 days ago - 3 comments
Labels: question, triaged

#2382 - fix load_model_on_cpu on qwen/convert_checkpoint.py

Pull Request - State: open - Opened by lkm2835 17 days ago
Labels: triaged, feature request

#2381 - CUDA Out of Memory Error when Running Nemotron-51B with TensorRT-LLM on 4xA100

Issue - State: open - Opened by ShivamSphn 17 days ago - 1 comment
Labels: Investigating

#2380 - Error while importing tensorrt_llm

Issue - State: open - Opened by Aaryanverma 18 days ago - 1 comment
Labels: question, triaged, installation

#2379 - build bert: build does not load model

Issue - State: closed - Opened by Alireza3242 18 days ago - 3 comments
Labels: bug, triaged, build

#2378 - network: fix broken onnx export

Pull Request - State: open - Opened by ishandhanani 18 days ago - 1 comment
Labels: bug, duplicate, triaged, Merged

#2377 - FP8 Conversion failure when using Mixtral 8x7B with use_fp8_rowwise

Issue - State: closed - Opened by ValeGian 19 days ago - 8 comments
Labels: bug, triaged, build

#2376 - ModelRunner cannot start engine with "multi-rank nemo LoRA" checkpoints

Issue - State: open - Opened by jolyons123 19 days ago - 1 comment
Labels: bug, triaged, build

#2374 - TPOT=0 without In-flight Batching in benchmark

Issue - State: open - Opened by mltloveyy 19 days ago
Labels: question, triaged, performance issue, benchmark

#2373 - Bug in build bert

Issue - State: closed - Opened by Alireza3242 19 days ago - 1 comment
Labels: bug, triaged, build

#2372 - XQA kernel works slower with fp8 kv than with fp16 kv on H100

Issue - State: open - Opened by ttim 20 days ago - 2 comments
Labels: question, triaged, performance issue

#2371 - How to integrate Multi-LoRA Setup at Inference with NVIDIA Triton / TensorRT-LLM? I built the engine...

Issue - State: open - Opened by JoJoLev 20 days ago - 9 comments
Labels: question, triaged, build

#2370 - Fix errors when using smoothquant to quantize Qwen2 model

Pull Request - State: open - Opened by Missmiaom 20 days ago - 1 comment
Labels: triaged, quantization

#2367 - return_log_probs slow down generation

Issue - State: open - Opened by Desmond819 20 days ago - 3 comments
Labels: bug, performance issue, Investigating

#2366 - Allow for LoRA modules with different rank dimensions when using HF format

Pull Request - State: closed - Opened by AlessioNetti 21 days ago - 2 comments

#2365 - fast-forward tokens in logits post processor

Issue - State: open - Opened by mmoskal 21 days ago - 2 comments
Labels: triaged, feature request, runtime

#2363 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux 22 days ago

#2362 - Inconsistent Results Between Python Runtime and Python-Binding-C++ When Running TRT-LLM Multimodel

Issue - State: open - Opened by Oldpan 22 days ago - 1 comment
Labels: bug, triaged, runtime

#2361 - c++ inference example

Issue - State: open - Opened by scuizhibin 22 days ago - 1 comment
Labels: question, runtime

#2360 - Error when run 'sudo make -C docker release_build'

Issue - State: open - Opened by SouthWest7 23 days ago - 1 comment
Labels: question, build, Investigating

#2359 - How could I set ptuning prompt embedding table by c++ api?

Issue - State: closed - Opened by zhaocc1106 23 days ago - 1 comment
Labels: question

#2357 - openai_server error

Issue - State: open - Opened by imilli 25 days ago - 1 comment
Labels: question, triaged, runtime

#2356 - convert_checkpoint report error

Issue - State: open - Opened by imilli 25 days ago - 1 comment
Labels: bug, triaged, build

#2355 - Build and run nvidia/Llama-3_1-Nemotron-51B-Instruct on a single A100 80Gb

Issue - State: open - Opened by edesalve 25 days ago
Labels: question, triaged, quantization

#2354 - test_cpp.py

Issue - State: open - Opened by weizhi-wang 26 days ago - 3 comments
Labels: Investigating, waiting for feedback

#2353 - qwen, tensorrt-llm=0.12.0

Issue - State: open - Opened by yanglongbiao 26 days ago - 1 comment
Labels: question, runtime

#2352 - Passing gpt_variant to model conversion

Pull Request - State: closed - Opened by tonylek 26 days ago - 2 comments
Labels: triaged, build

#2351 - [Question] Int8 Gemm's perf degraded in real models.

Issue - State: open - Opened by foreverlms 26 days ago
Labels: question, triaged, quantization

#2350 - free_gpu_memory_fraction not working for examples/apps/openai_server.py

Issue - State: closed - Opened by anaivebird 26 days ago - 2 comments
Labels: bug, triaged

#2349 - Is there any extra demo for in-flight batch strategy?

Issue - State: closed - Opened by Noblezhong 27 days ago - 2 comments
Labels: question, triaged, runtime

#2348 - unknown flag: --trt_root

Issue - State: open - Opened by Gu0725 27 days ago - 3 comments
Labels: triaged, build, Investigating

#2347 - trtllm-bench "No module named 'tensorrt_llm.bench.datamodels'" in v0.13.0

Issue - State: open - Opened by activezhao 27 days ago - 2 comments
Labels: bug, triaged, benchmark

#2346 - _SyncQueue class attributeError:

Issue - State: open - Opened by vonodiripsa 27 days ago - 3 comments
Labels: bug, triaged

#2345 - Status of TensorRT-LLM Eagle Implementation

Issue - State: closed - Opened by avianion 27 days ago - 1 comment
Labels: question, triaged, not a bug

#2344 - When I used convert_checkpoint.py to convert Gemma hf format, It print killed

Issue - State: open - Opened by imilli 27 days ago
Labels: question, triaged

#2343 - Specify Llama 3.x information in example readme

Pull Request - State: closed - Opened by laikhtewari 28 days ago - 1 comment

#2342 - LLM in TTS

Issue - State: open - Opened by CallmeZhangChenchen 28 days ago - 2 comments
Labels: question, triaged

#2341 - Where is **MedusaDecodingLayer** executed?

Issue - State: closed - Opened by RichardWooSJTU 28 days ago

#2340 - Support for DeepseekV2ForCausalLM

Issue - State: open - Opened by tgandrew 28 days ago - 4 comments
Labels: triaged, feature request, new model

#2338 - Whisper Encoder issues with Executor API

Issue - State: open - Opened by MahmoudAshraf97 29 days ago - 6 comments
Labels: question, triaged, runtime

#2337 - hang up using mpirun -n 2

Issue - State: open - Opened by Hukongtao 29 days ago - 3 comments
Labels: bug, triaged, installation, Investigating

#2336 - support qwen2.5 models

Issue - State: open - Opened by wxsms 29 days ago - 3 comments
Labels: triaged, feature request, new model

#2335 - gpu memory leak when max_tokens = 1 and gather_all_token_logits

Issue - State: open - Opened by anaivebird 29 days ago - 9 comments
Labels: bug, triaged, Investigating

#2333 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux 29 days ago