GitHub / triton-inference-server/tensorrtllm_backend: issues and pull requests labelled "bug"
#791 - LoRa weights not applied without warnings/errors when mismatch in type
Issue - State: open - Opened by rahchuenmonroe 3 months ago - Labels: bug
#750 - lora_config shape mismatch when using converted LoRA at runtime
Issue - State: open - Opened by paulhendricks 7 months ago - Labels: bug
#743 - whisper tensorrt-llm backend drop the accuracy for small.en model
Issue - State: closed - Opened by hualin-wu-2000 7 months ago - Labels: bug
#739 - no /app folder in container nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
Issue - State: open - Opened by smennes 7 months ago - Labels: bug
#738 - docker export than import the image,run error
Issue - State: closed - Opened by XiaoBin1992 7 months ago - Labels: bug
#727 - tensorrtllm_backend doesn't work with remote repos
Issue - State: open - Opened by ShuaiShao93 8 months ago - Labels: bug
#721 - /app is not present in nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3 image
Issue - State: closed - Opened by adityarajsahu 9 months ago - Labels: bug
#720 - Fail to test DraftTarget model with triton server tensorrtllm backend
Issue - State: open - Opened by gloritygithub11 9 months ago - Labels: bug
#712 - run llava1.5-7b with triton get ERROR: "ERROR: Failed to create instance: Stub process 'multimodal_encoders_0_0' is not healthy."
Issue - State: open - Opened by HPUedCSLearner 9 months ago - Labels: bug
#711 - Missing lookAheadRuntimeConfig in Triton Server with TensorRT-LLM backend HTTP Request
Issue - State: open - Opened by shaylapid 9 months ago - Labels: bug
#710 - Tritonserver Fails to Start with TensorRT-LLM Backend with lookahead_decoding mode - Assertion Failure in lookaheadDecodingLayer.cpp
Issue - State: open - Opened by shaylapid 9 months ago - 1 comment - Labels: bug
#707 - Failed to build TensorRT-LLM whisper Decoder
Issue - State: open - Opened by muhammad-faizan-122 10 months ago - Labels: bug
#705 - Inconsistent Batch Index Order in Decoupled Mode with trt-llm and triton trtllm backend
Issue - State: closed - Opened by Oldpan 10 months ago - 2 comments - Labels: bug
#702 - numpy.ndarray' object is not callable in gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
Issue - State: open - Opened by freedom-168 10 months ago - Labels: bug
#692 - Mllama ignores input image when deployed in triton
Issue - State: open - Opened by mutkach 10 months ago - Labels: bug
#686 - Unable to build from source for tag `v0.16.0`.
Issue - State: open - Opened by jingzhaoou 10 months ago - Labels: bug
#685 - DeepSeek-R1-Distill-Qwen-32B FP16 model does not work with Triton server + tensorrtllm_backend (but works with just TensorRT-LLM)
Issue - State: open - Opened by kelkarn 10 months ago - Labels: bug
#682 - Beam search diversity lost with in-flight batching
Issue - State: open - Opened by Grace-YingHuang 10 months ago - Labels: bug
#679 - Assertion failed: sizeof(T) <= remaining_buffer_size
Issue - State: open - Opened by gawain000000 11 months ago - Labels: bug
#678 - Inference error with using draft target model
Issue - State: open - Opened by pimang62 11 months ago - Labels: bug
#672 - Whisper - Missing parameters for triton deployment using tensorrt_llm backend
Issue - State: open - Opened by eleapttn 11 months ago - Labels: bug
#667 - Inflight Batching not working
Issue - State: open - Opened by frosk1 11 months ago - Labels: bug
#662 - Invalid argument: unable to find backend library for backend 'tensorrtllm', try specifying runtime on the model configuration.
Issue - State: open - Opened by ChristophHandschuh 12 months ago - Labels: bug
#661 - triton server multi request dynamic_batching not work
Issue - State: open - Opened by kazyun 12 months ago - Labels: bug
#656 - Qwen2___5-0___5B-Instruct convert_checkpoint error
Issue - State: open - Opened by giftyang 12 months ago - Labels: bug
#651 - triton streaming is not working as expected
Issue - State: open - Opened by robosina about 1 year ago - Labels: bug
#646 - Stub process 'whisper_bls_0_0' is not healthy.
Issue - State: open - Opened by MrD005 about 1 year ago - Labels: bug
#642 - With same engine, trtllm backend is 40x slower than TensorRT-LLM/examples/run.py
Issue - State: closed - Opened by ShuaiShao93 about 1 year ago - 1 comment - Labels: bug
#640 - problem with streaming
Issue - State: closed - Opened by Alireza3242 about 1 year ago - 1 comment - Labels: bug
#639 - Support non-detached mode for python trtllm backend
Issue - State: open - Opened by ShuaiShao93 about 1 year ago - Labels: bug
#630 - the output of bls is unstable
Issue - State: open - Opened by dwq370 about 1 year ago - Labels: bug
#626 - Streaming Inference Failure
Issue - State: open - Opened by imilli about 1 year ago - Labels: bug
#625 - The GPU memory usage is too high.
Issue - State: open - Opened by imilli about 1 year ago - Labels: bug
#624 - Garbage response when input tokens is longer than 4096 on Llama-3.1-8B-Instruct
Issue - State: open - Opened by winstxnhdw about 1 year ago - Labels: bug
#623 - Failed install in nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
Issue - State: open - Opened by wwx007121 about 1 year ago - Labels: bug
#619 - Throw ZeroDivisionError when benchmark
Issue - State: closed - Opened by moyerlee about 1 year ago - Labels: bug
#618 - unable to load shared library: /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm_common.so: undefined symbol: _ZNK12tensorrt_llm8executor8Response11getErrorMsgB5cxx11Ev;
Issue - State: open - Opened by wwx007121 about 1 year ago - Labels: bug
#616 - fill_template.py and gpu_device_ids
Issue - State: open - Opened by Alireza3242 about 1 year ago - Labels: bug
#613 - An error that `Shape does not match true shape of 'data' field` occurs when using tensorrt_llm model alone in inflight_batcher_llm
Issue - State: closed - Opened by junstar92 about 1 year ago - 1 comment - Labels: bug
#610 - Is ReDrafter supported by the TensorRT-LLM backend?
Issue - State: open - Opened by vkc1vk about 1 year ago - 2 comments - Labels: bug
#609 - Dynamic batching not working
Issue - State: open - Opened by ShuaiShao93 about 1 year ago - Labels: bug
#603 - Bad quality in answers (repetition, non stop...) when using Llama3.1-8B-Instruct and Triton
Issue - State: open - Opened by alvaroalfaro612 about 1 year ago - Labels: bug
#601 - Qwen2-14B inference garbled
Issue - State: open - Opened by kazyun about 1 year ago - Labels: bug
#598 - generation logits dtype bug
Issue - State: open - Opened by binhtranmcs about 1 year ago - 3 comments - Labels: bug
#596 - request is blocked and non output when using tensor parallelism with multi gpus
Issue - State: open - Opened by dwq370 about 1 year ago - Labels: bug
#595 - Can't build GPT-J 6B
Issue - State: open - Opened by coppock about 1 year ago - Labels: bug
#593 - Is `no_repeat_ngram_size` generation option supported?
Issue - State: open - Opened by vnkc1 about 1 year ago - Labels: bug
#587 - Error malloc(): unaligned tcache chunk detected Always Occur after tensorrt server handling a certain amount requests
Issue - State: open - Opened by wangpeilin over 1 year ago - Labels: bug
#584 - multi lora infer error : [TensorRT-LLm Error][fpA_intB Runner] Failed to run cutlass fpA_intB gemm. Error: Error Internal
Issue - State: closed - Opened by PAOPAO6 over 1 year ago - Labels: bug
#583 - Trion server + lora multiple times the same input results are different
Issue - State: closed - Opened by PAOPAO6 over 1 year ago - Labels: bug
#582 - Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side
Issue - State: open - Opened by ajagetia2001 over 1 year ago - Labels: bug
#580 - The Docker container stops when using `python3 scripts/launch_triton_server.py --world_size 1 --model_repo=model_repo/` as the starting command in the Docker Compose YAML file.
Issue - State: open - Opened by Aquasar11 over 1 year ago - Labels: bug
#579 - unable to launch model with tensorrt_llm
Issue - State: closed - Opened by janpetrov over 1 year ago - 8 comments - Labels: bug
#577 - Unable to launch triton server with TP
Issue - State: open - Opened by dhruvmullick over 1 year ago - Labels: bug
#576 - Unable to launch triton server with TP
Pull Request - State: closed - Opened by dhruvmullick over 1 year ago - 1 comment - Labels: bug
#573 - Inference server stalling
Issue - State: open - Opened by siddhatiwari over 1 year ago - 5 comments - Labels: bug
#572 - Failed to launch triton server, the tensorrt_llm protobuf file failed to load
Issue - State: open - Opened by KuntaiDu over 1 year ago - 2 comments - Labels: bug
#569 - LLAMA3: Unable to launch with tp 2
Issue - State: open - Opened by mindhash over 1 year ago - Labels: bug
#568 - Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'use_custom_all_reduce' not found;
Issue - State: closed - Opened by seyunchoi over 1 year ago - 1 comment - Labels: bug
#566 - Build Qwen2-72B model to TensorRT engines failed
Issue - State: open - Opened by wangpeilin over 1 year ago - Labels: bug
#565 - Encountered an error when fetching new request: Prompt length (200) exceeds maximum input length (1)
Issue - State: open - Opened by jayakommuru over 1 year ago - Labels: bug
#564 - v0.11.0 release fails when TP>1
Issue - State: open - Opened by daulet over 1 year ago - Labels: bug
#563 - Triton crashes on boot
Issue - State: open - Opened by daulet over 1 year ago - Labels: bug
#562 - Unable to initialize shared memory key 'triton_python_backend_shm_region_2'
Pull Request - State: closed - Opened by zhangyu68 over 1 year ago - 1 comment - Labels: bug
#561 - Trying to compile the latest trtllm (under the v0.12 main branch) in triton 24.07-trtllm-python-py3 reports an error
Issue - State: open - Opened by gzy19990617 over 1 year ago - Labels: bug
#560 - How to calculate the number of loras that can be cached to host cache?
Issue - State: open - Opened by limertang over 1 year ago - Labels: bug
#559 - How to calculate the number of cached loras
Pull Request - State: open - Opened by limertang over 1 year ago - Labels: bug
#558 - inflight_batcher_llm example batching
Issue - State: open - Opened by PKaralupov over 1 year ago - Labels: bug
#557 - `min_length` parameter doesn't work
Issue - State: open - Opened by vnkc1 over 1 year ago - Labels: bug
#555 - Llama 3.1 Tool-Calling Support
Issue - State: open - Opened by LanceB57 over 1 year ago - Labels: bug
#554 - get_parameter(model_config, "max_attention_window_size", int) not support list
Issue - State: open - Opened by Alireza3242 over 1 year ago - Labels: bug
#553 - Server stuck after `Starting Python backend stub`
Issue - State: open - Opened by DZADSL72-00558 over 1 year ago - Labels: bug
#550 - bugs in v0.10.0 version with tensorrtllm_backend
Issue - State: closed - Opened by x-transformers over 1 year ago - 2 comments - Labels: bug
#545 - unable to load shared library: libnvinfer_plugin_tensorrt_llm.so.9 using nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
Issue - State: open - Opened by jlewi over 1 year ago - Labels: bug
#542 - Unable to build tensorrt_llm backend; problems with CXX11 ABI
Issue - State: closed - Opened by jlewi over 1 year ago - 3 comments - Labels: bug
#540 - Following enc-dec workflow with V100 fails, unable to load shared library libtriton_tensorrtllm_common.so
Issue - State: open - Opened by owenonline over 1 year ago - Labels: bug
#539 - Model inference error (unexpected shape) when sending async requests for in-flight batching
Issue - State: closed - Opened by ngockhanh5110 over 1 year ago - Labels: bug
#532 - Achieving Benchmark Performance on Triton Inference Server
Issue - State: open - Opened by LanceB57 over 1 year ago - Labels: bug
#531 - Deserializing Engine Version Mismatch
Issue - State: closed - Opened by LanceB57 over 1 year ago - 1 comment - Labels: bug
#529 - Assertion failed: Cannot determine if hopper is specialised without a selected config at runtime (Latest commit)
Issue - State: closed - Opened by christian-ci over 1 year ago - Labels: bug
#526 - Invalid argument: unable to find backend library for backend '${triton_backend}'
Issue - State: open - Opened by chenchunhui97 over 1 year ago - Labels: bug
#525 - Issue Mixtral 8x7b failed to load preprocessing model.
Issue - State: closed - Opened by christian-ci over 1 year ago - 1 comment - Labels: bug
#524 - launch multi-gpu triton server and got an Error
Issue - State: open - Opened by dwq370 over 1 year ago - Labels: bug
#521 - Error: terminate called after throwing an instance of 'boost::interprocess::lock_exception'
Issue - State: open - Opened by Pedrochem over 1 year ago - Labels: bug
#520 - Ensemble and tensorrt_llm_bls have different results when using accumulate_tokens
Issue - State: open - Opened by activezhao over 1 year ago - Labels: bug
#514 - Qwen 7B giving error internal: ValueError: Input None is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers
Issue - State: open - Opened by ankitqx3 over 1 year ago - 1 comment - Labels: bug
#513 - Accumulation of tokens while beam_width > 1
Issue - State: open - Opened by wxsms over 1 year ago - Labels: bug
#511 - Exception when disabling "inflight_fused_batching"
Issue - State: open - Opened by TheCodeWrangler over 1 year ago - Labels: bug
#510 - How to solve the problem of errors when loading qwen1.5-7B (using two GPUs) and llama3-8B (using two GPUs) models simultaneously using tritonserver?
Issue - State: open - Opened by ChengShuting over 1 year ago - Labels: bug
#509 - 3rd Tritonserver fails to respond
Pull Request - State: open - Opened by njaramish over 1 year ago - Labels: bug
#508 - Assertion failed: Invalid tensor name: decoder_input_lengths
Issue - State: open - Opened by HowardChenRV over 1 year ago - Labels: bug
#506 - Key 'lora_config' not found
Issue - State: open - Opened by LanceB57 over 1 year ago - Labels: bug
#505 - how to set `ignore_eos` when benchmark TensorRT LLM
Issue - State: closed - Opened by zhyncs over 1 year ago - 2 comments - Labels: bug
#503 - No Text Output
Issue - State: open - Opened by Adevils over 1 year ago - Labels: bug
#502 - "error":"Unable to parse 'data': Shape does not match true shape of 'data' field"
Issue - State: open - Opened by ljm565 over 1 year ago - Labels: bug
#500 - UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
Issue - State: closed - Opened by Naphat-Khoprasertthaworn over 1 year ago - 1 comment - Labels: bug
#493 - Deepseek model streaming mode with Chinese character �?
Issue - State: open - Opened by activezhao over 1 year ago - Labels: bug
#488 - Error in streaming mode noting that execute function should return None
Issue - State: closed - Opened by kisseternity over 1 year ago - 2 comments - Labels: bug, triaged, need more info
#487 - Got repeated answer while deploying LLaMA3-Instruct-8B model in triton server
Issue - State: closed - Opened by AndyZZt over 1 year ago - 2 comments - Labels: bug
#486 - [Bug] Output generation does not stop at stop token </s>
Issue - State: closed - Opened by Hao-YunDeng over 1 year ago - 2 comments - Labels: bug