GitHub / triton-inference-server/tensorrtllm_backend: issues and pull requests labelled "bug"
#791 - LoRa weights not applied without warnings/errors when mismatch in type
Issue - State: open - Opened by rahchuenmonroe 3 months ago - Labels: bug
#750 - lora_config shape mismatch when using converted LoRA at runtime
Issue - State: open - Opened by paulhendricks 7 months ago - Labels: bug
#743 - whisper tensorrt-llm backend drop the accuracy for small.en model
Issue - State: closed - Opened by hualin-wu-2000 7 months ago - Labels: bug
#739 - no /app folder in container nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
Issue - State: open - Opened by smennes 7 months ago - Labels: bug
#738 - docker export than import the image,run error
Issue - State: closed - Opened by XiaoBin1992 7 months ago - Labels: bug
#727 - tensorrtllm_backend doesn't work with remote repos
Issue - State: open - Opened by ShuaiShao93 8 months ago - Labels: bug
#721 - /app is not present in nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3 image
Issue - State: closed - Opened by adityarajsahu 9 months ago - Labels: bug
#720 - Fail to test DraftTarget model with triton server tensorrtllm backend
Issue - State: open - Opened by gloritygithub11 9 months ago - Labels: bug
#712 - run llava1.5-7b with triton get ERROR: "ERROR: Failed to create instance: Stub process 'multimodal_encoders_0_0' is not healthy."
Issue - State: open - Opened by HPUedCSLearner 9 months ago - Labels: bug
#711 - Missing lookAheadRuntimeConfig in Triton Server with TensorRT-LLM backend HTTP Request
Issue - State: open - Opened by shaylapid 9 months ago - Labels: bug
#710 - Tritonserver Fails to Start with TensorRT-LLM Backend with lookahead_decoding mode - Assertion Failure in lookaheadDecodingLayer.cpp
Issue - State: open - Opened by shaylapid 9 months ago - 1 comment - Labels: bug
#707 - Failed to build TensorRT-LLM whisper Decoder
Issue - State: open - Opened by muhammad-faizan-122 10 months ago - Labels: bug
#705 - Inconsistent Batch Index Order in Decoupled Mode with trt-llm and triton trtllm backend
Issue - State: closed - Opened by Oldpan 10 months ago - 2 comments - Labels: bug
#702 - numpy.ndarray' object is not callable in gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
Issue - State: open - Opened by freedom-168 10 months ago - Labels: bug
#692 - Mllama ignores input image when deployed in triton
Issue - State: open - Opened by mutkach 10 months ago - Labels: bug
#686 - Unable to build from source for tag `v0.16.0`.
Issue - State: open - Opened by jingzhaoou 10 months ago - Labels: bug
#685 - DeepSeek-R1-Distill-Qwen-32B FP16 model does not work with Triton server + tensorrtllm_backend (but works with just TensorRT-LLM)
Issue - State: open - Opened by kelkarn 10 months ago - Labels: bug
#682 - Beam search diversity lost with in-flight batching
Issue - State: open - Opened by Grace-YingHuang 10 months ago - Labels: bug
#679 - Assertion failed: sizeof(T) <= remaining_buffer_size
Issue - State: open - Opened by gawain000000 11 months ago - Labels: bug
#678 - Inference error with using draft target model
Issue - State: open - Opened by pimang62 11 months ago - Labels: bug
#672 - Whisper - Missing parameters for triton deployment using tensorrt_llm backend
Issue - State: open - Opened by eleapttn 11 months ago - Labels: bug
#667 - Inflight Batching not working
Issue - State: open - Opened by frosk1 11 months ago - Labels: bug
#662 - Invalid argument: unable to find backend library for backend 'tensorrtllm', try specifying runtime on the model configuration.
Issue - State: open - Opened by ChristophHandschuh 12 months ago - Labels: bug
#661 - triton server multi request dynamic_batching not work
Issue - State: open - Opened by kazyun 12 months ago - Labels: bug
#656 - Qwen2___5-0___5B-Instruct convert_checkpoint error
Issue - State: open - Opened by giftyang 12 months ago - Labels: bug
#651 - triton streaming is not working as expected
Issue - State: open - Opened by robosina about 1 year ago - Labels: bug
#646 - Stub process 'whisper_bls_0_0' is not healthy.
Issue - State: open - Opened by MrD005 about 1 year ago - Labels: bug
#642 - With same engine, trtllm backend is 40x slower than TensorRT-LLM/examples/run.py
Issue - State: closed - Opened by ShuaiShao93 about 1 year ago - 1 comment - Labels: bug
#640 - problem with streaming
Issue - State: closed - Opened by Alireza3242 about 1 year ago - 1 comment - Labels: bug
#639 - Support non-detached mode for python trtllm backend
Issue - State: open - Opened by ShuaiShao93 about 1 year ago - Labels: bug
#630 - the output of bls is unstable
Issue - State: open - Opened by dwq370 about 1 year ago - Labels: bug
#626 - Streaming Inference Failure
Issue - State: open - Opened by imilli about 1 year ago - Labels: bug
#625 - The GPU memory usage is too high.
Issue - State: open - Opened by imilli about 1 year ago - Labels: bug
#624 - Garbage response when input tokens is longer than 4096 on Llama-3.1-8B-Instruct
Issue - State: open - Opened by winstxnhdw about 1 year ago - Labels: bug
#623 - Failed install in nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
Issue - State: open - Opened by wwx007121 about 1 year ago - Labels: bug
#619 - Throw ZeroDivisionError when benchmark
Issue - State: closed - Opened by moyerlee about 1 year ago - Labels: bug
#618 - unable to load shared library: /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm_common.so: undefined symbol: _ZNK12tensorrt_llm8executor8Response11getErrorMsgB5cxx11Ev;
Issue - State: open - Opened by wwx007121 about 1 year ago - Labels: bug
#616 - fill_template.py and gpu_device_ids
Issue - State: open - Opened by Alireza3242 about 1 year ago - Labels: bug
#613 - An error that `Shape does not match true shape of 'data' field` occurs when using tensorrt_llm model alone in inflight_batcher_llm
Issue - State: closed - Opened by junstar92 about 1 year ago - 1 comment - Labels: bug
#610 - Is ReDrafter supported by the TensorRT-LLM backend?
Issue - State: open - Opened by vkc1vk about 1 year ago - 2 comments - Labels: bug
#609 - Dynamic batching not working
Issue - State: open - Opened by ShuaiShao93 about 1 year ago - Labels: bug
#603 - Bad quality in answers (repetition, non stop...) when using Llama3.1-8B-Instruct and Triton
Issue - State: open - Opened by alvaroalfaro612 about 1 year ago - Labels: bug
#601 - Qwen2-14B inference garbled
Issue - State: open - Opened by kazyun about 1 year ago - Labels: bug
#598 - generation logits dtype bug
Issue - State: open - Opened by binhtranmcs about 1 year ago - 3 comments - Labels: bug
#596 - request is blocked and non output when using tensor parallelism with multi gpus
Issue - State: open - Opened by dwq370 about 1 year ago - Labels: bug
#595 - Can't build GPT-J 6B
Issue - State: open - Opened by coppock about 1 year ago - Labels: bug
#593 - Is `no_repeat_ngram_size` generation option supported?
Issue - State: open - Opened by vnkc1 about 1 year ago - Labels: bug
#587 - Error malloc(): unaligned tcache chunk detected Always Occur after tensorrt server handling a certain amount requests
Issue - State: open - Opened by wangpeilin over 1 year ago - Labels: bug
#584 - multi lora infer error : [TensorRT-LLm Error][fpA_intB Runner] Failed to run cutlass fpA_intB gemm. Error: Error Internal
Issue - State: closed - Opened by PAOPAO6 over 1 year ago - Labels: bug
#583 - Trion server + lora multiple times the same input results are different
Issue - State: closed - Opened by PAOPAO6 over 1 year ago - Labels: bug
#582 - Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side
Issue - State: open - Opened by ajagetia2001 over 1 year ago - Labels: bug
#580 - The Docker container stops when using `python3 scripts/launch_triton_server.py --world_size 1 --model_repo=model_repo/` as the starting command in the Docker Compose YAML file.
Issue - State: open - Opened by Aquasar11 over 1 year ago - Labels: bug
#579 - unable to launch model with tensorrt_llm
Issue - State: closed - Opened by janpetrov over 1 year ago - 8 comments - Labels: bug
#577 - Unable to launch triton server with TP
Issue - State: open - Opened by dhruvmullick over 1 year ago - Labels: bug
#576 - Unable to launch triton server with TP
Pull Request - State: closed - Opened by dhruvmullick over 1 year ago - 1 comment - Labels: bug
#573 - Inference server stalling
Issue - State: open - Opened by siddhatiwari over 1 year ago - 5 comments - Labels: bug
#572 - Failed to launch triton server, the tensorrt_llm protobuf file failed to load
Issue - State: open - Opened by KuntaiDu over 1 year ago - 2 comments - Labels: bug
#569 - LLAMA3: Unable to launch with tp 2
Issue - State: open - Opened by mindhash over 1 year ago - Labels: bug
#568 - Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'use_custom_all_reduce' not found;
Issue - State: closed - Opened by seyunchoi over 1 year ago - 1 comment - Labels: bug
#566 - Build Qwen2-72B model to TensorRT engines failed
Issue - State: open - Opened by wangpeilin over 1 year ago - Labels: bug
#565 - Encountered an error when fetching new request: Prompt length (200) exceeds maximum input length (1)
Issue - State: open - Opened by jayakommuru over 1 year ago - Labels: bug
#564 - v0.11.0 release fails when TP>1
Issue - State: open - Opened by daulet over 1 year ago - Labels: bug
#563 - Triton crashes on boot
Issue - State: open - Opened by daulet over 1 year ago - Labels: bug
#562 - Unable to initialize shared memory key 'triton_python_backend_shm_region_2'
Pull Request - State: closed - Opened by zhangyu68 over 1 year ago - 1 comment - Labels: bug
#561 - Trying to compile the latest trtllm (under the v0.12 main branch) in triton 24.07-trtllm-python-py3 reports an error
Issue - State: open - Opened by gzy19990617 over 1 year ago - Labels: bug
#560 - How to calculate the number of loras that can be cached to host cache?
Issue - State: open - Opened by limertang over 1 year ago - Labels: bug
#559 - How to calculate the number of cached loras
Pull Request - State: open - Opened by limertang over 1 year ago - Labels: bug
#558 - inflight_batcher_llm example batching
Issue - State: open - Opened by PKaralupov over 1 year ago - Labels: bug
#557 - `min_length` parameter doesn't work
Issue - State: open - Opened by vnkc1 over 1 year ago - Labels: bug
#555 - Llama 3.1 Tool-Calling Support
Issue - State: open - Opened by LanceB57 over 1 year ago - Labels: bug
#554 - get_parameter(model_config, "max_attention_window_size", int) not support list
Issue - State: open - Opened by Alireza3242 over 1 year ago - Labels: bug
#553 - Server stuck after `Starting Python backend stub`
Issue - State: open - Opened by DZADSL72-00558 over 1 year ago - Labels: bug
#550 - bugs in v0.10.0 version with tensorrtllm_backend
Issue - State: closed - Opened by x-transformers over 1 year ago - 2 comments - Labels: bug
#545 - unable to load shared library: libnvinfer_plugin_tensorrt_llm.so.9 using nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
Issue - State: open - Opened by jlewi over 1 year ago - Labels: bug
#542 - Unable to build tensorrt_llm backend; problems with CXX11 ABI
Issue - State: closed - Opened by jlewi over 1 year ago - 3 comments - Labels: bug
#540 - Following enc-dec workflow with V100 fails, unable to load shared library libtriton_tensorrtllm_common.so
Issue - State: open - Opened by owenonline over 1 year ago - Labels: bug
#539 - Model inference error (unexpected shape) when sending async requests for in-flight batching
Issue - State: closed - Opened by ngockhanh5110 over 1 year ago - Labels: bug
#532 - Achieving Benchmark Performance on Triton Inference Server
Issue - State: open - Opened by LanceB57 over 1 year ago - Labels: bug
#531 - Deserializing Engine Version Mismatch
Issue - State: closed - Opened by LanceB57 over 1 year ago - 1 comment - Labels: bug
#529 - Assertion failed: Cannot determine if hopper is specialised without a selected config at runtime (Latest commit)
Issue - State: closed - Opened by christian-ci over 1 year ago - Labels: bug
#526 - Invalid argument: unable to find backend library for backend '${triton_backend}'
Issue - State: open - Opened by chenchunhui97 over 1 year ago - Labels: bug
#525 - Issue Mixtral 8x7b failed to load preprocessing model.
Issue - State: closed - Opened by christian-ci over 1 year ago - 1 comment - Labels: bug
#524 - launch multi-gpu triton server and got an Error
Issue - State: open - Opened by dwq370 over 1 year ago - Labels: bug
#521 - Error: terminate called after throwing an instance of 'boost::interprocess::lock_exception'
Issue - State: open - Opened by Pedrochem over 1 year ago - Labels: bug
#520 - Ensemble and tensorrt_llm_bls have different results when using accumulate_tokens
Issue - State: open - Opened by activezhao over 1 year ago - Labels: bug
#514 - Qwen 7B giving error internal: ValueError: Input None is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers
Issue - State: open - Opened by ankitqx3 over 1 year ago - 1 comment - Labels: bug
#513 - Accumulation of tokens while beam_width > 1
Issue - State: open - Opened by wxsms over 1 year ago - Labels: bug
#511 - Exception when disabling "inflight_fused_batching"
Issue - State: open - Opened by TheCodeWrangler over 1 year ago - Labels: bug
#510 - How to solve the problem of errors when loading qwen1.5-7B (using two GPUs) and llama3-8B (using two GPUs) models simultaneously using tritonserver?
Issue - State: open - Opened by ChengShuting over 1 year ago - Labels: bug
#509 - 3rd Tritonserver fails to respond
Pull Request - State: open - Opened by njaramish over 1 year ago - Labels: bug
#508 - Assertion failed: Invalid tensor name: decoder_input_lengths
Issue - State: open - Opened by HowardChenRV over 1 year ago - Labels: bug
#506 - Key 'lora_config' not found
Issue - State: open - Opened by LanceB57 over 1 year ago - Labels: bug
#505 - how to set `ignore_eos` when benchmark TensorRT LLM
Issue - State: closed - Opened by zhyncs over 1 year ago - 2 comments - Labels: bug
#503 - No Text Output
Issue - State: open - Opened by Adevils over 1 year ago - Labels: bug
#502 - "error":"Unable to parse 'data': Shape does not match true shape of 'data' field"
Issue - State: open - Opened by ljm565 over 1 year ago - Labels: bug
#500 - UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
Issue - State: closed - Opened by Naphat-Khoprasertthaworn over 1 year ago - 1 comment - Labels: bug
#493 - Deepseek model streaming mode with Chinese character �?
Issue - State: open - Opened by activezhao over 1 year ago - Labels: bug
#488 - Error in streaming mode noting that execute function should return None
Issue - State: closed - Opened by kisseternity over 1 year ago - 2 comments - Labels: bug, triaged, need more info
#487 - Got repeated answer while deploying LLaMA3-Instruct-8B model in triton server
Issue - State: closed - Opened by AndyZZt over 1 year ago - 2 comments - Labels: bug
#486 - [Bug] Output generation does not stop at stop token </s>
Issue - State: closed - Opened by Hao-YunDeng over 1 year ago - 2 comments - Labels: bug