GitHub / triton-inference-server/tensorrtllm_backend issues and pull requests
Labelled with: triaged
#530 - [Question] Understanding Generation Logits & Context Logits
Issue - State: closed - Opened by here4dadata over 1 year ago - 2 comments
Labels: question, triaged
#488 - Error in streaming mode noting that the execute function should return None
Issue - State: closed - Opened by kisseternity over 1 year ago - 2 comments
Labels: bug, triaged, need more info
#482 - Fixed broken links in README.md
Pull Request - State: closed - Opened by buvnswrn over 1 year ago - 1 comment
Labels: triaged
#478 - [Docs] Fixed inference-request.md dead link
Pull Request - State: closed - Opened by DefTruth over 1 year ago - 1 comment
Labels: triaged
#470 - [Bugfix] Launch Triton server without waiting for a signal
Pull Request - State: closed - Opened by michaelnny over 1 year ago - 2 comments
Labels: triaged
#464 - [Bug] Zero temperature curl request affects non-zero temperature requests
Issue - State: closed - Opened by Hao-YunDeng over 1 year ago - 5 comments
Labels: bug, triaged
#449 - Fix link reference in README.md
Pull Request - State: closed - Opened by sunjiabin17 over 1 year ago - 1 comment
Labels: triaged
#447 - [MINOR] Fix typo in README
Pull Request - State: closed - Opened by kooyunmo over 1 year ago - 1 comment
Labels: triaged
#414 - Clarification on KV Cache Configuration Parameters
Issue - State: closed - Opened by un-certainty over 1 year ago - 1 comment
Labels: triaged
#401 - The tensorrtllm backend and onnxruntime backend
Issue - State: closed - Opened by tricky61 over 1 year ago - 2 comments
Labels: triaged
#398 - There is no option to set world_size in the config file in the model repository
Issue - State: closed - Opened by Saigut over 1 year ago - 2 comments
Labels: triaged
#379 - Deployment of TensorRT-LLM Model on Triton Server
Issue - State: closed - Opened by jasonngap1 over 1 year ago - 2 comments
Labels: triaged
#364 - [BUG] Missing `tokenizer_type` parameter in config.pbtxt
Issue - State: open - Opened by esnvidia almost 2 years ago - 2 comments
Labels: documentation, triaged
#347 - xverse-65b error
Issue - State: closed - Opened by lwbmowgli almost 2 years ago - 6 comments
Labels: triaged
#342 - [Question] Any plan to support Mixtral or other MoE models?
Issue - State: closed - Opened by kisseternity almost 2 years ago - 6 comments
Labels: triaged
#306 - Inflight Batching via Python Client
Issue - State: closed - Opened by hackassin almost 2 years ago - 5 comments
Labels: triaged
#287 - Model output keeps generating
Issue - State: closed - Opened by eladamittai almost 2 years ago - 4 comments
Labels: triaged
#284 - Question about the metrics: `tensorrt_llm` (is always 0)
Issue - State: closed - Opened by xihajun almost 2 years ago - 2 comments
Labels: triaged
#280 - [Question] How can I use the ensemble model to get output tokens one at a time before they're sent to the client?
Issue - State: closed - Opened by ZihanLiao almost 2 years ago - 4 comments
Labels: triaged
#268 - Decoupled mode test fails
Issue - State: closed - Opened by wangxshen almost 2 years ago - 3 comments
Labels: question, triaged
#262 - Error occurs during Docker build (Option 3: Build via Docker)
Issue - State: closed - Opened by KimMinSang96 almost 2 years ago - 1 comment
Labels: question, triaged
#258 - Triton server core dumps when deploying, while a local TRT-LLM run is OK
Issue - State: closed - Opened by zhaocc1106 almost 2 years ago - 6 comments
Labels: triaged
#254 - Always times out under concurrent requests
Issue - State: closed - Opened by ZihanLiao almost 2 years ago - 3 comments
Labels: triaged
#252 - About name: "stop" in config.pbtxt
Issue - State: closed - Opened by callmezhangchenchenokay almost 2 years ago - 2 comments
Labels: question, triaged
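For context on #252: in the repo's inflight_batcher_llm example, "stop" is declared as an optional boolean input on the tensorrt_llm model, and sending a follow-up request with the same request id and stop set to true cancels the in-flight request. A minimal sketch of that declaration, assuming a recent version of the example config (exact fields may differ across releases):

    input [
      # ...other request inputs omitted...
      {
        name: "stop"
        data_type: TYPE_BOOL
        dims: [ 1 ]
        optional: true
      }
    ]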
#247 - Why must request_id be of type int64 instead of string?
Issue - State: closed - Opened by wjj19950828 almost 2 years ago
Labels: triaged
#246 - Assertion failed: input_ids: expected 2 dims, provided 1 dims
Issue - State: closed - Opened by ccf-yang almost 2 years ago - 34 comments
Labels: triaged
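For context on #246: this assertion usually means input_ids was sent as a flat 1-D array, while the example tensorrt_llm config declares it as 2-D, [batch_size, sequence_length]. A minimal Python sketch using tritonclient; the tensor names (input_ids, input_lengths, request_output_len, output_ids) are taken from the repo's example config.pbtxt and assume a non-decoupled model, so check them against your version:

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # input_ids must be rank 2: [batch_size, sequence_length], not a flat list.
    input_ids = np.array([[1, 15043, 3186]], dtype=np.int32)
    input_lengths = np.array([[input_ids.shape[1]]], dtype=np.int32)
    request_output_len = np.array([[64]], dtype=np.int32)

    inputs = []
    for name, arr in (("input_ids", input_ids),
                      ("input_lengths", input_lengths),
                      ("request_output_len", request_output_len)):
        tensor = httpclient.InferInput(name, list(arr.shape), "INT32")
        tensor.set_data_from_numpy(arr)
        inputs.append(tensor)

    result = client.infer("tensorrt_llm", inputs)
    print(result.as_numpy("output_ids"))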
#242 - Can I use the Triton server tensorrtllm backend to host other TensorRT-built models? If not, what do you suggest when our model stack is a mix of LLM and non-LLM models?
Issue - State: open - Opened by zmy1116 almost 2 years ago - 9 comments
Labels: triaged
#238 - tensorrt_llm model metrics are all `0` in ensemble mode
Issue - State: closed - Opened by hxer7963 almost 2 years ago - 2 comments
Labels: triaged
#237 - random_seed doesn't work
Issue - State: closed - Opened by moseshu almost 2 years ago - 5 comments
Labels: triaged
#230 - About inflight_batcher_llm: in this mode, shouldn't requests be processed as soon as they come in?
Issue - State: closed - Opened by callmezhangchenchenokay almost 2 years ago - 11 comments
Labels: triaged
#221 - Where do I set my temperature?
Issue - State: closed - Opened by shatealaboxiaowang almost 2 years ago - 14 comments
Labels: triaged
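For context on #221: temperature is a per-request sampling parameter rather than a config.pbtxt setting. A minimal sketch against Triton's HTTP generate endpoint, assuming the ensemble model and default port from the repo's examples:

    import requests

    # The sampling parameters ride along with the prompt in the request body.
    resp = requests.post(
        "http://localhost:8000/v2/models/ensemble/generate",
        json={"text_input": "What is machine learning?",
              "max_tokens": 64,
              "temperature": 0.7},
    )
    print(resp.json())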
#218 - Does this backend support LoRA models?
Issue - State: closed - Opened by runtianchen almost 2 years ago - 5 comments
Labels: triaged
#216 - NGC 23.11-trtllm-python-py3 container does not have tensorrt_llm backend models or the Python module installed?
Issue - State: closed - Opened by nikhilshandilya almost 2 years ago - 3 comments
Labels: question, triaged
#211 - NameError: name 'use_gemm_woq_plugin' is not defined
Issue - State: closed - Opened by wjueyao almost 2 years ago - 7 comments
Labels: bug, triaged
#206 - mpirun noticed that process rank 0 with PID 0 on node ubuntu exited on signal 11 (Segmentation fault).
Issue - State: open - Opened by zhaoxjmail almost 2 years ago - 10 comments
Labels: triaged
#195 - CUDA OOM when running llama2-70b on tp8
Issue - State: closed - Opened by flexwang almost 2 years ago - 6 comments
Labels: triaged
#189 - Concurrent requests are very slow
Issue - State: closed - Opened by flexwang about 2 years ago - 11 comments
Labels: triaged
#181 - gpt_model_path with Triton's S3 based model repository support
Issue - State: open - Opened by sacdroid about 2 years ago - 3 comments
Labels: triaged
#168 - ERROR: Failed to create instance: unexpected error when creating modelInstanceState: maxTokensInPagedKvCache must be large enough to process at least 1 sequence to completion (i.e. must be larger than beam_width * tokensPerBlock * maxBlocksPerSeq)
Issue - State: closed - Opened by NarenZen about 2 years ago - 15 comments
Labels: triaged
#165 - How to add no_repeat_ngram_size in inflight_batcher_llm?
Issue - State: closed - Opened by matichon-vultureprime about 2 years ago - 1 comment
Labels: triaged
#111 - Feature request: add Baichuan support
Issue - State: closed - Opened by chrjxj about 2 years ago - 5 comments
Labels: triaged
#105 - Feature request: support multiple model instances on the TensorRT-LLM Triton backend
Issue - State: closed - Opened by wengsnow about 2 years ago - 16 comments
Labels: triaged, feature request
#100 - Triton server gives no response when end_id is set in the request
Issue - State: open - Opened by CaesarWWK about 2 years ago - 3 comments
Labels: triaged
#98 - tritonserver returns an error result for CodeLlama
Issue - State: open - Opened by Lzhang-hub about 2 years ago - 7 comments
Labels: triaged
#93 - The repetition_penalty sampling parameter in the in-flight Triton server seems to have no effect
Issue - State: closed - Opened by StarrickLiu about 2 years ago - 1 comment
Labels: triaged
#90 - I want to accelerate the BERT model using TensorRT-LLM; why am I encountering an error?
Issue - State: open - Opened by loredunk about 2 years ago - 3 comments
Labels: triaged
#89 - Feature request: Output only generated text
Issue - State: open - Opened by jiangshining about 2 years ago - 3 comments
Labels: triaged, feature request
#88 - Segmentation fault in tritonserver streaming inference with TensorRT Baichuan model
Issue - State: open - Opened by yingjie1011 about 2 years ago - 10 comments
Labels: triaged
#87 - Feature request: Flag to indicate end of stream
Issue - State: open - Opened by yunfeng-scale about 2 years ago - 2 comments
Labels: triaged, feature request
#85 - Can Triton server obtain the inference status of a specific request_id during the inference process?
Issue - State: closed - Opened by isky-cd about 2 years ago - 1 comment
Labels: triaged
#84 - Stop words
Issue - State: open - Opened by UncleFB about 2 years ago - 3 comments
Labels: triaged
#76 - CUDA runtime error during stress testing
Issue - State: open - Opened by jiangshining about 2 years ago - 11 comments
Labels: triaged
#75 - Is multi-node supported in Triton Inference Server?
Issue - State: open - Opened by amazingkmy about 2 years ago - 4 comments
Labels: triaged
#74 - How to get more than one inference result with one request?
Issue - State: open - Opened by activezhao about 2 years ago - 11 comments
Labels: triaged
#73 - How to find an introduction to the parameters in config.pbtxt and the relationships between them?
Issue - State: open - Opened by activezhao about 2 years ago - 2 comments
Labels: documentation, triaged
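For context on #73 and the question that follows: backend settings in this repo are passed through parameters blocks in the tensorrt_llm model's config.pbtxt. A representative excerpt with keys taken from the example config; the values here are placeholders:

    parameters: {
      key: "gpt_model_type"
      value: { string_value: "inflight_fused_batching" }
    }
    parameters: {
      key: "gpt_model_path"
      value: { string_value: "/path/to/engines" }
    }
    parameters: {
      key: "max_tokens_in_paged_kv_cache"
      value: { string_value: "4096" }
    }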
#67 - [Question] What does the service parameter max_tokens_in_paged_kv_cache mean?
Issue - State: open - Opened by wjj19950828 about 2 years ago - 3 comments
Labels: triaged
#62 - Baichuan2-13B execution error
Issue - State: closed - Opened by zhanglv0209 about 2 years ago - 14 comments
Labels: triaged
#61 - Failed to load LLaMA in Triton server: parse error
Issue - State: closed - Opened by GooVincent about 2 years ago - 6 comments
Labels: triaged
#58 - Feature request: Expose logprob of output tokens
Issue - State: closed - Opened by yunfeng-scale about 2 years ago - 3 comments
Labels: triaged, feature request
#48 - Unloading a model using the REST API does not release GPU memory
Issue - State: closed - Opened by kamalkraj about 2 years ago - 2 comments
Labels: triaged
#47 - stop_words does not work with CodeLlama?
Issue - State: closed - Opened by activezhao about 2 years ago - 38 comments
Labels: triaged
#46 - Failed, NCCL error 'internal error - please report this issue to the NCCL developers'
Issue - State: closed - Opened by callmezhangchenchenokay about 2 years ago - 9 comments
Labels: triaged
#44 - Error when building the TensorRT-LLM backend with Option 3
Issue - State: closed - Opened by dongluw about 2 years ago - 3 comments
Labels: triaged
#40 - How to load LLaMA? I tried, but I got an error
Issue - State: closed - Opened by UncleFB about 2 years ago - 10 comments
Labels: triaged
#39 - Problems when deploying Triton Server according to the README
Issue - State: closed - Opened by isky-cd about 2 years ago - 29 comments
Labels: triaged
#37 - Llama2 request receives 400
Issue - State: closed - Opened by sunhailin-Leo about 2 years ago - 9 comments
Labels: triaged
#32 - [0.5.0][Bug] Release build failure
Issue - State: closed - Opened by lanking520 about 2 years ago - 3 comments
Labels: bug, triaged
#30 - Dockerfile failed at build_wheel.py: "file STRINGS file "/include/NvInferVersion.h" cannot be read."
Issue - State: closed - Opened by hayleyhu about 2 years ago - 2 comments
Labels: triaged