GitHub / triton-inference-server/tensorrtllm_backend issues and pull requests
Labelled with: triaged
#530 - [Question] Understanding Generation Logits & Context Logits
Issue - State: closed - Opened by here4dadata over 1 year ago - 2 comments
Labels: question, triaged
#488 - Error in streaming mode noting that the execute function should return None
Issue - State: closed - Opened by kisseternity over 1 year ago - 2 comments
Labels: bug, triaged, need more info
#482 - Fixed broken links in README.md
Pull Request - State: closed - Opened by buvnswrn over 1 year ago - 1 comment
Labels: triaged
#478 - [Docs] Fixed inference-request.md dead link
Pull Request - State: closed - Opened by DefTruth over 1 year ago - 1 comment
Labels: triaged
#470 - [Bugfix] Launch Triton server without waiting for a signal
Pull Request - State: closed - Opened by michaelnny over 1 year ago - 2 comments
Labels: triaged
#464 - [Bug] Zero temperature curl request affects non-zero temperature requests
Issue - State: closed - Opened by Hao-YunDeng over 1 year ago - 5 comments
Labels: bug, triaged
#449 - Fix link reference in README.md
Pull Request - State: closed - Opened by sunjiabin17 over 1 year ago - 1 comment
Labels: triaged
#447 - [MINOR] Fix typo in README
Pull Request - State: closed - Opened by kooyunmo over 1 year ago - 1 comment
Labels: triaged
#414 - Clarification on KV Cache Configuration Parameters
Issue - State: closed - Opened by un-certainty over 1 year ago - 1 comment
Labels: triaged
#401 - The tensorrtllm backend and onnxruntime backend
Issue - State: closed - Opened by tricky61 over 1 year ago - 2 comments
Labels: triaged
#398 - There is no option to set world_size in the config file in the model repository
Issue - State: closed - Opened by Saigut over 1 year ago - 2 comments
Labels: triaged
#379 - Deployment of TensorRT-LLM Model on Triton Server
Issue - State: closed - Opened by jasonngap1 over 1 year ago - 2 comments
Labels: triaged
#364 - [BUG] Missing `tokenizer_type` parameter in config.pbtxt
Issue - State: open - Opened by esnvidia almost 2 years ago - 2 comments
Labels: documentation, triaged
#347 - xverse-65b error
Issue - State: closed - Opened by lwbmowgli almost 2 years ago - 6 comments
Labels: triaged
#342 - [Question] Any plan to support Mixtral or other MoE models?
Issue - State: closed - Opened by kisseternity almost 2 years ago - 6 comments
Labels: triaged
#306 - Inflight Batching via Python Client
Issue - State: closed - Opened by hackassin almost 2 years ago - 5 comments
Labels: triaged
#287 - Model output keeps generating
Issue - State: closed - Opened by eladamittai almost 2 years ago - 4 comments
Labels: triaged
#284 - Question about the metrics: `tensorrt_llm` (is always 0)
Issue - State: closed - Opened by xihajun almost 2 years ago - 2 comments
Labels: triaged
#280 - [Question] How can I use the ensemble model to get output tokens one at a time before they're sent to the client?
Issue - State: closed - Opened by ZihanLiao almost 2 years ago - 4 comments
Labels: triaged
#268 - Decoupled mode test fails
Issue - State: closed - Opened by wangxshen almost 2 years ago - 3 comments
Labels: question, triaged
#262 - Error occurs during Docker build (Option 3: Build via Docker)
Issue - State: closed - Opened by KimMinSang96 almost 2 years ago - 1 comment
Labels: question, triaged
#258 - Triton server core dumps when deploying, while a local TRT-LLM run is OK
Issue - State: closed - Opened by zhaocc1106 almost 2 years ago - 6 comments
Labels: triaged
#254 - Always times out under concurrent requests
Issue - State: closed - Opened by ZihanLiao almost 2 years ago - 3 comments
Labels: triaged
#252 - About name: "stop" in config.pbtxt
Issue - State: closed - Opened by callmezhangchenchenokay almost 2 years ago - 2 comments
Labels: question, triaged
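For context on #252: in the repo's inflight_batcher_llm example, "stop" is declared as an optional boolean input on the tensorrt_llm model, and sending a follow-up request with the same request id and stop set to true cancels the in-flight request. A minimal sketch of that declaration, assuming a recent version of the example config (exact fields may differ across releases):

    input [
      # ...other request inputs omitted...
      {
        name: "stop"
        data_type: TYPE_BOOL
        dims: [ 1 ]
        optional: true
      }
    ]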
#247 - Why must request_id be of type int64 instead of string?
Issue - State: closed - Opened by wjj19950828 almost 2 years ago
Labels: triaged
#246 - Assertion failed: input_ids: expected 2 dims, provided 1 dims
Issue - State: closed - Opened by ccf-yang almost 2 years ago - 34 comments
Labels: triaged
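For context on #246: this assertion usually means input_ids was sent as a flat 1-D array, while the example tensorrt_llm config declares it as 2-D, [batch_size, sequence_length]. A minimal Python sketch using tritonclient; the tensor names (input_ids, input_lengths, request_output_len, output_ids) are taken from the repo's example config.pbtxt and assume a non-decoupled model, so check them against your version:

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # input_ids must be rank 2: [batch_size, sequence_length], not a flat list.
    input_ids = np.array([[1, 15043, 3186]], dtype=np.int32)
    input_lengths = np.array([[input_ids.shape[1]]], dtype=np.int32)
    request_output_len = np.array([[64]], dtype=np.int32)

    inputs = []
    for name, arr in (("input_ids", input_ids),
                      ("input_lengths", input_lengths),
                      ("request_output_len", request_output_len)):
        tensor = httpclient.InferInput(name, list(arr.shape), "INT32")
        tensor.set_data_from_numpy(arr)
        inputs.append(tensor)

    result = client.infer("tensorrt_llm", inputs)
    print(result.as_numpy("output_ids"))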
#242 - Can I use the Triton server tensorrtllm backend to host other TensorRT-built models? If not, what do you suggest when our model stack is a mix of LLM and non-LLM models?
Issue - State: open - Opened by zmy1116 almost 2 years ago - 9 comments
Labels: triaged
#238 - tensorrt_llm model metrics are all `0` in ensemble mode
Issue - State: closed - Opened by hxer7963 almost 2 years ago - 2 comments
Labels: triaged
#237 - random_seed doesn't work
Issue - State: closed - Opened by moseshu almost 2 years ago - 5 comments
Labels: triaged
#230 - About inflight_batcher_llm: in this mode, shouldn't requests be processed as soon as they come in?
Issue - State: closed - Opened by callmezhangchenchenokay almost 2 years ago - 11 comments
Labels: triaged
#221 - Where do I set my temperature?
Issue - State: closed - Opened by shatealaboxiaowang almost 2 years ago - 14 comments
Labels: triaged
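For context on #221: temperature is a per-request sampling parameter rather than a config.pbtxt setting. A minimal sketch against Triton's HTTP generate endpoint, assuming the ensemble model and default port from the repo's examples:

    import requests

    # The sampling parameters ride along with the prompt in the request body.
    resp = requests.post(
        "http://localhost:8000/v2/models/ensemble/generate",
        json={"text_input": "What is machine learning?",
              "max_tokens": 64,
              "temperature": 0.7},
    )
    print(resp.json())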
#218 - Does this backend support LoRA models?
Issue - State: closed - Opened by runtianchen almost 2 years ago - 5 comments
Labels: triaged
#216 - NGC 23.11-trtllm-python-py3 container does not have tensorrt_llm backend models or the Python module installed?
Issue - State: closed - Opened by nikhilshandilya almost 2 years ago - 3 comments
Labels: question, triaged
#211 - NameError: name 'use_gemm_woq_plugin' is not defined
Issue - State: closed - Opened by wjueyao almost 2 years ago - 7 comments
Labels: bug, triaged
#206 - mpirun noticed that process rank 0 with PID 0 on node ubuntu exited on signal 11 (Segmentation fault).
Issue - State: open - Opened by zhaoxjmail almost 2 years ago - 10 comments
Labels: triaged
#195 - CUDA OOM when running llama2-70b on tp8
Issue - State: closed - Opened by flexwang almost 2 years ago - 6 comments
Labels: triaged
#189 - Concurrent requests are very slow
Issue - State: closed - Opened by flexwang about 2 years ago - 11 comments
Labels: triaged
#181 - gpt_model_path with Triton's S3 based model repository support
Issue - State: open - Opened by sacdroid about 2 years ago - 3 comments
Labels: triaged
#168 - ERROR: Failed to create instance: unexpected error when creating modelInstanceState: maxTokensInPagedKvCache must be large enough to process at least 1 sequence to completion (i.e. must be larger than beam_width * tokensPerBlock * maxBlocksPerSeq)
Issue - State: closed - Opened by NarenZen about 2 years ago - 15 comments
Labels: triaged
#165 - How to add no_repeat_ngram_size in inflight_batcher_llm?
Issue - State: closed - Opened by matichon-vultureprime about 2 years ago - 1 comment
Labels: triaged
#111 - Feature request: add Baichuan support
Issue - State: closed - Opened by chrjxj about 2 years ago - 5 comments
Labels: triaged
#105 - Feature request: support multiple model instances on the TensorRT-LLM Triton backend
Issue - State: closed - Opened by wengsnow about 2 years ago - 16 comments
Labels: triaged, feature request
#100 - Triton server gives no response when end_id is set in the request
Issue - State: open - Opened by CaesarWWK about 2 years ago - 3 comments
Labels: triaged
#98 - tritonserver returns an error result for CodeLlama
Issue - State: open - Opened by Lzhang-hub about 2 years ago - 7 comments
Labels: triaged
#93 - The repetition_penalty sampling parameter in the in-flight Triton server seems to have no effect
Issue - State: closed - Opened by StarrickLiu about 2 years ago - 1 comment
Labels: triaged
#90 - I want to accelerate the BERT model using TensorRT-LLM; why am I encountering an error?
Issue - State: open - Opened by loredunk about 2 years ago - 3 comments
Labels: triaged
#89 - Feature request: Output only generated text
Issue - State: open - Opened by jiangshining about 2 years ago - 3 comments
Labels: triaged, feature request
#88 - Segmentation fault in tritonserver streaming inference with TensorRT Baichuan model
Issue - State: open - Opened by yingjie1011 about 2 years ago - 10 comments
Labels: triaged
#87 - Feature request: Flag to indicate end of stream
Issue - State: open - Opened by yunfeng-scale about 2 years ago - 2 comments
Labels: triaged, feature request
#85 - Can Triton server obtain the inference status of a specific request_id during the inference process?
Issue - State: closed - Opened by isky-cd about 2 years ago - 1 comment
Labels: triaged
#84 - Stop words
Issue - State: open - Opened by UncleFB about 2 years ago - 3 comments
Labels: triaged
#76 - CUDA runtime error during stress testing
Issue - State: open - Opened by jiangshining about 2 years ago - 11 comments
Labels: triaged
#75 - Is multi-node supported in Triton Inference Server?
Issue - State: open - Opened by amazingkmy about 2 years ago - 4 comments
Labels: triaged
#74 - How to get more than one inference result with one request?
Issue - State: open - Opened by activezhao about 2 years ago - 11 comments
Labels: triaged
#73 - How to find an introduction to the parameters in config.pbtxt and the relationships between them?
Issue - State: open - Opened by activezhao about 2 years ago - 2 comments
Labels: documentation, triaged
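For context on #73 and the question that follows: backend settings in this repo are passed through parameters blocks in the tensorrt_llm model's config.pbtxt. A representative excerpt with keys taken from the example config; the values here are placeholders:

    parameters: {
      key: "gpt_model_type"
      value: { string_value: "inflight_fused_batching" }
    }
    parameters: {
      key: "gpt_model_path"
      value: { string_value: "/path/to/engines" }
    }
    parameters: {
      key: "max_tokens_in_paged_kv_cache"
      value: { string_value: "4096" }
    }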
#67 - [Question] What does the service parameter max_tokens_in_paged_kv_cache mean?
Issue - State: open - Opened by wjj19950828 about 2 years ago - 3 comments
Labels: triaged
#62 - Baichuan2-13B execution error
Issue - State: closed - Opened by zhanglv0209 about 2 years ago - 14 comments
Labels: triaged
#61 - Failed to load LLaMA in Triton server: parse error
Issue - State: closed - Opened by GooVincent about 2 years ago - 6 comments
Labels: triaged
#58 - Feature request: Expose logprob of output tokens
Issue - State: closed - Opened by yunfeng-scale about 2 years ago - 3 comments
Labels: triaged, feature request
#48 - Unloading a model using the REST API does not release GPU memory
Issue - State: closed - Opened by kamalkraj about 2 years ago - 2 comments
Labels: triaged
#47 - stop_words does not work with CodeLlama?
Issue - State: closed - Opened by activezhao about 2 years ago - 38 comments
Labels: triaged
#46 - Failed, NCCL error 'internal error - please report this issue to the NCCL developers'
Issue - State: closed - Opened by callmezhangchenchenokay about 2 years ago - 9 comments
Labels: triaged
#44 - Error when building the TensorRT-LLM backend with Option 3
Issue - State: closed - Opened by dongluw about 2 years ago - 3 comments
Labels: triaged
#40 - How to load LLaMA? I tried, but I got an error
Issue - State: closed - Opened by UncleFB about 2 years ago - 10 comments
Labels: triaged
#39 - Problems when deploying Triton Server according to the README
Issue - State: closed - Opened by isky-cd about 2 years ago - 29 comments
Labels: triaged
#37 - Llama2 request receives 400
Issue - State: closed - Opened by sunhailin-Leo about 2 years ago - 9 comments
Labels: triaged
#32 - [0.5.0][Bug] Release build failure
Issue - State: closed - Opened by lanking520 about 2 years ago - 3 comments
Labels: bug, triaged
#30 - Dockerfile failed at build_wheel.py: "file STRINGS file "/include/NvInferVersion.h" cannot be read."
Issue - State: closed - Opened by hayleyhu about 2 years ago - 2 comments
Labels: triaged