Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/TensorRT-LLM issues and pull requests

#2331 - MoE TP vs. EP

Issue - State: closed - Opened by Mutinifni about 1 month ago - 8 comments
Labels: question, triaged, not a bug

#2330 - Why AWQ 4bit takes more ram than we expected?

Issue - State: open - Opened by Alireza3242 about 1 month ago - 4 comments
Labels: question, triaged, quantization, not a bug

#2329 - Why speed does not increase with AWQ

Issue - State: open - Opened by Alireza3242 about 1 month ago
Labels: question, triaged

#2328 - GptManager vs Executorl: Why using an Executor instead of a GptManager the release version?

Issue - State: closed - Opened by Guangjun-A about 1 month ago - 3 comments
Labels: question, triaged, not a bug

#2327 - awq quantization with gemma 2 9b

Issue - State: open - Opened by Alireza3242 about 1 month ago - 5 comments
Labels: bug, triaged, quantization

#2326 - Are there any ways to get QK scores from attention?

Issue - State: closed - Opened by ttim about 1 month ago - 2 comments
Labels: question, triaged, not a bug

#2325 - A bug when building tensorRT image

Issue - State: open - Opened by Noblezhong about 1 month ago - 4 comments
Labels: installation

#2324 - problem in build gemma 2 9B

Issue - State: closed - Opened by Alireza3242 about 1 month ago
Labels: bug

#2323 - AttributeError: '_SyncQueue' object has no attribute 'get'

Issue - State: open - Opened by imadoualid about 1 month ago - 5 comments
Labels: bug, triaged

#2322 - C++ Executor Leader Mode

Issue - State: closed - Opened by Guangjun-A about 1 month ago - 2 comments
Labels: question, triaged, not a bug

#2321 - logits_post_processor not work for ModelRunnerCpp

Issue - State: open - Opened by GooVincent about 1 month ago
Labels: triaged, feature request, runtime

#2320 - TRT-LLM Support for Llama3.2

Issue - State: open - Opened by JoJoLev about 1 month ago - 12 comments
Labels: Investigating, new model

#2319 - Error happened when quantizate Qwen2.5-14B-Instruct by SmoothQuant

Issue - State: open - Opened by liu21yd about 1 month ago - 2 comments
Labels: bug, triaged

#2318 - [feature request] prepopulatedPromptLen for executor::Response

Issue - State: open - Opened by akhoroshev about 1 month ago - 1 comment
Labels: triaged, feature request, runtime

#2317 - Stark Difference in GPU Usage of Triton Servers with Llama3 and Llama3.1 models

Issue - State: open - Opened by jasonngap1 about 1 month ago
Labels: question, triaged, runtime

#2316 - docs: clarify the slurm case

Pull Request - State: closed - Opened by stas00 about 1 month ago - 2 comments
Labels: documentation, triaged, Merged

#2315 - Add missing headers for mpiUtils.h to compile with gcc13

Pull Request - State: closed - Opened by mfuntowicz about 1 month ago - 1 comment
Labels: build

#2314 - Regarding the GPU memory usage and inference speed issues of the qwen2 0.5b model

Issue - State: open - Opened by GuangyanZhang about 1 month ago
Labels: question, triaged, performance issue

#2313 - Phi-3-mini-128k error

Issue - State: closed - Opened by scuizhibin about 1 month ago - 2 comments
Labels: bug

#2312 - question about flased multi head attention in trtllm-build

Issue - State: open - Opened by yoon5862 about 1 month ago
Labels: question, triaged

#2311 - In-flight batching and mixed batch

Issue - State: closed - Opened by huijjj about 1 month ago - 5 comments
Labels: question, triaged, performance issue, runtime

#2310 - qwen2_1.5b+tp4 convert_checkpoint failed

Issue - State: open - Opened by sun2011yao about 1 month ago - 3 comments
Labels: bug, triaged

#2309 - kv cache quant lead to model accuracy loss serious?

Issue - State: open - Opened by liguodongiot about 1 month ago - 1 comment
Labels: question, triaged, quantization

#2308 - CPU Inference

Issue - State: closed - Opened by JocelynPanPan about 1 month ago - 1 comment
Labels: question

#2307 - [question] How to achieve maximum GPU utilization with TensoRT-LLM lib using `openai-server.py`

Issue - State: open - Opened by thehumit about 1 month ago - 1 comment
Labels: question, triaged, performance issue

#2306 - request interruption

Issue - State: open - Opened by weizhi-wang about 1 month ago - 1 comment
Labels: Investigating

#2305 - Inference hangs after running Llama 3.1 8B engine built with either TP=4 or PP=4 but works ok if built with TP=1

Issue - State: open - Opened by imihic about 1 month ago
Labels: bug, triaged, Triton backend

#2303 - llama examples fail to run

Issue - State: open - Opened by stas00 about 1 month ago
Labels: bug, triaged

#2302 - llama 3.2 checkpoint conversion fails

Issue - State: open - Opened by stas00 about 1 month ago - 11 comments
Labels: documentation, question

#2301 - crashing on exit

Issue - State: closed - Opened by stas00 about 1 month ago - 1 comment
Labels: question

#2300 - Performance of W4A8 throughput on Hopper GPU.

Issue - State: open - Opened by zkf331 about 1 month ago - 1 comment
Labels: question, triaged, performance issue

#2297 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux about 1 month ago

#2296 - [Question] Support Async Senc/Recv?

Issue - State: open - Opened by jiahy0825 about 1 month ago
Labels: question, triaged

#2295 - Kv cache reuse is not taken account of in request scheduling

Issue - State: open - Opened by pankajroark about 1 month ago - 1 comment
Labels: bug, triaged, runtime

#2294 - Succeeded in Python runtime, but failed in C++ runtime

Issue - State: closed - Opened by yjjuan about 1 month ago - 4 comments
Labels: bug, triaged, runtime

#2293 - how to use row major weight data int8_sq_launcher?

Issue - State: closed - Opened by zhink about 1 month ago

#2292 - usage in deepstream

Issue - State: open - Opened by haiderasad about 1 month ago - 2 comments
Labels: question, triaged

#2290 - Fixed minor typo in advanced docs

Pull Request - State: closed - Opened by SachinVarghese about 1 month ago - 2 comments
Labels: documentation, Merged

#2289 - old datasets requirement

Issue - State: open - Opened by stas00 about 1 month ago
Labels: triaged, feature request

#2288 - need a copy code widget to be able to copy code snippets

Issue - State: open - Opened by stas00 about 1 month ago - 2 comments
Labels: documentation, triaged, feature request, not a bug

#2287 - outdated doc

Issue - State: open - Opened by stas00 about 1 month ago - 2 comments
Labels: documentation, triaged, not a bug

#2286 - various mpi4py issues and solutions

Issue - State: closed - Opened by stas00 about 1 month ago - 4 comments
Labels: bug, triaged

#2285 - doc: add the missing BF16

Pull Request - State: closed - Opened by stas00 about 1 month ago
Labels: documentation, Merged

#2284 - ModelRunnerCpp throws UnboundLocalError: local variable 'vocab_size' referenced before assignment

Issue - State: open - Opened by jxchenus about 1 month ago - 5 comments
Labels: bug, triaged, stale, waiting for feedback

#2283 - issues using Lora adaptors with Mistral Nemo

Issue - State: closed - Opened by kristoffer-bernhem about 1 month ago - 2 comments
Labels: bug, triaged, waiting for feedback

#2282 - [Encoder-Decoder] LoRA - BART not working - LoraParams and input dims don't match, lora tokens 1 input tokens 0

Issue - State: open - Opened by thanhlt998 about 1 month ago - 1 comment
Labels: triaged, Investigating

#2281 - Can I consider mistral as llama?

Issue - State: open - Opened by liyi-xia about 1 month ago - 1 comment
Labels: question, triaged, not a bug

#2280 - URL for downloading TensorRT 10.4 produces 404 Not Found.

Issue - State: closed - Opened by jxchenus about 1 month ago - 1 comment
Labels: bug

#2279 - RISC-V Support?

Issue - State: closed - Opened by JocelynPanPan about 1 month ago - 1 comment
Labels: question, not a bug

#2278 - Building INT8 Engine for hugging face models

Issue - State: open - Opened by prawin-srini about 1 month ago - 1 comment
Labels: bug, triaged

#2277 - The installation of tensorrt-llm for version 0.11.0 failed

Issue - State: open - Opened by coppock about 1 month ago - 3 comments
Labels: triaged, installation, stale, waiting for feedback

#2276 - Update gh-pages

Pull Request - State: closed - Opened by Shixiaowei02 about 1 month ago

#2275 - Add the known issue to windows installation guide

Pull Request - State: closed - Opened by pamelap-nvidia about 1 month ago

#2273 - Update TensorRT-LLM

Pull Request - State: closed - Opened by DanBlanaru about 1 month ago

#2272 - Wrong output when input is packed in Whisper with C++ runtime

Issue - State: open - Opened by sasikr2 about 1 month ago - 3 comments
Labels: bug, triaged, runtime

#2271 - update gh-pages

Pull Request - State: closed - Opened by Shixiaowei02 about 1 month ago

#2269 - TensorRT-LLM v0.13 Update

Pull Request - State: closed - Opened by Shixiaowei02 about 1 month ago

#2268 - how to calculate the Number of blocks in C++ runtime

Issue - State: open - Opened by w066650 about 2 months ago
Labels: question, triaged, not a bug

#2267 - Qwen2 1.5B checkpoint conversion broken(tensorrt_llm=0.14.0)

Issue - State: closed - Opened by yanglongbiao about 2 months ago - 2 comments
Labels: bug

#2265 - TensorRT python custom layer plugin support

Issue - State: open - Opened by jiahy0825 about 2 months ago - 5 comments
Labels: feature request, not a bug, stale

#2264 - Fix errors when quantizing Llama model

Pull Request - State: open - Opened by dleunji about 2 months ago - 1 comment
Labels: triaged, quantization

#2263 - [Bug] Lookahead decoding is nondeterministic and wrong after the first call to runner.generate

Issue - State: open - Opened by tloen about 2 months ago - 2 comments
Labels: bug, triaged

#2262 - how to use FusedMHARunnerV2?

Issue - State: open - Opened by woaixiaoxiao about 2 months ago
Labels: question, triaged, runtime, not a bug

#2261 - How to run multi-batch with qwenvl?

Issue - State: open - Opened by LiuYi-Up about 2 months ago
Labels: question, triaged, not a bug

#2260 - qwen2(7b) generate duplicate text

Issue - State: closed - Opened by w066650 about 2 months ago - 10 comments
Labels: question, triaged

#2259 - fix: none prompt to string

Pull Request - State: open - Opened by dongs0104 about 2 months ago - 1 comment
Labels: waiting for feedback

#2258 - Bump version to `0.14.0.dev2024092401`

Pull Request - State: closed - Opened by kaiyux about 2 months ago

#2257 - Adding a Model

Issue - State: closed - Opened by lcx1874000 about 2 months ago - 1 comment

#2256 - CUDA 12.2 and would like to know the highest version?

Issue - State: closed - Opened by GuangyanZhang about 2 months ago - 1 comment
Labels: question, triaged, not a bug

#2255 - [bug] --use_paged_context_fmha enable broken

Issue - State: open - Opened by akhoroshev about 2 months ago - 1 comment
Labels: bug, triaged

#2253 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux about 2 months ago

#2252 - Question regarding Executor(BufferView, ...) constructor

Issue - State: closed - Opened by muscowite about 2 months ago - 2 comments
Labels: question, triaged

#2251 - The problem of repeated output of large models in llama3

Issue - State: open - Opened by qimingyangyang about 2 months ago - 3 comments
Labels: triaged, not a bug, stale

#2250 - [issue] C++ runtime support multimodal model llava-one-vision

Issue - State: open - Opened by deepindeed2022 about 2 months ago - 4 comments
Labels: question, triaged, not a bug

#2249 - Support for encoder_repetition_penalty

Issue - State: open - Opened by PKaralupov about 2 months ago - 1 comment
Labels: triaged, feature request, stale

#2248 - KeyError: 'ChatGLMForConditionalGeneration',glm4-9b,

Issue - State: open - Opened by scutzhe about 2 months ago - 3 comments
Labels: feature request, new model

#2247 - Invalid MIT-MAGIC-COOKIE-1 key

Issue - State: open - Opened by sherlcok314159 about 2 months ago - 5 comments
Labels: bug, triaged, stale

#2246 - why is ModelRunneCpp await_responses blocked?

Issue - State: open - Opened by GooVincent about 2 months ago - 10 comments
Labels: question, triaged, stale

#2245 - Does TensorRT-LLM support input_embeds as input?

Issue - State: closed - Opened by OswaldoBornemann about 2 months ago - 1 comment
Labels: question, triaged

#2244 - README.md: Add 3rd Party Inference Speed Dashboard

Pull Request - State: open - Opened by matichon-vultureprime about 2 months ago - 1 comment
Labels: documentation, triaged

#2242 - Does tensorrt llm support llama3.1 sequence classification?

Issue - State: closed - Opened by fan-niu about 2 months ago - 11 comments
Labels: question, triaged

#2241 - The accuracy of trt-llm-qwen-vl-chat is low.

Issue - State: closed - Opened by xiangxinhello about 2 months ago - 1 comment
Labels: triaged, not a bug

#2240 - Linear increase in latency with batch size

Issue - State: closed - Opened by mkserge about 2 months ago - 4 comments
Labels: question, triaged, stale

#2239 - Cannot set earlyStopping 0 when using ModelRunnerCpp with beam size > 1

Issue - State: closed - Opened by PKaralupov about 2 months ago - 4 comments
Labels: triaged, not a bug

#2238 - How to set TensorRT-LLM to use Flash Attention 3

Issue - State: closed - Opened by kanebay about 2 months ago - 4 comments
Labels: question, triaged

#2237 - Working with vllm is much easier than working with tensorrt

Issue - State: closed - Opened by Alireza3242 about 2 months ago - 6 comments
Labels: triaged, feature request

#2236 - metrics compare in gptSession vs gptManager

Issue - State: closed - Opened by ZJLi2013 about 2 months ago

#2234 - Bump version to `0.14.0.dev2024091700`

Pull Request - State: closed - Opened by kaiyux about 2 months ago

#2233 - gemma-2-27b bad outputs

Issue - State: closed - Opened by siddhatiwari about 2 months ago - 3 comments
Labels: bug, stale

#2232 - Fix check_share_embedding

Pull Request - State: closed - Opened by lkm2835 about 2 months ago - 1 comment
Labels: Merged

#2230 - Update TensorRT-LLM

Pull Request - State: closed - Opened by kaiyux about 2 months ago

#2229 - FP8 rowwise support possible for SM89?

Issue - State: closed - Opened by aikitoria about 2 months ago - 4 comments
Labels: stale

#2228 - why do ouput include <|im_end|>

Issue - State: closed - Opened by w066650 2 months ago - 1 comment
Labels: question, triaged

#2227 - whisper-medium decoder Compile blocking

Issue - State: closed - Opened by skyCreateXian 2 months ago - 1 comment
Labels: bug

#2226 - "use_embedding_sharing" option not working for llama model.

Issue - State: closed - Opened by jxchenus 2 months ago - 6 comments
Labels: bug, stale