Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/TensorRT-LLM issues and pull requests
#2331 - MoE TP vs. EP
Issue -
State: closed - Opened by Mutinifni about 1 month ago
- 8 comments
Labels: question, triaged, not a bug
#2330 - Why does AWQ 4-bit take more RAM than expected?
Issue -
State: open - Opened by Alireza3242 about 1 month ago
- 4 comments
Labels: question, triaged, quantization, not a bug
#2329 - Why does speed not increase with AWQ?
Issue -
State: open - Opened by Alireza3242 about 1 month ago
Labels: question, triaged
#2328 - GptManager vs Executor: Why use an Executor instead of a GptManager in the release version?
Issue -
State: closed - Opened by Guangjun-A about 1 month ago
- 3 comments
Labels: question, triaged, not a bug
#2327 - awq quantization with gemma 2 9b
Issue -
State: open - Opened by Alireza3242 about 1 month ago
- 5 comments
Labels: bug, triaged, quantization
#2326 - Are there any ways to get QK scores from attention?
Issue -
State: closed - Opened by ttim about 1 month ago
- 2 comments
Labels: question, triaged, not a bug
#2325 - A bug when building the TensorRT image
Issue -
State: open - Opened by Noblezhong about 1 month ago
- 4 comments
Labels: installation
#2324 - Problem building Gemma 2 9B
Issue -
State: closed - Opened by Alireza3242 about 1 month ago
Labels: bug
#2323 - AttributeError: '_SyncQueue' object has no attribute 'get'
Issue -
State: open - Opened by imadoualid about 1 month ago
- 5 comments
Labels: bug, triaged
#2322 - C++ Executor Leader Mode
Issue -
State: closed - Opened by Guangjun-A about 1 month ago
- 2 comments
Labels: question, triaged, not a bug
#2321 - logits_post_processor does not work for ModelRunnerCpp
Issue -
State: open - Opened by GooVincent about 1 month ago
Labels: triaged, feature request, runtime
#2320 - TRT-LLM Support for Llama3.2
Issue -
State: open - Opened by JoJoLev about 1 month ago
- 12 comments
Labels: Investigating, new model
#2319 - Error when quantizing Qwen2.5-14B-Instruct with SmoothQuant
Issue -
State: open - Opened by liu21yd about 1 month ago
- 2 comments
Labels: bug, triaged
#2318 - [feature request] prepopulatedPromptLen for executor::Response
Issue -
State: open - Opened by akhoroshev about 1 month ago
- 1 comment
Labels: triaged, feature request, runtime
#2317 - Stark Difference in GPU Usage of Triton Servers with Llama3 and Llama3.1 models
Issue -
State: open - Opened by jasonngap1 about 1 month ago
Labels: question, triaged, runtime
#2316 - docs: clarify the slurm case
Pull Request -
State: closed - Opened by stas00 about 1 month ago
- 2 comments
Labels: documentation, triaged, Merged
#2315 - Add missing headers for mpiUtils.h to compile with gcc13
Pull Request -
State: closed - Opened by mfuntowicz about 1 month ago
- 1 comment
Labels: build
#2314 - Regarding the GPU memory usage and inference speed issues of the qwen2 0.5b model
Issue -
State: open - Opened by GuangyanZhang about 1 month ago
Labels: question, triaged, performance issue
#2313 - Phi-3-mini-128k error
Issue -
State: closed - Opened by scuizhibin about 1 month ago
- 2 comments
Labels: bug
#2312 - Question about fused multi-head attention in trtllm-build
Issue -
State: open - Opened by yoon5862 about 1 month ago
Labels: question, triaged
#2311 - In-flight batching and mixed batch
Issue -
State: closed - Opened by huijjj about 1 month ago
- 5 comments
Labels: question, triaged, performance issue, runtime
#2310 - qwen2_1.5b+tp4 convert_checkpoint failed
Issue -
State: open - Opened by sun2011yao about 1 month ago
- 3 comments
Labels: bug, triaged
#2309 - Does KV cache quantization lead to serious model accuracy loss?
Issue -
State: open - Opened by liguodongiot about 1 month ago
- 1 comment
Labels: question, triaged, quantization
#2308 - CPU Inference
Issue -
State: closed - Opened by JocelynPanPan about 1 month ago
- 1 comment
Labels: question
#2307 - [question] How to achieve maximum GPU utilization with the TensorRT-LLM lib using `openai-server.py`
Issue -
State: open - Opened by thehumit about 1 month ago
- 1 comment
Labels: question, triaged, performance issue
#2306 - request interruption
Issue -
State: open - Opened by weizhi-wang about 1 month ago
- 1 comment
Labels: Investigating
#2305 - Inference hangs after running Llama 3.1 8B engine built with either TP=4 or PP=4 but works ok if built with TP=1
Issue -
State: open - Opened by imihic about 1 month ago
Labels: bug, triaged, Triton backend
#2304 - How can I implement and use a customized kernel, or how can I transfer kv_cache from one GPU to another GPU efficiently?
Issue -
State: closed - Opened by GGBond8488 about 1 month ago
- 2 comments
Labels: question, not a bug
#2303 - llama examples fail to run
Issue -
State: open - Opened by stas00 about 1 month ago
Labels: bug, triaged
#2302 - llama 3.2 checkpoint conversion fails
Issue -
State: open - Opened by stas00 about 1 month ago
- 11 comments
Labels: documentation, question
#2301 - crashing on exit
Issue -
State: closed - Opened by stas00 about 1 month ago
- 1 comment
Labels: question
#2300 - Performance of W4A8 throughput on Hopper GPU.
Issue -
State: open - Opened by zkf331 about 1 month ago
- 1 comment
Labels: question, triaged, performance issue
#2297 - Update TensorRT-LLM
Pull Request -
State: closed - Opened by kaiyux about 1 month ago
#2296 - [Question] Support async Send/Recv?
Issue -
State: open - Opened by jiahy0825 about 1 month ago
Labels: question, triaged
#2295 - KV cache reuse is not taken into account in request scheduling
Issue -
State: open - Opened by pankajroark about 1 month ago
- 1 comment
Labels: bug, triaged, runtime
#2294 - Succeeded in Python runtime, but failed in C++ runtime
Issue -
State: closed - Opened by yjjuan about 1 month ago
- 4 comments
Labels: bug, triaged, runtime
#2293 - How to use row-major weight data with int8_sq_launcher?
Issue -
State: closed - Opened by zhink about 1 month ago
#2292 - usage in deepstream
Issue -
State: open - Opened by haiderasad about 1 month ago
- 2 comments
Labels: question, triaged
#2291 - How to create a cubin.cpp file for arch sm87 (AGX Orin) in tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/
Issue -
State: open - Opened by johnsonwag03 about 1 month ago
Labels: question, triaged
#2290 - Fixed minor typo in advanced docs
Pull Request -
State: closed - Opened by SachinVarghese about 1 month ago
- 2 comments
Labels: documentation, Merged
#2289 - old datasets requirement
Issue -
State: open - Opened by stas00 about 1 month ago
Labels: triaged, feature request
#2288 - need a copy code widget to be able to copy code snippets
Issue -
State: open - Opened by stas00 about 1 month ago
- 2 comments
Labels: documentation, triaged, feature request, not a bug
#2287 - outdated doc
Issue -
State: open - Opened by stas00 about 1 month ago
- 2 comments
Labels: documentation, triaged, not a bug
#2286 - various mpi4py issues and solutions
Issue -
State: closed - Opened by stas00 about 1 month ago
- 4 comments
Labels: bug, triaged
#2285 - doc: add the missing BF16
Pull Request -
State: closed - Opened by stas00 about 1 month ago
Labels: documentation, Merged
#2284 - ModelRunnerCpp throws UnboundLocalError: local variable 'vocab_size' referenced before assignment
Issue -
State: open - Opened by jxchenus about 1 month ago
- 5 comments
Labels: bug, triaged, stale, waiting for feedback
#2283 - issues using Lora adaptors with Mistral Nemo
Issue -
State: closed - Opened by kristoffer-bernhem about 1 month ago
- 2 comments
Labels: bug, triaged, waiting for feedback
#2282 - [Encoder-Decoder] LoRA - BART not working - LoraParams and input dims don't match, lora tokens 1 input tokens 0
Issue -
State: open - Opened by thanhlt998 about 1 month ago
- 1 comment
Labels: triaged, Investigating
#2281 - Can I consider mistral as llama?
Issue -
State: open - Opened by liyi-xia about 1 month ago
- 1 comment
Labels: question, triaged, not a bug
#2280 - URL for downloading TensorRT 10.4 produces 404 Not Found.
Issue -
State: closed - Opened by jxchenus about 1 month ago
- 1 comment
Labels: bug
#2279 - RISC-V Support?
Issue -
State: closed - Opened by JocelynPanPan about 1 month ago
- 1 comment
Labels: question, not a bug
#2278 - Building INT8 Engine for hugging face models
Issue -
State: open - Opened by prawin-srini about 1 month ago
- 1 comment
Labels: bug, triaged
#2277 - The installation of tensorrt-llm for version 0.11.0 failed
Issue -
State: open - Opened by coppock about 1 month ago
- 3 comments
Labels: triaged, installation, stale, waiting for feedback
#2276 - Update gh-pages
Pull Request -
State: closed - Opened by Shixiaowei02 about 1 month ago
#2275 - Add the known issue to windows installation guide
Pull Request -
State: closed - Opened by pamelap-nvidia about 1 month ago
#2273 - Update TensorRT-LLM
Pull Request -
State: closed - Opened by DanBlanaru about 1 month ago
#2272 - Wrong output when input is packed in Whisper with C++ runtime
Issue -
State: open - Opened by sasikr2 about 1 month ago
- 3 comments
Labels: bug, triaged, runtime
#2271 - update gh-pages
Pull Request -
State: closed - Opened by Shixiaowei02 about 1 month ago
#2269 - TensorRT-LLM v0.13 Update
Pull Request -
State: closed - Opened by Shixiaowei02 about 1 month ago
#2268 - How to calculate the number of blocks in the C++ runtime
Issue -
State: open - Opened by w066650 about 2 months ago
Labels: question, triaged, not a bug
#2267 - Qwen2 1.5B checkpoint conversion broken (tensorrt_llm=0.14.0)
Issue -
State: closed - Opened by yanglongbiao about 2 months ago
- 2 comments
Labels: bug
#2266 - built tensorrt_llm-0.14.0.dev2024092401-cp310-cp310-linux_aarch64.whl on Jetson AGX Orin Developer Kit 32gb
Issue -
State: closed - Opened by whitesscott about 2 months ago
- 1 comment
Labels: bug
#2265 - TensorRT python custom layer plugin support
Issue -
State: open - Opened by jiahy0825 about 2 months ago
- 5 comments
Labels: feature request, not a bug, stale
#2264 - Fix errors when quantizing Llama model
Pull Request -
State: open - Opened by dleunji about 2 months ago
- 1 comment
Labels: triaged, quantization
#2263 - [Bug] Lookahead decoding is nondeterministic and wrong after the first call to runner.generate
Issue -
State: open - Opened by tloen about 2 months ago
- 2 comments
Labels: bug, triaged
#2262 - how to use FusedMHARunnerV2?
Issue -
State: open - Opened by woaixiaoxiao about 2 months ago
Labels: question, triaged, runtime, not a bug
#2261 - How to run multi-batch with qwenvl?
Issue -
State: open - Opened by LiuYi-Up about 2 months ago
Labels: question, triaged, not a bug
#2260 - qwen2 (7B) generates duplicate text
Issue -
State: closed - Opened by w066650 about 2 months ago
- 10 comments
Labels: question, triaged
#2259 - fix: none prompt to string
Pull Request -
State: open - Opened by dongs0104 about 2 months ago
- 1 comment
Labels: waiting for feedback
#2258 - Bump version to `0.14.0.dev2024092401`
Pull Request -
State: closed - Opened by kaiyux about 2 months ago
#2257 - Adding a Model
Issue -
State: closed - Opened by lcx1874000 about 2 months ago
- 1 comment
#2256 - CUDA 12.2 and would like to know the highest version?
Issue -
State: closed - Opened by GuangyanZhang about 2 months ago
- 1 comment
Labels: question, triaged, not a bug
#2255 - [bug] enabling --use_paged_context_fmha is broken
Issue -
State: open - Opened by akhoroshev about 2 months ago
- 1 comment
Labels: bug, triaged
#2253 - Update TensorRT-LLM
Pull Request -
State: closed - Opened by kaiyux about 2 months ago
#2252 - Question regarding Executor(BufferView, ...) constructor
Issue -
State: closed - Opened by muscowite about 2 months ago
- 2 comments
Labels: question, triaged
#2251 - Repeated output problem with large models in Llama 3
Issue -
State: open - Opened by qimingyangyang about 2 months ago
- 3 comments
Labels: triaged, not a bug, stale
#2250 - [issue] C++ runtime support multimodal model llava-one-vision
Issue -
State: open - Opened by deepindeed2022 about 2 months ago
- 4 comments
Labels: question, triaged, not a bug
#2249 - Support for encoder_repetition_penalty
Issue -
State: open - Opened by PKaralupov about 2 months ago
- 1 comment
Labels: triaged, feature request, stale
#2248 - KeyError: 'ChatGLMForConditionalGeneration' (glm4-9b)
Issue -
State: open - Opened by scutzhe about 2 months ago
- 3 comments
Labels: feature request, new model
#2247 - Invalid MIT-MAGIC-COOKIE-1 key
Issue -
State: open - Opened by sherlcok314159 about 2 months ago
- 5 comments
Labels: bug, triaged, stale
#2246 - Why is ModelRunnerCpp await_responses blocked?
Issue -
State: open - Opened by GooVincent about 2 months ago
- 10 comments
Labels: question, triaged, stale
#2245 - Does TensorRT-LLM support input_embeds as input?
Issue -
State: closed - Opened by OswaldoBornemann about 2 months ago
- 1 comment
Labels: question, triaged
#2244 - README.md: Add 3rd Party Inference Speed Dashboard
Pull Request -
State: open - Opened by matichon-vultureprime about 2 months ago
- 1 comment
Labels: documentation, triaged
#2243 - fix: add support for passing calib sequence length, and num samples + fixing use of custom calibration dataset for smoothquant in llama
Pull Request -
State: closed - Opened by Bhuvanesh09 about 2 months ago
- 4 comments
Labels: Merged
#2242 - Does tensorrt llm support llama3.1 sequence classification?
Issue -
State: closed - Opened by fan-niu about 2 months ago
- 11 comments
Labels: question, triaged
#2241 - The accuracy of trt-llm-qwen-vl-chat is low.
Issue -
State: closed - Opened by xiangxinhello about 2 months ago
- 1 comment
Labels: triaged, not a bug
#2240 - Linear increase in latency with batch size
Issue -
State: closed - Opened by mkserge about 2 months ago
- 4 comments
Labels: question, triaged, stale
#2239 - Cannot set earlyStopping 0 when using ModelRunnerCpp with beam size > 1
Issue -
State: closed - Opened by PKaralupov about 2 months ago
- 4 comments
Labels: triaged, not a bug
#2238 - How to set TensorRT-LLM to use Flash Attention 3
Issue -
State: closed - Opened by kanebay about 2 months ago
- 4 comments
Labels: question, triaged
#2237 - Working with vllm is much easier than working with tensorrt
Issue -
State: closed - Opened by Alireza3242 about 2 months ago
- 6 comments
Labels: triaged, feature request
#2236 - Metrics comparison: gptSession vs gptManager
Issue -
State: closed - Opened by ZJLi2013 about 2 months ago
#2235 - Can TensorRT-LLM support separating the prefill and decode stages across different GPUs or nodes with custom configuration?
Issue -
State: open - Opened by GGBond8488 about 2 months ago
- 3 comments
Labels: question, triaged, stale
#2234 - Bump version to `0.14.0.dev2024091700`
Pull Request -
State: closed - Opened by kaiyux about 2 months ago
#2233 - gemma-2-27b bad outputs
Issue -
State: closed - Opened by siddhatiwari about 2 months ago
- 3 comments
Labels: bug, stale
#2232 - Fix check_share_embedding
Pull Request -
State: closed - Opened by lkm2835 about 2 months ago
- 1 comment
Labels: Merged
#2230 - Update TensorRT-LLM
Pull Request -
State: closed - Opened by kaiyux about 2 months ago
#2229 - FP8 rowwise support possible for SM89?
Issue -
State: closed - Opened by aikitoria about 2 months ago
- 4 comments
Labels: stale
#2228 - Why does output include <|im_end|>?
Issue -
State: closed - Opened by w066650 2 months ago
- 1 comment
Labels: question, triaged
#2227 - whisper-medium decoder Compile blocking
Issue -
State: closed - Opened by skyCreateXian 2 months ago
- 1 comment
Labels: bug
#2226 - "use_embedding_sharing" option not working for llama model.
Issue -
State: closed - Opened by jxchenus 2 months ago
- 6 comments
Labels: bug, stale