Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/TensorRT-LLM issues and pull requests

#2028 - update links in overview section of README

Pull Request - State: closed - Opened by Tayef-Shah 4 months ago - 1 comment
Labels: documentation, Merged

#2027 - [Question] In-flight batching with Python

Issue - State: closed - Opened by jayfont 4 months ago - 2 comments
Labels: question, stale

#2026 - Wasteful computations in cross attention?

Issue - State: open - Opened by thefacetakt 4 months ago - 4 comments
Labels: question, stale

#2022 - [Bug] llama3.1-8b smoothquant error (using latest version: 5fa9436)

Issue - State: closed - Opened by fan-niu 4 months ago - 7 comments
Labels: bug, stale, functionality issue

#2024 - [Feature request] Mistral Large 2 support

Issue - State: open - Opened by aikitoria 4 months ago - 4 comments
Labels: feature request, stale, new model

#2023 - Latency increase when using Leader Mode in tritonserver

Issue - State: closed - Opened by junam2 4 months ago - 5 comments
Labels: bug, performance issue, stale, Investigating

#2022 - Different output with transformers lib and tensorrt llm when using lora

Issue - State: open - Opened by Alireza3242 4 months ago - 3 comments
Labels: bug, stale, Investigating, functionality issue

#2020 - [Lookahead] UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: bad optional access

Issue - State: closed - Opened by deepindeed2022 4 months ago - 2 comments
Labels: bug, stale, functionality issue

#2019 - [Bug] Error while converting multimodal Phi 3 Vision model to TRT-LLM checkpoints

Issue - State: closed - Opened by monoclex 4 months ago - 5 comments
Labels: bug, stale, functionality issue

#2014 - failed to use TensorRT-LLM/examples/apps/fastapi_server.py

Issue - State: open - Opened by AGI-player 4 months ago - 10 comments
Labels: stale, Investigating, functionality issue

#2011 - FP8 quantization / KV cache support in CogVLM

Issue - State: closed - Opened by sxyu 4 months ago - 3 comments
Labels: feature request, stale

#2007 - Problem with Qwen2-7B-Instruct inference after quantization in FP8

Issue - State: closed - Opened by VladislavDuma 4 months ago - 3 comments
Labels: bug, stale, others

#2005 - [Bug] Mistral Nemo 12B smoothquant convert error

Issue - State: closed - Opened by fan-niu 4 months ago - 4 comments
Labels: feature request, stale, new model

#2004 - Not found: unable to load shared library: libtensorrt_llm.so: cannot open shared object file: No such file or directory

Issue - State: open - Opened by nikhilcms 4 months ago - 11 comments
Labels: stale, functionality issue

#2001 - Question: Can Context FMHA be used to implement Transformer in a vision encoder for multimodal models?

Issue - State: closed - Opened by lmcl90 4 months ago - 4 comments
Labels: question, stale

#2000 - Add support for interleaved moe

Pull Request - State: closed - Opened by Macchiato123000 4 months ago - 9 comments

#1999 - T5 model, large difference in results when `remove_input_padding` is enabled

Issue - State: open - Opened by ogaloglu 4 months ago - 36 comments
Labels: bug, Investigating, functionality issue

#1998 - make -C docker release_run LOCAL_USER=1 is failing

Issue - State: closed - Opened by jayakommuru 4 months ago - 6 comments
Labels: question

#1995 - Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error

Issue - State: open - Opened by nikhilcms 4 months ago - 5 comments
Labels: stale, Investigating, functionality issue

#1991 - Is it possible to implement a quantize method like Q2_K in llama.cpp?

Issue - State: open - Opened by gloritygithub11 4 months ago - 3 comments
Labels: question

#1990 - Performance issues with TP and PP settings

Issue - State: closed - Opened by luoyang1999 4 months ago - 6 comments
Labels: question, stale

#1986 - how to use tensorrt_llm backend in tritonserver

Issue - State: closed - Opened by AndreWanga 4 months ago - 3 comments
Labels: question, stale

#1985 - Support for Mistral Nemo

Issue - State: open - Opened by hongjunchoi92 4 months ago - 9 comments
Labels: feature request, new model

#1984 - [model support] please support gemma2

Issue - State: open - Opened by lullabies777 4 months ago - 10 comments
Labels: feature request, new model

#1983 - Documentation?

Issue - State: open - Opened by slobodaapl 4 months ago - 1 comment
Labels: documentation, question, stale

#1982 - gptSessionBenchmark failed due to invalid OptProfilerSelector shape

Issue - State: open - Opened by ZJLi2013 4 months ago - 4 comments
Labels: bug, stale, Investigating, functionality issue

#1981 - Why does fp8_e4m3 min_scaling_factor divide by 512?

Issue - State: closed - Opened by suxi1314 4 months ago - 2 comments
Labels: question, stale

#1980 - Error: MOE-FP8 quantize Integer divide-by-zero in H20 (llama-70B fp8 quantize is fine)

Issue - State: open - Opened by joerong666 4 months ago - 3 comments
Labels: stale, Investigating, functionality issue

#1978 - [0.11.0] T5 model running issue

Issue - State: open - Opened by lanking520 4 months ago - 4 comments
Labels: bug, triaged, stale, Investigating, functionality issue

#1967 - trtllm-build qwen2 0.5B failed

Issue - State: open - Opened by wenshuai-xiaomi 4 months ago - 4 comments
Labels: stale, functionality issue

#1965 - fused_multihead_attention_v2 CUDA Error: CUDA_ERROR_INVALID_VALUE

Issue - State: open - Opened by inkinworld 4 months ago - 2 comments
Labels: bug, stale, Investigating, functionality issue

#1962 - [model support] please add support minicpm

Issue - State: open - Opened by LDLINGLINGLING 4 months ago - 1 comment
Labels: feature request, new model

#1959 - Is MPI required even when multi-device is disabled?

Issue - State: open - Opened by jlewi 4 months ago - 5 comments
Labels: question, Investigating

#1957 - Model Performance Degraded when using BFLOAT16 LoRa Adapters

Issue - State: open - Opened by TheCodeWrangler 4 months ago - 8 comments
Labels: bug, performance issue, Investigating

#1947 - [Feature]: FlashAttention 3 support

Issue - State: open - Opened by fan-niu 4 months ago - 7 comments
Labels: feature request, stale

#1943 - [new] discord channel for tensorrt

Issue - State: open - Opened by geraldstanje 4 months ago - 3 comments
Labels: question

#1942 - Mixtral-8x7B repetitive answers

Issue - State: closed - Opened by BugsBuggy 4 months ago - 4 comments
Labels: bug, Investigating, functionality issue

#1941 - `tensorrt_llm.bindings.Request` class is not usable for non-text inputs

Issue - State: closed - Opened by MahmoudAshraf97 4 months ago - 3 comments
Labels: feature request

#1939 - chore(docs): fix typos

Pull Request - State: closed - Opened by lfz941 4 months ago - 5 comments
Labels: documentation, Merged

#1936 - Correct the version

Pull Request - State: closed - Opened by Shixiaowei02 4 months ago

#1935 - Fix default min length

Pull Request - State: open - Opened by akhoroshev 4 months ago - 3 comments
Labels: triaged

#1934 - [Model Request] InternVL2.0 support

Issue - State: open - Opened by BasicCoder 4 months ago - 6 comments
Labels: feature request, new model

#1931 - [model request] PaliGemma support

Issue - State: open - Opened by kitterive 4 months ago - 3 comments
Labels: feature request, new model

#1930 - failed to load whisper decoder engine with paged kv cache

Issue - State: closed - Opened by MahmoudAshraf97 4 months ago - 7 comments
Labels: bug, functionality issue

#1926 - Add support for falcon2

Pull Request - State: closed - Opened by puneeshkhanna 4 months ago - 5 comments
Labels: triaged, Merged

#1917 - InternLM2 encounters an error when the batch size exceeds 16

Issue - State: open - Opened by Oldpan 4 months ago - 3 comments
Labels: Investigating, functionality issue

#1914 - does NVIDIA L20 GPUs support FP8 quantization?

Issue - State: closed - Opened by jinweida 4 months ago - 10 comments
Labels: question

#1902 - Update setup_build_env.ps1

Pull Request - State: closed - Opened by nero-dv 4 months ago

#1900 - support llava-next model

Issue - State: open - Opened by AmazDeng 4 months ago - 1 comment
Labels: feature request, new model

#1890 - Question about convert Qwen2-7B

Issue - State: open - Opened by sky-fly97 4 months ago - 5 comments
Labels: triaged, waiting for feedback, functionality issue

#1889 - Support Gemma 1.1 model

Issue - State: open - Opened by ttim 4 months ago - 6 comments
Labels: feature request, new model

#1887 - Expand doesn't handle dynamic shaped tensors.

Issue - State: closed - Opened by jxchenus 4 months ago - 3 comments
Labels: not a bug, others

#1868 - llama2 runs normally only on adjacent gpus

Issue - State: closed - Opened by janpetrov 5 months ago - 8 comments
Labels: bug, Investigating, functionality issue

#1839 - No module named 'tensorrt'

Issue - State: closed - Opened by tapansstardog 5 months ago - 11 comments
Labels: triaged, stale

#1832 - Timeline for adding IFB support to more models?

Issue - State: open - Opened by AndyZZt 5 months ago - 7 comments
Labels: triaged, waiting for feedback

#1759 - Internlm2 only runs normally on adjacent GPUs.

Issue - State: closed - Opened by yuanphoenix 5 months ago - 15 comments
Labels: bug, triaged, waiting for feedback

#1758 - DeepSeek MoE support

Pull Request - State: closed - Opened by akhoroshev 5 months ago - 36 comments
Labels: triaged

#1742 - Reference input randomSeeds by idx rather than batchSlot

Pull Request - State: closed - Opened by pathorn 5 months ago - 2 comments
Labels: Merged

#1741 - Quantizing Phi-3 128k Instruct to FP8 fails.

Issue - State: open - Opened by kalradivyanshu 5 months ago - 13 comments
Labels: triaged, feature request, quantization, Investigating, waiting for feedback

#1740 - Performance issue at whisper in many aspects : latency, reproducibility, and more

Issue - State: closed - Opened by lionsheep24 5 months ago - 11 comments
Labels: bug, Investigating

#1731 - Enabling w4a16 and 2:4 sparsity.

Issue - State: closed - Opened by jianyuheng 5 months ago - 2 comments
Labels: question

#1728 - When I used convert_checkpoint.py to convert Llama3 HF format, it printed "Killed"

Issue - State: closed - Opened by xjl456852 5 months ago - 4 comments
Labels: bug, triaged

#1711 - [Bug] Output generation does not stop at stop token </s>

Issue - State: closed - Opened by Hao-YunDeng 5 months ago - 5 comments
Labels: triaged, not a bug

#1704 - 24.05-trtllm-python-py3 image size

Issue - State: open - Opened by Prots 6 months ago - 10 comments
Labels: question, triaged, stale

#1700 - MPI Runtime error when running llama3 70B tp_size=8

Issue - State: closed - Opened by WDONG66 6 months ago - 7 comments
Labels: bug, triaged

#1697 - High WER and Incomplete Transcription Issue with Whisper

Issue - State: open - Opened by teith 6 months ago - 9 comments
Labels: bug, triaged, stale

#1676 - quantize.py fails to export important data to config.json (eg rotary scaling)

Issue - State: closed - Opened by janpetrov 6 months ago - 23 comments
Labels: bug, triaged, Investigating

#1657 - [Feature request] Cohere Family of Models (Command-R, Command-R-Plus, Aya23-8B, Aya23-35B, Aya101)

Issue - State: closed - Opened by user-0a 6 months ago - 13 comments
Labels: feature request, new model

#1611 - Add support for non-power-of-two heads with Alibi

Pull Request - State: closed - Opened by vmarkovtsev 6 months ago - 3 comments
Labels: triaged

#1580 - Fail to build int4_awq on Mixtral 8x7b

Issue - State: open - Opened by gloritygithub11 6 months ago - 17 comments
Labels: triaged, feature request, quantization, not a bug

#1576 - Could not find tensorrt after installing tensorrt-llm

Issue - State: closed - Opened by thisissum 6 months ago - 2 comments
Labels: bug

#1567 - InternVL-Chat-V1.5 support

Issue - State: open - Opened by chiquitita-101 6 months ago - 6 comments
Labels: feature request

#1544 - Detected layernorm nodes in FP16.

Issue - State: closed - Opened by akhoroshev 6 months ago - 14 comments
Labels: triaged

#1536 - Use first bad_words as extra parameters, and implement min-p

Pull Request - State: open - Opened by pathorn 7 months ago - 2 comments

#1514 - Support SDXL and its distributed inference

Pull Request - State: open - Opened by Zars19 7 months ago - 13 comments
Labels: waiting for feedback

#1506 - How do I specify `max_tokens_in_paged_kv_cache` property during trtllm generation?

Issue - State: closed - Opened by ghost 7 months ago - 5 comments
Labels: question, triaged, Triton backend

#1360 - Support for Cohere Command-R

Issue - State: closed - Opened by tombolano 8 months ago - 7 comments
Labels: feature request

#1343 - Flan t5 xxl result large difference

Issue - State: closed - Opened by sc-gr 8 months ago - 23 comments
Labels: bug, stale

#1296 - Fix examples/server.py returning only one token

Pull Request - State: closed - Opened by pathorn 8 months ago - 1 comment

#1295 - Fix assertion in engine/executor by using TransformersTokenizer

Pull Request - State: closed - Opened by pathorn 8 months ago - 1 comment

#1226 - TensorRT-LLM will support VIT?

Issue - State: closed - Opened by Hukongtao 9 months ago - 3 comments
Labels: feature request

#1213 - Support for text embedding models

Issue - State: open - Opened by SupreethRao99 9 months ago - 2 comments
Labels: feature request

#1188 - gpu memory usage is too high

Issue - State: open - Opened by zaykl 9 months ago - 2 comments
Labels: bug

#1173 - BERT Model is Inaccurate

Issue - State: open - Opened by Broyojo 9 months ago - 10 comments
Labels: bug

#1102 - Pipeline Parallelism slightly faster than single gpu

Issue - State: closed - Opened by SeungsuBaek 9 months ago - 2 comments
Labels: performance issue

#1001 - Update README.md

Pull Request - State: closed - Opened by MustaphaU 10 months ago

#997 - got segment fault and signal code 1 when running llama with fp16

Issue - State: closed - Opened by sharlynxy 10 months ago - 2 comments
Labels: bug

#935 - int8 gemm slower than fp16 on A100.

Issue - State: closed - Opened by beegerous 10 months ago - 6 comments
Labels: triaged

#888 - use TensorRT to accelerate the Reward Model based LLM

Issue - State: open - Opened by XuJing1022 10 months ago - 2 comments

#870 - How can I use embeddings as input for the llama model?

Issue - State: closed - Opened by littletomatodonkey 10 months ago - 12 comments
Labels: triaged

#853 - Support for other llm like Decilm?

Issue - State: open - Opened by mattm1005 10 months ago - 2 comments
Labels: triaged, feature request, new model

#836 - [feature request] qwen model's query logn-scaling attn

Issue - State: open - Opened by handoku 10 months ago - 9 comments
Labels: feature request

#792 - [Feature Request] support YaRN request

Issue - State: open - Opened by kkr37 11 months ago - 2 comments
Labels: triaged, feature request, new model

#632 - TensorRT-LLM Requests

Issue - State: open - Opened by ncomly-nvidia 11 months ago - 12 comments
Labels: good first issue

#601 - make -C docker run LOCAL_USER=1 FAILED

Issue - State: open - Opened by LoverLost 11 months ago - 15 comments
Labels: triaged

#497 - [Docs] fix docs information error

Pull Request - State: closed - Opened by BasicCoder 12 months ago - 1 comment

#348 - support on V100 GPU

Issue - State: closed - Opened by abhinav-vimal13 about 1 year ago - 15 comments
Labels: duplicate, triaged, build

#334 - Provide an interface similar to OpenAI API

Issue - State: open - Opened by Pevernow about 1 year ago - 21 comments
Labels: triaged, feature request