Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/TensorRT-LLM issues and pull requests
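As a sketch of how this metadata could be consumed programmatically, the snippet below builds a listing URL and fetches it. The endpoint layout (`/hosts/{host}/repositories/{full_name}/issues`) is an assumption based on common ecosyste.ms API conventions, not taken from this page; verify it against the service's own documentation.

```python
import json
import urllib.request
from urllib.parse import quote

# NOTE: the endpoint layout below is an assumption, not confirmed by this
# page -- check the ecosyste.ms Issues API docs before relying on it.
BASE = "https://issues.ecosyste.ms/api/v1"

def issues_url(host: str, owner: str, repo: str) -> str:
    """Build the (assumed) listing URL for a repository's issues and PRs."""
    # Repository full names are URL-encoded, e.g. NVIDIA%2FTensorRT-LLM.
    full_name = quote(f"{owner}/{repo}", safe="")
    return f"{BASE}/hosts/{host}/repositories/{full_name}/issues"

def fetch_issues(host: str, owner: str, repo: str):
    """Fetch issue/PR metadata as parsed JSON (requires network access)."""
    with urllib.request.urlopen(issues_url(host, owner, repo)) as resp:
        return json.load(resp)

# The repository this page lists:
# fetch_issues("GitHub", "NVIDIA", "TensorRT-LLM")
```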
#2028 - update links in overview section of README
Pull Request - State: closed - Opened by Tayef-Shah 4 months ago - 1 comment
Labels: documentation, Merged
#2027 - [Question] In-flight batching with Python
Issue - State: closed - Opened by jayfont 4 months ago - 2 comments
Labels: question, stale
#2026 - Wasteful computations in cross attention?
Issue - State: open - Opened by thefacetakt 4 months ago - 4 comments
Labels: question, stale
#2025 - [Bug] llama3.1-8b smoothquant error (use latest version: 5fa9436)
Issue - State: closed - Opened by fan-niu 4 months ago - 7 comments
Labels: bug, stale, functionality issue
#2024 - [Feature request] Mistral Large 2 support
Issue - State: open - Opened by aikitoria 4 months ago - 4 comments
Labels: feature request, stale, new model
#2023 - Latency increase when using Leader Mode in tritonserver
Issue - State: closed - Opened by junam2 4 months ago - 5 comments
Labels: bug, performance issue, stale, Investigating
#2022 - Different output with transformers lib and tensorrt llm when using lora
Issue - State: open - Opened by Alireza3242 4 months ago - 3 comments
Labels: bug, stale, Investigating, functionality issue
#2021 - [phi-3-mini-128k-instruct] Triton launch error with 24.06-trtllm-python-py3: [TensorRT-LLM][ERROR] Assertion failed: With communicationMode kLEADER, MPI worldSize is expected to be equal to tp*pp when participantIds are not specified (/tmp/tritonbuild/tensorrtllm/tensorrt_llm/cpp/tensorrt_llm/executor/executorImpl.cpp:356)
Issue - State: closed - Opened by Ryan-ZL-Lin 4 months ago - 7 comments
Labels: not a bug, stale, waiting for feedback, functionality issue
#2020 - [Lookahead] UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: bad optional access
Issue - State: closed - Opened by deepindeed2022 4 months ago - 2 comments
Labels: bug, stale, functionality issue
#2019 - [Bug] Error while converting multimodal Phi 3 Vision model to TRT-LLM checkpoints
Issue - State: closed - Opened by monoclex 4 months ago - 5 comments
Labels: bug, stale, functionality issue
#2014 - failed to use TensorRT-LLM/examples/apps/fastapi_server.py
Issue - State: open - Opened by AGI-player 4 months ago - 10 comments
Labels: stale, Investigating, functionality issue
#2011 - FP8 quantization / KV cache support in CogVLM
Issue - State: closed - Opened by sxyu 4 months ago - 3 comments
Labels: feature request, stale
#2007 - Problem with Qwen2-7B-Instruct inference after quantization in FP8
Issue - State: closed - Opened by VladislavDuma 4 months ago - 3 comments
Labels: bug, stale, others
#2005 - [Bug] Mistral Nemo 12B smoothquant convert error
Issue - State: closed - Opened by fan-niu 4 months ago - 4 comments
Labels: feature request, stale, new model
#2004 - Not found: unable to load shared library: libtensorrt_llm.so: cannot open shared object file: No such file or directory
Issue - State: open - Opened by nikhilcms 4 months ago - 11 comments
Labels: stale, functionality issue
#2001 - Question: Can Context FMHA be used to implement Transformer in a vision encoder for multimodal models?
Issue - State: closed - Opened by lmcl90 4 months ago - 4 comments
Labels: question, stale
#2000 - Add support for interleaved moe
Pull Request - State: closed - Opened by Macchiato123000 4 months ago - 9 comments
#1999 - T5 model, large difference in results when `remove_input_padding` is enabled
Issue - State: open - Opened by ogaloglu 4 months ago - 36 comments
Labels: bug, Investigating, functionality issue
#1998 - make -C docker release_run LOCAL_USER=1 is failing
Issue - State: closed - Opened by jayakommuru 4 months ago - 6 comments
Labels: question
#1995 - Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error
Issue - State: open - Opened by nikhilcms 4 months ago - 5 comments
Labels: stale, Investigating, functionality issue
#1991 - Is it possible to implement a quantize method like Q2_K in llama.cpp?
Issue - State: open - Opened by gloritygithub11 4 months ago - 3 comments
Labels: question
#1990 - Performance issues with TP and PP settings
Issue - State: closed - Opened by luoyang1999 4 months ago - 6 comments
Labels: question, stale
#1986 - how to use tensorrt_llm backend in tritonserver
Issue - State: closed - Opened by AndreWanga 4 months ago - 3 comments
Labels: question, stale
#1985 - Support for Mistral Nemo
Issue - State: open - Opened by hongjunchoi92 4 months ago - 9 comments
Labels: feature request, new model
#1984 - [model support] please support gemma2
Issue - State: open - Opened by lullabies777 4 months ago - 10 comments
Labels: feature request, new model
#1983 - Documentation?
Issue - State: open - Opened by slobodaapl 4 months ago - 1 comment
Labels: documentation, question, stale
#1982 - gptSessionBenchmark failed due to invalid OptProfilerSelector shape
Issue - State: open - Opened by ZJLi2013 4 months ago - 4 comments
Labels: bug, stale, Investigating, functionality issue
#1981 - why fp8_e4m3 min_scaling_factor divide 512?
Issue - State: closed - Opened by suxi1314 4 months ago - 2 comments
Labels: question, stale
#1980 - Error: MOE-FP8 quantize Integer divide-by-zero in H20 (llama-70B fp8 quantize is fine)
Issue - State: open - Opened by joerong666 4 months ago - 3 comments
Labels: stale, Investigating, functionality issue
#1978 - [0.11.0] T5 model running issue
Issue - State: open - Opened by lanking520 4 months ago - 4 comments
Labels: bug, triaged, stale, Investigating, functionality issue
#1967 - trtllm-build qwen2 0.5B failed
Issue - State: open - Opened by wenshuai-xiaomi 4 months ago - 4 comments
Labels: stale, functionality issue
#1965 - fused_multihead_attention_v2 CUDA Error: CUDA_ERROR_INVALID_VALUE
Issue - State: open - Opened by inkinworld 4 months ago - 2 comments
Labels: bug, stale, Investigating, functionality issue
#1962 - [model support] please add support minicpm
Issue - State: open - Opened by LDLINGLINGLING 4 months ago - 1 comment
Labels: feature request, new model
#1959 - Is MPI required even multi device is disabled?
Issue - State: open - Opened by jlewi 4 months ago - 5 comments
Labels: question, Investigating
#1957 - Model Performance Degraded when using BFLOAT16 LoRa Adapters
Issue - State: open - Opened by TheCodeWrangler 4 months ago - 8 comments
Labels: bug, performance issue, Investigating
#1947 - [Feature]: FlashAttention 3 support
Issue - State: open - Opened by fan-niu 4 months ago - 7 comments
Labels: feature request, stale
#1943 - [new] discord channel for tensorrt
Issue - State: open - Opened by geraldstanje 4 months ago - 3 comments
Labels: question
#1942 - Mixtral-8x7B repetitive answers
Issue - State: closed - Opened by BugsBuggy 4 months ago - 4 comments
Labels: bug, Investigating, functionality issue
#1941 - `tensorrt_llm.bindings.Request` class is not usable for non-text inputs
Issue - State: closed - Opened by MahmoudAshraf97 4 months ago - 3 comments
Labels: feature request
#1939 - chore(docs): fix typos
Pull Request - State: closed - Opened by lfz941 4 months ago - 5 comments
Labels: documentation, Merged
#1936 - Correct the version
Pull Request - State: closed - Opened by Shixiaowei02 4 months ago
#1935 - Fix default min length
Pull Request - State: open - Opened by akhoroshev 4 months ago - 3 comments
Labels: triaged
#1934 - [Model Request] InternVL2.0 support
Issue - State: open - Opened by BasicCoder 4 months ago - 6 comments
Labels: feature request, new model
#1931 - [model request] PaliGemma support
Issue - State: open - Opened by kitterive 4 months ago - 3 comments
Labels: feature request, new model
#1930 - failed to load whisper decoder engine with paged kv cache
Issue - State: closed - Opened by MahmoudAshraf97 4 months ago - 7 comments
Labels: bug, functionality issue
#1926 - Add support for falcon2
Pull Request - State: closed - Opened by puneeshkhanna 4 months ago - 5 comments
Labels: triaged, Merged
#1917 - InternLM2 encounters a error when the batch size exceeds 16
Issue - State: open - Opened by Oldpan 4 months ago - 3 comments
Labels: Investigating, functionality issue
#1914 - does NVIDIA L20 GPUs support FP8 quantization?
Issue - State: closed - Opened by jinweida 4 months ago - 10 comments
Labels: question
#1902 - Update setup_build_env.ps1
Pull Request - State: closed - Opened by nero-dv 4 months ago
#1900 - support llava-next model
Issue - State: open - Opened by AmazDeng 4 months ago - 1 comment
Labels: feature request, new model
#1890 - Question about convert Qwen2-7B
Issue - State: open - Opened by sky-fly97 4 months ago - 5 comments
Labels: triaged, waiting for feedback, functionality issue
#1889 - Support Gemma 1.1 model
Issue - State: open - Opened by ttim 4 months ago - 6 comments
Labels: feature request, new model
#1887 - Expand doesn't handle dynamic shaped tensors.
Issue - State: closed - Opened by jxchenus 4 months ago - 3 comments
Labels: not a bug, others
#1868 - llama2 runs normally only on adjacent gpus
Issue - State: closed - Opened by janpetrov 5 months ago - 8 comments
Labels: bug, Investigating, functionality issue
#1839 - No module named 'tensorrt'
Issue - State: closed - Opened by tapansstardog 5 months ago - 11 comments
Labels: triaged, stale
#1832 - Timeline for adding IFB support to more models?
Issue - State: open - Opened by AndyZZt 5 months ago - 7 comments
Labels: triaged, waiting for feedback
#1759 - Internlm2 only runs normally on adjacent GPUs.
Issue - State: closed - Opened by yuanphoenix 5 months ago - 15 comments
Labels: bug, triaged, waiting for feedback
#1758 - DeepSeek MoE support
Pull Request - State: closed - Opened by akhoroshev 5 months ago - 36 comments
Labels: triaged
#1742 - Reference input randomSeeds by idx rather than batchSlot
Pull Request - State: closed - Opened by pathorn 5 months ago - 2 comments
Labels: Merged
#1741 - Quantizing Phi-3 128k Instruct to FP8 fails.
Issue - State: open - Opened by kalradivyanshu 5 months ago - 13 comments
Labels: triaged, feature request, quantization, Investigating, waiting for feedback
#1740 - Performance issue at whisper in many aspects : latency, reproducibility, and more
Issue - State: closed - Opened by lionsheep24 5 months ago - 11 comments
Labels: bug, Investigating
#1731 - Enabling w4a16 and 2:4 sparsity.
Issue - State: closed - Opened by jianyuheng 5 months ago - 2 comments
Labels: question
#1728 - When I used convert_checkpoint.py to convert LLama3 hf format, It print killed
Issue - State: closed - Opened by xjl456852 5 months ago - 4 comments
Labels: bug, triaged
#1711 - [Bug] Output generation does not stop at stop token </s>
Issue - State: closed - Opened by Hao-YunDeng 5 months ago - 5 comments
Labels: triaged, not a bug
#1704 - 24.05-trtllm-python-py3 image size
Issue - State: open - Opened by Prots 6 months ago - 10 comments
Labels: question, triaged, stale
#1700 - MPI Runtime error when running llama3 70B tp_size=8
Issue - State: closed - Opened by WDONG66 6 months ago - 7 comments
Labels: bug, triaged
#1697 - High WER and Incomplete Transcription Issue with Whisper
Issue - State: open - Opened by teith 6 months ago - 9 comments
Labels: bug, triaged, stale
#1676 - quantize.py fails to export important data to config.json (eg rotary scaling)
Issue - State: closed - Opened by janpetrov 6 months ago - 23 comments
Labels: bug, triaged, Investigating
#1657 - [Feature request] Cohere Family of Models (Command-R, Command-R-Plus, Aya23-8B, Aya23-35B, Aya101)
Issue - State: closed - Opened by user-0a 6 months ago - 13 comments
Labels: feature request, new model
#1611 - Add support for non-power-of-two heads with Alibi
Pull Request - State: closed - Opened by vmarkovtsev 6 months ago - 3 comments
Labels: triaged
#1580 - Fail to build int4_awq on Mixtral 8x7b
Issue - State: open - Opened by gloritygithub11 6 months ago - 17 comments
Labels: triaged, feature request, quantization, not a bug
#1576 - Could not found tensorrt after install tensorrt-llm
Issue - State: closed - Opened by thisissum 6 months ago - 2 comments
Labels: bug
#1567 - InternVL-Chat-V1.5 support
Issue - State: open - Opened by chiquitita-101 6 months ago - 6 comments
Labels: feature request
#1544 - Detected layernorm nodes in FP16.
Issue - State: closed - Opened by akhoroshev 6 months ago - 14 comments
Labels: triaged
#1536 - Use first bad_words as extra parameters, and implement min-p
Pull Request - State: open - Opened by pathorn 7 months ago - 2 comments
#1514 - Support SDXL and its distributed inference
Pull Request - State: open - Opened by Zars19 7 months ago - 13 comments
Labels: waiting for feedback
#1506 - How do I specify `max_tokens_in_paged_kv_cache` property during trtllm generation?
Issue - State: closed - Opened by ghost 7 months ago - 5 comments
Labels: question, triaged, Triton backend
#1360 - Support for Cohere Command-R
Issue - State: closed - Opened by tombolano 8 months ago - 7 comments
Labels: feature request
#1343 - Flan t5 xxl result large difference
Issue - State: closed - Opened by sc-gr 8 months ago - 23 comments
Labels: bug, stale
#1296 - Fix examples/server.py returning only one token
Pull Request - State: closed - Opened by pathorn 8 months ago - 1 comment
#1295 - Fix assertion in engine/executor by using TransformersTokenizer
Pull Request - State: closed - Opened by pathorn 8 months ago - 1 comment
#1226 - TensorRT-LLM will support VIT?
Issue - State: closed - Opened by Hukongtao 9 months ago - 3 comments
Labels: feature request
#1213 - Support for text embedding models
Issue - State: open - Opened by SupreethRao99 9 months ago - 2 comments
Labels: feature request
#1210 - How to build TensorRT engine for a finetuned bert model for sequence classification
Issue - State: open - Opened by parikshitsaikia1619 9 months ago - 10 comments
#1188 - gpu memory usage is too high
Issue - State: open - Opened by zaykl 9 months ago - 2 comments
Labels: bug
#1173 - BERT Model is Inaccurate
Issue - State: open - Opened by Broyojo 9 months ago - 10 comments
Labels: bug
#1102 - Pipeline Parallelism slightly faster than single gpu
Issue - State: closed - Opened by SeungsuBaek 9 months ago - 2 comments
Labels: performance issue
#1001 - Update README.md
Pull Request - State: closed - Opened by MustaphaU 10 months ago
#997 - got segment fault and signal code 1 when running llama with fp16
Issue - State: closed - Opened by sharlynxy 10 months ago - 2 comments
Labels: bug
#935 - int8 gemm slower than fp16 on A100.
Issue - State: closed - Opened by beegerous 10 months ago - 6 comments
Labels: triaged
#888 - use TensorRT to accelerate the Reward Model based LLM
Issue - State: open - Opened by XuJing1022 10 months ago - 2 comments
#870 - How can i use embedding as input for llama model?
Issue - State: closed - Opened by littletomatodonkey 10 months ago - 12 comments
Labels: triaged
#853 - Support for other llm like Decilm?
Issue - State: open - Opened by mattm1005 10 months ago - 2 comments
Labels: triaged, feature request, new model
#836 - [feature request] qwen model's query logn-scaling attn
Issue - State: open - Opened by handoku 10 months ago - 9 comments
Labels: feature request
#792 - [Feature Request] support YaRN request
Issue - State: open - Opened by kkr37 11 months ago - 2 comments
Labels: triaged, feature request, new model
#632 - TensorRT-LLM Requests
Issue - State: open - Opened by ncomly-nvidia 11 months ago - 12 comments
Labels: good first issue
#601 - make -C docker run LOCAL_USER=1 FAILED
Issue - State: open - Opened by LoverLost 11 months ago - 15 comments
Labels: triaged
#497 - [Docs] fix docs information error
Pull Request - State: closed - Opened by BasicCoder 12 months ago - 1 comment
#348 - support on V100 GPU
Issue - State: closed - Opened by abhinav-vimal13 about 1 year ago - 15 comments
Labels: duplicate, triaged, build
#334 - Provide an interface similar to OpenAI API
Issue - State: open - Opened by Pevernow about 1 year ago - 21 comments
Labels: triaged, feature request