GitHub / abetlen/llama-cpp-python issues and pull requests
#2136 - Improve installation DX: prebuilt wheels for 3.13/3.14/3.14t + declarative backend selection
Issue -
State: open - Opened by clemlesne 3 days ago
#2133 - feat: Qwen 3.5 GDN support with hybrid model fixes
Pull Request -
State: open - Opened by r-dh 6 days ago
- 1 comment
#2132 - feat: update llama.cpp submodule and bindings for Qwen 3.5 support
Pull Request -
State: open - Opened by codavidgarcia 6 days ago
- 1 comment
#2131 - feat: Add DeepSeek R1 and distilled model support
Pull Request -
State: open - Opened by ljluestc 9 days ago
#2130 - Pre-built CPU-only wheel for Windows (cp313) for version 0.3.16+ (Gemma 3 support)
Issue -
State: open - Opened by selcukXIII 14 days ago
- 1 comment
#2129 - feat: add streaming tool use (rebased #1884 on latest main)
Pull Request -
State: open - Opened by XyLearningProgramming 15 days ago
#2127 - Link https://abetlen.github.io/llama-cpp-python/whl/cu125 returns 404 Not Found.
Issue -
State: open - Opened by wimtdw 16 days ago
- 5 comments
#2123 - Up to date llama.cpp wheel here (native libraries)
Issue -
State: open - Opened by mdjou 25 days ago
#2121 - fix: correct typos 'seperated' and 'seperator' to 'separated' and 'separator'
Pull Request -
State: open - Opened by thecaptain789 29 days ago
#2114 - Add new maintainers and/or archive this project
Issue -
State: open - Opened by davidmezzetti about 2 months ago
- 8 comments
#2113 - Please add support or tell me if there are ANY wheels for llama.cpp that can run on Debian/Ubuntu
Issue -
State: open - Opened by Ary5272 about 2 months ago
- 3 comments
#2108 - Update to llama.cpp 2026-01-01
Pull Request -
State: open - Opened by avion23 2 months ago
- 27 comments
#2107 - Build error with LLGUIDANCE
Issue -
State: open - Opened by JeremyBickel 3 months ago
#2106 - Build fails with Target "ggml-cuda" links to: CUDA::cublas but the target was not found.
Issue -
State: open - Opened by prasanthreddy-git 3 months ago
- 3 comments
#2105 - Support for LFM2-VL models
Issue -
State: open - Opened by Borzyszkowski 3 months ago
- 1 comment
#2104 - LLM Loading Failure — AttributeError in LlamaModel.__del__
Issue -
State: open - Opened by 2P2O5 3 months ago
- 2 comments
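A frequent cause of `AttributeError` inside `__del__` (a plausible reading of #2104, not confirmed from the issue itself) is that `__init__` raised before an attribute was assigned, yet `__del__` still runs and touches it. A minimal sketch of the defensive pattern, with all names hypothetical stand-ins rather than the project's actual code:

```python
class Model:
    """Illustrative stand-in for a class such as LlamaModel whose
    __del__ releases a native handle. All names are hypothetical."""

    def __init__(self, path):
        if not path:
            # __init__ can fail before self.handle is ever assigned
            raise ValueError("model path required")
        self.handle = object()  # stand-in for a native model handle

    def __del__(self):
        # __del__ is invoked even when __init__ raised partway through,
        # so guard with getattr instead of reading self.handle directly.
        if getattr(self, "handle", None) is not None:
            self.handle = None  # stand-in for freeing the native handle


# A failed construction now raises only ValueError; the half-built
# object's __del__ no longer triggers an AttributeError.
try:
    Model("")
except ValueError:
    pass
```

The same guard works for any resource attribute set late in `__init__`; `getattr` with a default is the idiomatic way to make finalizers tolerant of partially constructed objects.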
#2103 - Pre-built wheels for Python 3.14 and 3.14 free-threaded
Issue -
State: open - Opened by clemlesne 3 months ago
- 1 comment
#2102 - Fix issue #2096: Handle URLs with embedded HTTP credentials in _load_image
Pull Request -
State: open - Opened by nMaroulis 3 months ago
#2098 - Add support for Qwen3-vl models
Issue -
State: open - Opened by Hansashawn 3 months ago
- 4 comments
#2097 - Update to the current version of llama.cpp to add support for Qwen Next.
Issue -
State: open - Opened by Kenshiro-28 3 months ago
- 1 comment
#2096 - There is a bug in urlopen() when using image_url with credentials.
Issue -
State: open - Opened by WHJ125 3 months ago
- 1 comment
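Context for #2096/#2102: `urllib.request.urlopen()` does not honor credentials embedded in a URL's netloc (`http://user:pass@host/...`), so they must be stripped out and sent as a `Basic` Authorization header instead. A minimal sketch of that split, using only the standard library (illustrative, not the PR's actual implementation; `split_credentials` is a hypothetical helper name):

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, auth_header_or_None).

    urlopen() ignores user:pass@ in the netloc, so embedded credentials
    must be moved into an 'Authorization: Basic ...' header.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    creds = f"{parts.username}:{parts.password or ''}".encode()
    return clean, "Basic " + base64.b64encode(creds).decode()


url, auth = split_credentials("http://user:secret@example.com/img.png")
```

The cleaned URL can then be wrapped in a `urllib.request.Request` with the returned header attached before calling `urlopen()`.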
#2095 - Build Failure When Enabling KleidiAI on ARMv9 in llama-cpp-python ≥ 3.10.0
Issue -
State: open - Opened by ZIFENG278 4 months ago
- 1 comment
#2094 - Build Failure When Enabling KleidiAI on ARMv9 in llama-cpp-python ≥ 3.10.0
Issue -
State: closed - Opened by ZIFENG278 4 months ago
#2093 - Can't enable KLEIDIAI feature after 0.3.10
Issue -
State: closed - Opened by ZIFENG278 4 months ago
#2091 - cu128 wheel
Issue -
State: open - Opened by CyberSys 4 months ago
- 1 comment
#2090 - Vulkan on Windows
Issue -
State: open - Opened by jwijffels 4 months ago
- 3 comments
#2087 - 'LlamaModel' object has no attribute 'sampler'
Issue -
State: closed - Opened by raheel-shahzad 4 months ago
- 1 comment
#2085 - Fixed issue #1938
Pull Request -
State: open - Opened by TNing 4 months ago
#2083 - Include x64 directory for CUDA DLLs on Windows
Pull Request -
State: open - Opened by ajparsons 5 months ago
#2082 - Implement GenerationTagIgnore Jinja2 extension
Pull Request -
State: open - Opened by hidehiroanto 5 months ago
#2081 - How to compile with the latest GCC?
Issue -
State: open - Opened by wipedlifepotato 5 months ago
- 1 comment
#2080 - Feature Request: support qwen3-vl series
Issue -
State: open - Opened by dahwin 5 months ago
- 23 comments
#2079 - CUDA wheel installs, but GPU is never used on Windows 11 (Python 3.11, CUDA 12.1, torch finds GPU)
Issue -
State: open - Opened by feather528project 5 months ago
- 3 comments
#2078 - Direct image input via PIL instead of Base64
Issue -
State: open - Opened by rudolphos 5 months ago
#2077 - support batch embeddings and zero-copy numpy returns
Pull Request -
State: closed - Opened by kavorite 5 months ago
- 1 comment
#2076 - Periodic alignment with upstream
Issue -
State: open - Opened by handshape 5 months ago
#2075 - Support for MiniCPM-V 4.5
Issue -
State: open - Opened by eximius313 5 months ago
#2074 - AttributeError: function 'llama_get_kv_self' not found. Did you mean: 'llama_get_model'? after compiling llama-cpp-python on Windows
Issue -
State: open - Opened by johannesz-codes 5 months ago
- 2 comments
#2072 - Fixed a few typos in README.md
Pull Request -
State: open - Opened by ImadSaddik 5 months ago
#2071 - Llama.cpp@tags/b6490
Pull Request -
State: open - Opened by LongStoryMedia 6 months ago
#2070 - Windows Error 1114: failed to load shared library on Snapdragon X Plus CPU
Issue -
State: open - Opened by NangMPLwin 6 months ago
#2069 - Expose `ggml_backend_load()` and `ggml_backend_load_all()` to make use of builds with `GGML_BACKEND_DL=ON` and `GGML_CPU_ALL_VARIANTS=ON`
Issue -
State: open - Opened by uwu-420 6 months ago
- 1 comment
#2068 - Where can I download the wheel for CUDA 12.8? Trying to install llama.cpp for use with ComfyUI custom nodes.
Issue -
State: closed - Opened by overallbit 6 months ago
- 4 comments
#2066 - Better Qwen2.5-VL chat template.
Pull Request -
State: open - Opened by alcoftTAO 6 months ago
#2065 - unknown model architecture: 'gemma-embedding'
Issue -
State: open - Opened by mariocannistra 6 months ago
- 4 comments
#2064 - llama_get_kv_self debug symbols removed
Issue -
State: open - Opened by Bread7 6 months ago
#2063 - Thinking toggle support for Qwen related models
Issue -
State: open - Opened by Kishlay-notabot 6 months ago
- 1 comment
#2062 - ggml_cuda_init: failed to initialize CUDA: (null) on Windows with CUDA 12.9
Issue -
State: open - Opened by sequeirawilson2021 6 months ago
- 2 comments
#2061 - ERROR installing v0.3.16 with CUDA enabled on docker
Issue -
State: open - Opened by arditobryan 6 months ago
- 2 comments
#2060 - [Bug Report] Severe VRAM Allocation Instability in PyTorch after llama-cpp-python is Imported
Issue -
State: open - Opened by rookiestar28 6 months ago
#2059 - fix chat handler class name in docs
Pull Request -
State: open - Opened by anakin87 7 months ago
#2058 - Fix multi-sequence embeddings
Pull Request -
State: open - Opened by iamlemec 7 months ago
- 2 comments
#2057 - Cannot install current version of llama-cpp-python on Windows (backend independent)
Issue -
State: open - Opened by devtobi 7 months ago
#2056 - Update hyperlink to llama.cpp build docs
Pull Request -
State: open - Opened by SleepyYui 7 months ago
#2054 - cannot run fine-tuned gpt-oss model correctly
Issue -
State: open - Opened by jiachenguoNU 7 months ago
#2053 - cannot run fine-tuned gpt-oss model correctly
Issue -
State: closed - Opened by jiachenguoNU 7 months ago
#2052 - Adding Audio capabilities
Issue -
State: open - Opened by haixuanTao 7 months ago
#2051 - Can't compute multiple embeddings in a single call
Issue -
State: open - Opened by jeberger 7 months ago
- 4 comments
#2050 - Can't disable CMAKE ARG on Apple: GGML_METAL=OFF
Issue -
State: open - Opened by brendensoares 7 months ago
#2049 - Small updates to allow for `gpt-oss` generation
Pull Request -
State: open - Opened by iamlemec 7 months ago
#2048 - add support for MXFP4 quantization to enable use of new gpt-oss models by OpenAI
Issue -
State: open - Opened by mariocannistra 7 months ago
#2047 - Build fails on Windows with non-CUDA backends (CLBlast, Vulkan) for versions >= 0.2.78
Issue -
State: closed - Opened by ZapPhoenix 7 months ago
- 2 comments
#2046 - fix: rename op_offloat to op_offload in llama.py
Pull Request -
State: closed - Opened by sergey21000 7 months ago
#2045 - Regression in unified KV cache appears after `llama.cpp` release b5912 in b5913
Issue -
State: open - Opened by akarasulu 8 months ago
#2044 - Add timeout and error handling in FastAPI uvicorn server
Pull Request -
State: open - Opened by amandwivedi45 8 months ago
#2041 - Improve error message when model file is missing
Pull Request -
State: open - Opened by NITHIN0710 8 months ago
#2040 - Better chat format for Qwen2.5-VL
Pull Request -
State: open - Opened by alcoftTAO 8 months ago
#2039 - ARM Runners support CUDA SBSA
Pull Request -
State: open - Opened by johnnynunez 8 months ago
#2038 - Inferencing Flan-T5 - GGML_ASSERT error
Issue -
State: open - Opened by railesDev 8 months ago
#2037 - Error calling `llama_kv_cache_clear` in llama.py with 0.3.10
Issue -
State: closed - Opened by davidmezzetti 8 months ago
- 2 comments
#2036 - Fail to install llama
Issue -
State: closed - Opened by Deeffyy 8 months ago
- 3 comments
#2035 - Windows 11: ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)
Issue -
State: open - Opened by Lirsakura 8 months ago
- 3 comments
#2033 - 🚀 MAINTAINED FORK: inference-sh/llama-cpp-python – Active, Up-to-date, Contributors Welcome
Issue -
State: open - Opened by okaris 9 months ago
- 9 comments
#2031 - Gemma 3:4B Multimodal CLIP Error [WinError -529697949] Windows Error 0xe06d7363
Issue -
State: open - Opened by PlatDrake2875 9 months ago
- 1 comment
#2030 - Remove llama_kv_cache_view and deprecations were deleted on llama.cpp side too
Pull Request -
State: open - Opened by serhii-nakon 9 months ago
- 2 comments
#2029 - Access violation in an exe created with PyInstaller
Issue -
State: open - Opened by maniron214 9 months ago
- 3 comments
#2028 - Building and installing llama_cpp from source for RTX 50 Blackwell GPU
Issue -
State: open - Opened by Johnnyboycurtis 9 months ago
#2027 - Update fork
Pull Request -
State: closed - Opened by benzlokzik 9 months ago
- 1 comment
#2026 - llama_cpp/lib/libllama.so: undefined symbol: llama_kv_cache_view_init
Issue -
State: open - Opened by opsec-ai 9 months ago
- 3 comments
#2025 - Fix disk-cache LRU logic
Pull Request -
State: open - Opened by donbcd 9 months ago
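For reference on #2025: the behavior an LRU disk cache must preserve (reads refresh recency; eviction removes the least recently used entry) can be sketched with `collections.OrderedDict`. This is an illustrative in-memory model only, not the project's cache implementation:

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU: gets refresh recency, puts evict the oldest entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        value = self._data.pop(key)  # raises KeyError if missing
        self._data[key] = value      # re-insert to mark as most recent
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.pop(key)
        self._data[key] = value
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recent
cache.put("c", 3)  # capacity exceeded: evicts "b", not "a"
```

The common LRU bug is evicting by insertion order alone; the `get()`-side re-insert is what distinguishes least-recently-*used* from least-recently-*added*.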
#2024 - Build is broken in fedora 42 arm64
Issue -
State: closed - Opened by paul-civitas 9 months ago
- 1 comment
#2023 - Support for jinja for custom chat templates
Issue -
State: open - Opened by Z1EMN1AK 10 months ago
- 1 comment
#2022 - Assertion error when offloading Llama 4 layers to CPU
Issue -
State: open - Opened by BrianStucky-USDA 10 months ago
#2021 - Is it possible to run bitnet.cpp through these bindings ?
Issue -
State: open - Opened by IlyasMoutawwakil 10 months ago
#2020 - Installation URL for CUDA 12.5 in README results in 404 error
Issue -
State: open - Opened by k-inoway 10 months ago
#2018 - Add support for Cohere Command models
Pull Request -
State: open - Opened by handshape 10 months ago
- 1 comment
#2016 - macOS wheel fails on 0.35, works on 0.34
Issue -
State: open - Opened by Alex-EEE 10 months ago
#2015 - Flush libc stdout/stderr in suppress_stdout_stderr
Pull Request -
State: open - Opened by AuroraWright 10 months ago
#2014 - Does llama-cpp-python support Llama-4?
Issue -
State: open - Opened by rbgo404 10 months ago
#2013 - Can't install with GPU support with CUDA Toolkit 12.9 and CUDA 12.9
Issue -
State: open - Opened by hunainahmedj 10 months ago
- 19 comments
#2012 - How to install the latest version with GPU support
Issue -
State: open - Opened by shigabeev 10 months ago
#2010 - llama-cpp-python 0.3.8 with CUDA
Issue -
State: open - Opened by SeBL4RD 10 months ago
#2009 - Create haba
Pull Request -
State: closed - Opened by neuroQuantu 10 months ago
- 2 comments
#2008 - Qwen 3 model not working
Issue -
State: closed - Opened by Kenshiro-28 10 months ago
- 13 comments