abetlen/llama-cpp-python issues and pull requests

#1939 - No GPU usage, occupied VRAM but only CPU is working

Issue - State: open - Opened by 0xb1te 3 days ago - 2 comments

#1938 - Specifying additional_files for model files in directory adds additional copy of directory to download URL

Issue - State: open - Opened by zhudotexe 4 days ago

#1937 - Segmentation fault when converting embeddings into tensor

Issue - State: open - Opened by devashishraj 4 days ago

#1936 - Getting seg faults intermittently prior to streaming generation

Issue - State: open - Opened by ekcrisp 4 days ago

#1935 - Using OpenBlas to accelerate has no effect？

Issue - State: open - Opened by buptmengjj 4 days ago

#1933 - I cannot install this package with cuda12.4

Issue - State: open - Opened by BoogonClothman 8 days ago - 3 comments

#1932 - Failed building wheel for llama-cpp-python

Issue - State: open - Opened by pklochowicz 9 days ago - 3 comments

#1931 - `llama.cpp` transfer

Issue - State: open - Opened by d-kleine 10 days ago

#1929 - 0.3.6 (No errors). 0.3.7 Seg Fault?

Issue - State: closed - Opened by icsy7867 11 days ago - 2 comments

#1927 - My system has Vulkan but cmake can't find it while everything else can.

Issue - State: closed - Opened by BlackSwordFIFTY5 13 days ago - 1 comment

#1925 - Please release a cuda build for v0.3.5

Issue - State: open - Opened by ParisNeo 14 days ago - 7 comments

#1924 - Installing Llama cpp python on Debian with no git installed throws an error

Issue - State: open - Opened by ekcrisp 14 days ago

#1923 - Unable to Build llama-cpp-python with Vulkan (Core Dump on Model Load)

Issue - State: open - Opened by Talnz007 15 days ago

#1921 - Distributed Inference

Issue - State: open - Opened by lipere123 16 days ago - 1 comment

#1919 - cc1: error: unknown value ‘native+nodotprod+noi8mm+nosve’ for ‘-mcpu’

Issue - State: closed - Opened by devashishraj 17 days ago - 1 comment

#1918 - Macos Metal Github Release for python 3.12 is broken

Issue - State: open - Opened by haixuanTao 17 days ago

#1917 - See `No CUDA toolset found` log after `-- CUDA Toolkit found.` log

Issue - State: open - Opened by TomaszDlubis 18 days ago - 1 comment

#1916 - How to run CLIP model on GPU?

Issue - State: open - Opened by zhu-j-faceonlive 18 days ago - 1 comment

#1915 - Cuda Build Failed, Please take a look at this

Issue - State: open - Opened by RealUnrealGameDev 20 days ago - 1 comment

#1914 - RPC is broken due to change of interface in llama.cpp main repository (rpc : early register backend devices #11262)

Issue - State: open - Opened by j-lag 22 days ago - 1 comment

#1913 - System.AccessViolationException on llama_backend_init()

Issue - State: closed - Opened by plqplq 24 days ago - 3 comments

#1912 - Migrate to the latest version of llama.cpp APIs and support for DeepSeek models

Pull Request - State: closed - Opened by jaepil 24 days ago - 1 comment

#1911 - fix(docker): correct run.sh path in simple Dockerfile

Pull Request - State: open - Opened by crufter 25 days ago

#1910 - Automate upstream llama.cpp sync

Issue - State: open - Opened by bgs4free 26 days ago - 4 comments

#1909 - Get bug after upgrade v0.3.4 to v0.3.6

Issue - State: closed - Opened by Kar-Su 27 days ago - 1 comment

#1908 - pip install llama-cpp-python got stuck forever at "Configuring CMake" in docker

Issue - State: open - Opened by jiafatom 28 days ago - 4 comments

#1907 - openai API `max_completion_tokens` argument is ignored

Issue - State: open - Opened by BenjaminMarechalEVITECH 28 days ago

#1906 - openai API `n` argument is ignored

Issue - State: open - Opened by BenjaminMarechalEVITECH 28 days ago

#1905 - Wrong chat format for llava 1.5

Issue - State: open - Opened by BenjaminMarechalEVITECH 28 days ago

#1904 - Add minicpm-o and qwen2-vl to the list of supported multimodal models.

Issue - State: open - Opened by kseyhan 28 days ago - 9 comments

#1903 - OSError: exception: access violation reading 0x0000000000000000

Issue - State: open - Opened by andretisch 28 days ago

#1902 - Question - Batch Processing

Issue - State: open - Opened by virentakia about 1 month ago

#1901 - Update llama_cpp: Sync LLAMA_API names with llama.cpp mainline. Needs more testing

Pull Request - State: closed - Opened by JamePeng about 1 month ago - 10 comments

#1900 - DeepSeek-R1-Distill-Qwen-32B-GGUF needs the deepseek-r1-qwen tokenizer

Issue - State: closed - Opened by Kenshiro-28 about 1 month ago - 13 comments

#1899 - chore(deps): bump conda-incubator/setup-miniconda from 3.1.0 to 3.1.1

Pull Request - State: open - Opened by dependabot[bot] about 1 month ago
Labels: dependencies, github_actions

#1898 - Fix error showing time spent in llama perf context print

Pull Request - State: closed - Opened by shakalaca about 1 month ago

#1897 - Generate answer from embedding vectors

Issue - State: open - Opened by leewoosub about 1 month ago

#1896 - llama_cpp_python-0.3.4-cp310-CU12 deepseekv3 /home/runner/work/llama-cpp-python/llama-cpp-python/vendor/llama.cpp/src/llama.cpp:5474: GGML_ASSERT(hparams.n_expert <= LLAMA_MAX_EXPERTS) failed

Issue - State: open - Opened by PeifengRen about 1 month ago - 2 comments

#1895 - Does it support the gguf format model of Qwen2-VL-2B-Instruct

Issue - State: open - Opened by helloHKTK about 1 month ago - 5 comments

#1894 - Fix the CUDA workflow

Pull Request - State: closed - Opened by oobabooga about 1 month ago

#1893 - cohere2 architecture issue

Issue - State: open - Opened by MertcanTekin about 1 month ago

#1892 - Dependency issues with uvicorn

Issue - State: open - Opened by jaslatendresse about 1 month ago - 4 comments

#1891 - Error when using .gguf model with llama-cpp: Exception in Llama.del during text generation

Issue - State: closed - Opened by Halibo01 about 1 month ago - 1 comment

#1890 - ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected

Issue - State: closed - Opened by Oussamayousre about 1 month ago

#1889 - Implementation doubt

Issue - State: open - Opened by lmbelo about 2 months ago - 1 comment

#1888 - The C compiler identification is unknown

Issue - State: open - Opened by makrse about 2 months ago

#1886 - MiniCPM-V 2.6 memory leak occurred !!!

Issue - State: open - Opened by Liwx1014 about 2 months ago - 3 comments

#1884 - feat: add streaming tool use

Pull Request - State: open - Opened by lsorber about 2 months ago - 5 comments

#1883 - Feature request: add support for streaming tool use

Issue - State: open - Opened by lsorber about 2 months ago

#1882 - CUDA not detected when running without --gpus all

Issue - State: closed - Opened by nandhiniramanan5 about 2 months ago - 3 comments

#1881 - How to create .so file for llama-cpp-python please help!

Issue - State: open - Opened by ayush20501 about 2 months ago

#1880 - ROCM Build Target or Build Instructions are incorrect

Issue - State: open - Opened by dgdguk about 2 months ago - 2 comments

#1879 - Fix/correct streaming resource lock

Pull Request - State: closed - Opened by gjpower 2 months ago - 3 comments

#1878 - new

Pull Request - State: open - Opened by Harshitha110809 2 months ago

#1877 - Update Dockerfile

Pull Request - State: open - Opened by Smartappli 2 months ago

#1876 - issue tying to install on WSL

Issue - State: open - Opened by silvacarl2 2 months ago - 4 comments

#1874 - Setting verbose=False issues with threads

Issue - State: open - Opened by jeberger 2 months ago - 1 comment

#1872 - Using a pre-built wheel currently requires specifying the right version, e.g. llama-cpp-python==0.3.4

Issue - State: open - Opened by eeegnu 2 months ago - 3 comments

#1871 - fix: replace anyio.Lock with asyncio.Lock to resolve lock handling issues

Pull Request - State: closed - Opened by sergey21000 2 months ago - 3 comments

#1870 - Mambaforge breaks cuda builds, switch to miniforge?

Issue - State: open - Opened by fat-tire 2 months ago - 1 comment

#1869 - chatml-function-callling not adding tool description to the prompt.

Issue - State: open - Opened by undo76 2 months ago

#1868 - Default temperature for llama-cpp-python server

Issue - State: closed - Opened by Qsan1 2 months ago - 2 comments

#1867 - Updated ROCm installation instructions

Pull Request - State: open - Opened by agronholm 2 months ago

#1866 - It seems that LlamaSamplingContext is no longer in use.

Issue - State: open - Opened by Taiki-azrs 2 months ago

#1865 - Confusion regarding operation/terminology of speculative decoding and sampling

Issue - State: open - Opened by MushroomHunting 2 months ago - 2 comments

#1864 - Add: Include option to normalize & truncate embeddings in create_embe…

Pull Request - State: open - Opened by KanishkNavale 2 months ago - 2 comments

#1862 - Fix: Refactor AsyncExitStack usage to resolve lock handling errors

Pull Request - State: closed - Opened by sergey21000 2 months ago - 4 comments

#1861 - Crash due to "The current task is not holding this lock" when {"stream": true}

Issue - State: closed - Opened by Ian321 2 months ago - 7 comments

#1860 - Add an option to enable --runtime-repack in llama.cpp

Issue - State: closed - Opened by ekcrisp 2 months ago - 4 comments

#1858 - fix: add missing await statements for async exit_stack handling

Pull Request - State: closed - Opened by gjpower 2 months ago

#1857 - server chat/completion api fails - coroutine object not callable in llama_proxy

Issue - State: closed - Opened by PurnaChandraPanda 2 months ago - 3 comments

#1856 - "tool_calls" not returning on native http request on a llama cpp server

Issue - State: open - Opened by celsowm 3 months ago

#1855 - llama_get_logits_ith: invalid logits id -1, reason: no logits

Issue - State: open - Opened by devashishraj 3 months ago

#1854 - Request to publish CUDA builds for v0.3.2 to GitHub Releases

Issue - State: open - Opened by lsorber 3 months ago - 3 comments

#1853 - With Intel GPU on Windows, llama_perf_context_print reports invalid performance metrics

Issue - State: open - Opened by dnoliver 3 months ago

#1852 - Windows with Intel GPU fails to build if Ninja is not the selected backend

Issue - State: open - Opened by dnoliver 3 months ago

#1851 - Intel GPU not enabled when using -DLLAVA_BUILD=OFF

Issue - State: open - Opened by dnoliver 3 months ago

#1845 - Fix: add missing 'seed' attribute to llama_context_params initialization

Pull Request - State: closed - Opened by sergey21000 3 months ago - 1 comment

#1844 - chore(deps): bump pypa/cibuildwheel from 2.21.1 to 2.22.0

Pull Request - State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies, github_actions

#1843 - Add support for XTC and DRY samplers

Pull Request - State: open - Opened by zpin 3 months ago - 3 comments

#1842 - Add musa_simple Dockerfile for supporting Moore Threads GPU

Pull Request - State: open - Opened by yeahdongcn 3 months ago

#1841 - [INFORMATION REQUEST] Is it possible to build for GPU enabled target on non-GPU host?

Issue - State: open - Opened by m-o-leary 3 months ago

#1840 - Prebuilt cuda wheel has a GLIBC version mismatch on Ubuntu 20.04.

Issue - State: open - Opened by Moon-404 3 months ago

#1839 - Error when building wheels on Linux FileNotFoundError: [Errno 2] No such file or directory: 'ninja'`

Issue - State: open - Opened by sabaimran 3 months ago - 4 comments

#1838 - Fix: CUDA workflow actions

Pull Request - State: closed - Opened by pabl-o-ce 3 months ago - 1 comment

#1837 - Error when update to 0.3.2

Issue - State: closed - Opened by paoloski97 3 months ago

#1836 - Release the GIL

Issue - State: closed - Opened by simonw 3 months ago - 3 comments

#1835 - updated from 0.2.90 to 0.3.2 and now my GPU won't load

Issue - State: open - Opened by rookiemann 3 months ago - 1 comment

#1834 - use n_threads param to call _embed_image_bytes fun

Pull Request - State: open - Opened by KenForever1 3 months ago

#1833 - Update dev instrunction

Pull Request - State: closed - Opened by Florents-Tselai 3 months ago

#1832 - Mistral-instruct not using system prompt.

Issue - State: closed - Opened by AkiraRy 3 months ago - 2 comments

#1830 - "eval time" and "prompt eval time" is 0.00ms after Ver0.3.0

Issue - State: open - Opened by nai-kon 3 months ago - 2 comments

#1829 - save logits section in eval() sets dtype to np32 apparently unconditionally?

Issue - State: closed - Opened by robbiemu 3 months ago - 1 comment

#1828 - sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='nul' mode='w' encoding='cp932'>

Issue - State: open - Opened by AkiraRy 3 months ago

#1827 - Update related llama.cpp to support Intel AMX instruction

Issue - State: closed - Opened by nai-kon 3 months ago - 1 comment

#1826 - llama-server not using GPU

Issue - State: open - Opened by RakshitAralimatti 3 months ago - 2 comments

#1825 - Update README.md

Pull Request - State: open - Opened by Nik-Kras 4 months ago - 3 comments

#1824 - Request for prebuilt CUDA wheels for newer version

Issue - State: closed - Opened by XJF2332 4 months ago - 10 comments

#1822 - Prebuilt CUDA wheels not working

Issue - State: open - Opened by mjwweb 4 months ago - 2 comments

#1821 - chore(deps): bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0

Pull Request - State: closed - Opened by dependabot[bot] 4 months ago
Labels: dependencies, github_actions

GitHub / abetlen/llama-cpp-python issues and pull requests