Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open-source projects.

GitHub / ggerganov/llama.cpp issues and pull requests
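A listing like the one below can also be retrieved programmatically from the service's API. A minimal sketch in Python, assuming the ecosyste.ms issues API exposes a repository's issues at a path like `/api/v1/hosts/{host}/repositories/{owner}%2F{repo}/issues` and returns JSON objects with `number`, `title`, `state`, `user`, and `pull_request` fields — the endpoint layout and field names are assumptions modeled on common issue-tracker schemas, not confirmed from this page:

```python
from urllib.parse import quote

# Hypothetical base URL and path layout for the ecosyste.ms issues API;
# verify against the service's own API documentation before relying on it.
BASE = "https://issues.ecosyste.ms/api/v1"


def issues_url(host: str, repo: str) -> str:
    """Build the (assumed) listing URL for a repository's issues and PRs.

    The repo slug is percent-encoded whole, so "owner/name" becomes
    "owner%2Fname" in the path segment.
    """
    return f"{BASE}/hosts/{quote(host)}/repositories/{quote(repo, safe='')}/issues"


def summarize(entry: dict) -> str:
    """Render one (assumed) API entry in the same style as the listing below."""
    kind = "Pull Request" if entry.get("pull_request") else "Issue"
    return f"#{entry['number']} - {entry['title']} ({kind}, {entry['state']}, by {entry['user']})"


# Example entry shaped after the first record in the listing below.
sample = {
    "number": 9599,
    "title": "readme: Add offline-ai/cli programmable prompt engine language CLI",
    "state": "closed",
    "user": "snowyu",
    "pull_request": True,
}

print(issues_url("GitHub", "ggerganov/llama.cpp"))
print(summarize(sample))
```

An actual fetch would pair `issues_url()` with any HTTP client (e.g. `urllib.request.urlopen`) and map `summarize()` over the decoded JSON list.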

#9599 - readme: Add offline-ai/cli programmable prompt engine language CLI for llama.cpp server

Pull Request - State: closed - Opened by snowyu about 2 months ago - 3 comments

#9598 - threads: improve ggml_barrier scaling with large number of threads

Pull Request - State: closed - Opened by max-krasnyansky about 2 months ago - 14 comments
Labels: ggml

#9597 - musa: enable VMM support

Pull Request - State: closed - Opened by yeahdongcn about 2 months ago - 3 comments

#9596 - perplexity : remove extra new lines after chunks

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: examples

#9595 - metal : use F32 prec for K*Q in vec FA

Pull Request - State: closed - Opened by ggerganov about 2 months ago

#9594 - CUDA: Enable FP16_MMA for RDNA3 with rocWMMA (PoC)

Pull Request - State: closed - Opened by Nekotekina about 2 months ago - 6 comments
Labels: Nvidia GPU

#9592 - Add basic function calling example using a llama-cli python wrapper

Pull Request - State: open - Opened by dmahurin about 2 months ago
Labels: examples, python

#9591 - Added link to Bielik model

Pull Request - State: closed - Opened by 32bitmicro about 2 months ago

#9589 - ggml: RWKV_WKV: Fix merge error in #9454

Pull Request - State: closed - Opened by MollySophia about 2 months ago

#9588 - Bug: false sharing in threadpool makes ggml_barrier() needlessly slow

Issue - State: closed - Opened by wtarreau about 2 months ago - 1 comment
Labels: bug-unconfirmed, low severity

#9587 - Bug: passing `tfs_z` crashes the server

Issue - State: open - Opened by z80maniac about 2 months ago - 2 comments
Labels: bug-unconfirmed, stale, critical severity

#9586 - nix: update flake.lock

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: nix

#9585 - Feature Request: Support Jina V3 arch

Issue - State: open - Opened by abhishekbhakat about 2 months ago - 5 comments
Labels: enhancement, stale

#9584 - Add theme Rose Pine

Issue - State: open - Opened by k2662 about 2 months ago - 4 comments
Labels: stale

#9583 - Bug: Templates are swapped for Mistral and Llama 2 in llama-server when using --chat-template

Issue - State: open - Opened by StrangeBytesDev about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity

#9582 - Bug: Vulkan not compile

Issue - State: closed - Opened by akac97 about 2 months ago - 4 comments
Labels: bug-unconfirmed, critical severity

#9581 - CUDA: enable Gemma FA for HIP/Pascal

Pull Request - State: closed - Opened by JohannesGaessler about 2 months ago
Labels: testing, Nvidia GPU

#9580 - Bug: Gemma2 9B FlashAttention is offloaded to CPU on AMD (HIP)

Issue - State: closed - Opened by Nekotekina about 2 months ago - 1 comment
Labels: bug-unconfirmed, medium severity

#9579 - Revert "[SYCL] fallback mmvq"

Pull Request - State: closed - Opened by qnixsynapse about 2 months ago
Labels: ggml, SYCL

#9577 - [SYCL] add missed dll file in package

Pull Request - State: closed - Opened by NeoZhangJianyu about 2 months ago
Labels: devops

#9575 - ERROR: Can't Compile llama.cpp on Mac OS Sequoia (September 2024 update)

Issue - State: closed - Opened by joseph777111 about 2 months ago - 5 comments
Labels: bug-unconfirmed, high severity

#9574 - llama: remove redundant loop when constructing ubatch

Pull Request - State: closed - Opened by shankarg87 about 2 months ago
Labels: Review Complexity : Low

#9573 - ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG

Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: ggml

#9572 - Bug: Flash attention reduces vulkan performance by ~50%

Issue - State: closed - Opened by tempstudio about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity

#9571 - CUDA: Enable K-shift operation for -ctk q8_0 (limited)

Pull Request - State: closed - Opened by Nekotekina about 2 months ago - 8 comments
Labels: Nvidia GPU

#9570 - quantize : improve type name parsing

Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: examples

#9569 - Bug: Llama-Quantize Not Working with Capital Letters (T^T)

Issue - State: closed - Opened by HatsuneMikuUwU33 about 2 months ago
Labels: bug-unconfirmed, medium severity

#9568 - Bug: ROCM 7900xtx output random garbage with qwen1.5/14B after recent update

Issue - State: open - Opened by sorasoras about 2 months ago - 6 comments
Labels: bug-unconfirmed, stale, critical severity

#9567 - sync : ggml

Pull Request - State: closed - Opened by ggerganov about 2 months ago - 1 comment
Labels: script, testing, Nvidia GPU, Vulkan, ggml, SYCL, Kompute

#9566 - Bug: gguf pypi package corrupts environment

Issue - State: open - Opened by vladmandic about 2 months ago
Labels: bug-unconfirmed, high severity

#9564 - Bug: Release version less accurate than Debug version consistently

Issue - State: closed - Opened by SwamiKannan about 2 months ago - 2 comments
Labels: bug-unconfirmed, low severity

#9563 - Bug: Model isn't loading

Issue - State: open - Opened by iladshyan about 2 months ago - 3 comments
Labels: bug-unconfirmed, stale, high severity

#9562 - CUDA: fix sum.cu compilation for CUDA < 11.7

Pull Request - State: closed - Opened by JohannesGaessler about 2 months ago
Labels: Nvidia GPU, Review Complexity : Low

#9560 - [CANN]Bug: Can't compile ggml/src/CMakeFiles/ggml.dir/ggml-cann/acl_tensor.cpp.o

Issue - State: open - Opened by pangbobi about 2 months ago - 1 comment
Labels: enhancement, Ascend NPU

#9559 - examples : flush log upon ctrl+c

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: examples

#9558 - Bug: llama-cli does not show the results of the performance test when SIGINT

Issue - State: closed - Opened by ownia about 2 months ago - 3 comments
Labels: bug-unconfirmed, medium severity

#9557 - baby-llama : use unnamed namespace in baby_llama_layer

Pull Request - State: open - Opened by danbev about 2 months ago - 10 comments
Labels: examples

#9556 - Bug: llama cpp server arg LLAMA_ARG_N_GPU_LAYERS doesn't follow the same convention as llama cpp python n_gpu_layers

Issue - State: open - Opened by mvonpohle about 2 months ago - 2 comments
Labels: bug-unconfirmed, low severity

#9555 - Bug: Unreadable output from android example project

Issue - State: open - Opened by xunuohope1107 about 2 months ago - 6 comments
Labels: bug-unconfirmed, high severity

#9554 - Bug: Fail to compile after commit 202084d31d4247764fc6d6d40d2e2bda0c89a73a

Issue - State: closed - Opened by AntonioLucibello about 2 months ago - 5 comments
Labels: bug-unconfirmed, high severity

#9552 - Feature Request: Support GRIN-MoE by Microsoft

Issue - State: open - Opened by GlasslessPizza about 2 months ago
Labels: enhancement

#9551 - Bug: KV quantization fails when using vulkan

Issue - State: open - Opened by jmars about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity

#9550 - Update CUDA graph on scale change plus clear nodes/params

Pull Request - State: closed - Opened by agray3 about 2 months ago
Labels: Nvidia GPU

#9548 - Perplexity input data should not be unescaped

Pull Request - State: closed - Opened by CISC about 2 months ago
Labels: examples

#9546 - Fix load time calculation error in llama_bench

Pull Request - State: closed - Opened by Septa2112 about 2 months ago - 4 comments
Labels: examples

#9545 - Bug: Build fails on i386 systems

Issue - State: open - Opened by yurivict about 2 months ago - 2 comments
Labels: bug-unconfirmed, Vulkan, low severity

#9544 - server: disable context shift

Pull Request - State: closed - Opened by VJHack about 2 months ago - 6 comments
Labels: examples, server

#9543 - Imatrix input data should not be unescaped

Pull Request - State: closed - Opened by CISC about 2 months ago - 2 comments
Labels: examples

#9542 - Update convert_hf_to_gguf.py

Pull Request - State: closed - Opened by blap about 2 months ago - 1 comment
Labels: python

#9541 - add solar pro support

Pull Request - State: open - Opened by mxyng about 2 months ago - 2 comments
Labels: python

#9540 - Bug: [SYCL] silently failed on windows

Issue - State: closed - Opened by easyfab about 2 months ago - 1 comment
Labels: bug-unconfirmed, critical severity

#9538 - ggml : fix n_threads_cur initialization with one thread

Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: ggml

#9535 - Bug: llama-cli generates incoherent output with full gpu offload

Issue - State: closed - Opened by 8XXD8 about 2 months ago - 3 comments
Labels: bug-unconfirmed, high severity

#9534 - llama : use reserve/emplace_back in sampler_sample

Pull Request - State: closed - Opened by danbev about 2 months ago

#9533 - Error compiling using CUDA on Jetson Orin nx

Issue - State: open - Opened by litao-zhx about 2 months ago - 2 comments

#9532 - Implementations for Q4_0_8_8 quantization based functions - AVX512 version of ggml_gemm_q4_0_8x8_q8_0

Pull Request - State: closed - Opened by Srihari-mcw about 2 months ago - 8 comments
Labels: ggml

#9531 - server : clean-up completed tasks from waiting list

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: examples, server

#9530 - Bug: Lower performance in pre-built binary llama-server, Since llama-b3681-bin-win-cuda-cu12.2.0-x64

Issue - State: closed - Opened by tobchef about 2 months ago - 13 comments
Labels: bug-unconfirmed, medium severity

#9529 - server : fix OpenSSL build by removing invalid `LOG_INFO` references

Pull Request - State: closed - Opened by EZForever about 2 months ago
Labels: examples, server

#9528 - Bug: task ids not removed from waiting_tasks for /v1/chat/completions call

Issue - State: closed - Opened by anagri about 2 months ago - 1 comment
Labels: bug-unconfirmed, medium severity

#9527 - bugfix: structured output response_format does not match openai

Pull Request - State: closed - Opened by VJHack about 2 months ago
Labels: examples, server

#9525 - llama: (proposal) propagating the results of `graph_compute` to the user interface

Pull Request - State: open - Opened by Xarbirus about 2 months ago - 9 comments

#9524 - llama-bench: correct argument parsing error message

Pull Request - State: closed - Opened by Xarbirus about 2 months ago
Labels: examples

#9522 - Bug: llama-server structured output response_format does not match openai docs

Issue - State: closed - Opened by Gittingthehubbing about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity

#9520 - scripts : verify py deps at the start of compare

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: script, python

#9519 - docs: update server streaming mode documentation

Pull Request - State: open - Opened by CentricStorm about 2 months ago
Labels: examples, server

#9517 - Can't load a Q4 model on 12gb vram

Issue - State: closed - Opened by akagohary about 2 months ago - 1 comment
Labels: bug-unconfirmed, low severity

#9516 - Bug: duplicate vulkan devices being detected on windows

Issue - State: open - Opened by tempstudio about 2 months ago
Labels: bug-unconfirmed, low severity

#9514 - Bug: Crash in Release Mode when built with Xcode 16 (& since Xcode 15.3)

Issue - State: closed - Opened by brittlewis12 about 2 months ago - 6 comments
Labels: bug-unconfirmed, critical severity

#9513 - add env variable for parallel

Pull Request - State: closed - Opened by bertwagner about 2 months ago - 2 comments
Labels: examples, server

#9512 - llama: public llama_n_head

Pull Request - State: closed - Opened by Xarbirus about 2 months ago

#9511 - Fixed n vocab

Pull Request - State: closed - Opened by Xarbirus about 2 months ago

#9510 - llama : add reranking support

Pull Request - State: closed - Opened by ggerganov about 2 months ago - 41 comments
Labels: examples, python, devops, server, merge ready

#9509 - ggml : move common CPU backend impl to new header

Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: ggml

#9508 - llama.cpp: Add a missing header for cpp23

Pull Request - State: closed - Opened by ykhrustalev about 2 months ago

#9507 - metal : increase GPU duty-cycle during inference

Issue - State: closed - Opened by ggerganov about 2 months ago - 1 comment
Labels: help wanted, performance, Apple Metal

#9505 - Bug: Lower performance in SYCL vs IPEX LLM.

Issue - State: open - Opened by adi-lb-phoenix about 2 months ago - 15 comments
Labels: bug-unconfirmed, medium severity

#9504 - llama : rename n_embed to n_embd in rwkv6_time_mix

Pull Request - State: closed - Opened by danbev about 2 months ago

#9502 - Bug: Last 2 Chunks In Streaming Mode Come Together In Firefox

Issue - State: closed - Opened by CentricStorm about 2 months ago - 3 comments
Labels: bug-unconfirmed, medium severity

#9501 - Bug: llama-bench: split-mode flag doesn't recognize argument 'none'

Issue - State: open - Opened by letter-v about 2 months ago - 1 comment
Labels: bug-unconfirmed, stale, low severity

#9499 - gguf-split : add basic checks

Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: examples

#9498 - Bug: can not merge gguf, gguf_init_from_file: invalid magic characters ''

Issue - State: closed - Opened by bss03arg about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity

#9497 - CMake: correct order of sycl flags

Pull Request - State: closed - Opened by Xarbirus about 2 months ago - 2 comments

#9496 - [SYCL] fix cmake broken

Pull Request - State: closed - Opened by airMeng about 2 months ago - 3 comments
Labels: devops

#9495 - added null check for llava decode

Pull Request - State: closed - Opened by l3utterfly about 2 months ago

#9493 - Feature Request: RDMA support for rpc back ends

Issue - State: open - Opened by slavonnet about 2 months ago - 2 comments
Labels: enhancement, stale

#9492 - Bug: llama-server api first query very slow

Issue - State: open - Opened by bosmart about 2 months ago - 11 comments
Labels: bug, medium severity

#9490 - Bug: [SYCL] linker fails with undefined reference to symbol

Issue - State: closed - Opened by qnixsynapse about 2 months ago - 3 comments
Labels: bug-unconfirmed, high severity

#9489 - Bug: andriod compiling bug, with vulkan open

Issue - State: open - Opened by bitxsw93 about 2 months ago - 2 comments
Labels: bug-unconfirmed, stale, medium severity

#9488 - nix: update flake.lock

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: nix

#9487 - sycl+intel build fix

Pull Request - State: closed - Opened by Xarbirus about 2 months ago - 2 comments

#9485 - nvidia uses the LLaMAForCausalLM string in their config.json, example…

Pull Request - State: closed - Opened by csabakecskemeti about 2 months ago
Labels: python

#9484 - main: option to disable context shift

Pull Request - State: closed - Opened by VJHack about 2 months ago - 2 comments
Labels: examples, server

#9483 - Bug: ERROR-hf-to-gguf

Issue - State: closed - Opened by xyangyan about 2 months ago - 1 comment
Labels: bug-unconfirmed

#9482 - Update clip.cpp

Pull Request - State: closed - Opened by Tejaakshaykumar about 2 months ago - 5 comments
Labels: examples

#9481 - [CANN]Feature Request: Support OrangeAIPRO 310b CANN

Issue - State: open - Opened by StudyingLover about 2 months ago
Labels: enhancement, Ascend NPU

#9478 - Bug: There is an issue to execute llama-baby-llama.

Issue - State: closed - Opened by Foreverythin about 2 months ago - 2 comments
Labels: bug-unconfirmed, low severity

#9477 - Bug: logit_bias Persists Across Requests When cache_prompt Is Enabled in llama.cpp Server

Issue - State: closed - Opened by jeanromainroy about 2 months ago - 1 comment
Labels: bug-unconfirmed, medium severity