Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

GitHub / ggerganov/llama.cpp issues and pull requests
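Each entry below follows a fixed plain-text schema: a line with the issue/PR number and title, a line with the type, state, author, and (optionally) a comment count, and an optional labels line. A minimal parsing sketch of one record — the field names and layout assumptions here are illustrative, not part of the Ecosyste.ms API:

```python
import re

# A sample record in the same shape as the entries below.
RECORD = """#9753 - nix: update flake.lock
Pull Request - State: closed - Opened by ggerganov about 1 month ago
Labels: nix"""

def parse_record(text):
    """Parse one listing entry into a dict (keys are illustrative)."""
    lines = [l.strip() for l in text.strip().splitlines() if l.strip()]

    # Line 1: "#<number> - <title>"
    m = re.match(r"#(\d+) - (.*)", lines[0])
    number, title = int(m.group(1)), m.group(2)

    # Line 2: "<type> - State: <state> - Opened by <author> ... [- N comment(s)]"
    m2 = re.match(r"(Pull Request|Issue) - State: (\w+) - Opened by (\S+)", lines[1])
    kind, state, author = m2.group(1), m2.group(2), m2.group(3)
    cm = re.search(r"(\d+) comments?$", lines[1])
    comments = int(cm.group(1)) if cm else 0

    # Optional line 3: "Labels: a, b, c"
    labels = []
    for l in lines[2:]:
        if l.startswith("Labels:"):
            labels = [s.strip() for s in l[len("Labels:"):].split(",")]

    return {"number": number, "title": title, "kind": kind,
            "state": state, "author": author,
            "comments": comments, "labels": labels}

print(parse_record(RECORD))
```

Records without a labels line or comment count (e.g. plain issues with no activity) parse the same way, with `labels` empty and `comments` set to 0.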

#9753 - nix: update flake.lock

Pull Request - State: closed - Opened by ggerganov about 1 month ago
Labels: nix

#9752 - ggml : add backend registry / device interfaces to BLAS backend

Pull Request - State: closed - Opened by slaren about 1 month ago
Labels: testing, ggml

#9750 - Problem with using llava_surgery_v2.py

Issue - State: open - Opened by ssykee about 1 month ago
Labels: bug-unconfirmed, high severity

#9747 - Single allocation of encode_async block with non-ARC capture in ggml-metal.m

Pull Request - State: closed - Opened by ptsochantaris about 1 month ago - 1 comment

#9745 - llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch

Pull Request - State: open - Opened by ngxson about 1 month ago - 6 comments
Labels: breaking change, android, examples, server

#9742 - sampling : add XTC sampler

Pull Request - State: closed - Opened by MaggotHATE about 1 month ago - 33 comments
Labels: testing, examples, server

#9738 - Feature Request: multimodal on android

Issue - State: open - Opened by surajat17 about 1 month ago - 2 comments
Labels: enhancement

#9737 - rerank : use [SEP] token instead of [BOS]

Pull Request - State: closed - Opened by ggerganov about 1 month ago
Labels: examples, devops, server

#9734 - vulkan : add GGML_VK_FORCE_HEAP_INDEX env var

Pull Request - State: open - Opened by gyf304 about 1 month ago
Labels: Vulkan, ggml

#9733 - ggml: Add POOL2D OP for GPU ACC to the Vulkan backend in the MobileVLM model.

Pull Request - State: closed - Opened by cyzero-kim about 1 month ago - 5 comments
Labels: Vulkan, ggml

#9724 - Potential GPU Usage During CPU Inference (ngl=0)

Issue - State: open - Opened by RakshitAralimatti about 1 month ago - 5 comments

#9722 - Feature Request: SYCL CI online

Issue - State: closed - Opened by airMeng about 1 month ago - 9 comments
Labels: enhancement

#9721 - vulkan : add backend registry / device interfaces

Pull Request - State: closed - Opened by slaren about 1 month ago - 6 comments
Labels: Vulkan, ggml

#9717 - Update convert_llama_ggml_to_gguf.py

Pull Request - State: closed - Opened by Ahmad986Ferdaws about 1 month ago - 2 comments
Labels: python

#9713 - ggml : add metal backend registry / device

Pull Request - State: closed - Opened by ggerganov about 1 month ago - 5 comments
Labels: script, testing, Nvidia GPU, nix, Vulkan, examples, python, devops, server, ggml, SYCL, Apple Metal, Kompute

#9708 - Bug: win-vulkan-x64 crashed since b3831

Issue - State: open - Opened by cwt about 1 month ago
Labels: bug-unconfirmed, critical severity

#9707 - ggml-backend : add device and backend reg interfaces

Pull Request - State: closed - Opened by slaren about 1 month ago - 2 comments
Labels: script, testing, Nvidia GPU, Vulkan, devops, ggml, SYCL, Apple Metal, Kompute

#9706 - Feature Request: Unify GGML logging mechanism

Issue - State: open - Opened by bandoti about 1 month ago
Labels: enhancement

#9705 - [SYCL] Add SYCL Backend registry, device and Event Interfaces

Pull Request - State: closed - Opened by OuadiElfarouki about 1 month ago - 2 comments
Labels: examples, ggml, SYCL

#9704 - examples : remove benchmark

Pull Request - State: open - Opened by ggerganov about 1 month ago
Labels: examples

#9702 - added implementation of DRY sampler (post-refactor)

Pull Request - State: closed - Opened by wwoodsTM about 1 month ago - 37 comments
Labels: testing, examples, server

#9701 - Bug: llama 3.2 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)

Issue - State: closed - Opened by guoriyue about 1 month ago - 3 comments
Labels: bug-unconfirmed, critical severity

#9700 - Feature Request: Support FlashAttention-3

Issue - State: open - Opened by hg0428 about 1 month ago
Labels: enhancement

#9698 - metal : reduce command encoding overhead

Pull Request - State: closed - Opened by ggerganov about 1 month ago
Labels: examples, ggml, Apple Metal

#9697 - ci : reduce severity of unused Pyright ignore comments

Pull Request - State: closed - Opened by compilade about 1 month ago
Labels: examples, python, devops

#9696 - convert : handle tokenizer merges format from transformers 4.45

Pull Request - State: open - Opened by compilade about 1 month ago - 4 comments
Labels: bugfix, Review Complexity : Low, python

#9695 - Bug: quality decreases in embeddings models

Issue - State: open - Opened by Maxon081102 about 1 month ago - 2 comments
Labels: bug-unconfirmed, medium severity

#9694 - update transformers version.

Pull Request - State: closed - Opened by Vaibhavs10 about 1 month ago
Labels: examples, python, server

#9692 - Bug: cannot find tokenizer merges in model file

Issue - State: closed - Opened by nd791899 about 1 month ago - 11 comments
Labels: bug, high priority, high severity

#9691 - musa: enable docker workflow

Pull Request - State: closed - Opened by yeahdongcn about 1 month ago
Labels: documentation, devops

#9690 - utf-8 fix for windows stdin

Pull Request - State: closed - Opened by hasaranga about 1 month ago

#9687 - llama : first attempt to implement vision API (WIP)

Pull Request - State: open - Opened by ngxson about 1 month ago - 2 comments
Labels: examples, python

#9685 - musa: add docker image support

Pull Request - State: closed - Opened by yeahdongcn about 1 month ago - 1 comment
Labels: documentation, devops

#9684 - ggml : define missing HWCAP flags

Pull Request - State: closed - Opened by ggerganov about 1 month ago
Labels: ggml

#9683 - Use new model class for chameleon conversion

Pull Request - State: closed - Opened by nopperl about 1 month ago
Labels: python

#9680 - nix: update flake.lock

Pull Request - State: closed - Opened by ggerganov about 1 month ago
Labels: nix

#9679 - `server`: cancel non-streamed requests w/ closed connection

Pull Request - State: open - Opened by ochafik about 1 month ago
Labels: examples, python, server

#9678 - Bug: Can't Convert Meta's Chameleon-7B to GGUF (ERROR:hf-to-gguf:Model ChameleonForConditionalGeneration is not supported)

Issue - State: closed - Opened by joseph777111 about 1 month ago - 3 comments
Labels: bug-unconfirmed, medium severity

#9676 - Bug: `illegal hardware instruction` when running on M3 mac Sequoia installed with brew

Issue - State: open - Opened by Ben-Epstein about 1 month ago - 3 comments
Labels: bug-unconfirmed, high severity

#9675 - contrib : add Resources section

Pull Request - State: closed - Opened by ggerganov about 1 month ago

#9674 - Bug: baby-llama fails

Issue - State: open - Opened by sfadaei about 1 month ago - 1 comment
Labels: bug-unconfirmed, stale, medium severity

#9673 - Bug: convert_hf_to_gguf.py - Converting HF model to GGUF giving error Missing tokenizer.model - Qwen2.5 based

Issue - State: closed - Opened by Spacellary about 1 month ago - 1 comment
Labels: bug-unconfirmed, high severity

#9672 - Update building for Android

Pull Request - State: closed - Opened by amqdn about 1 month ago - 26 comments
Labels: documentation, merge ready

#9671 - Bug: Initializing KV Cache Spikes Memory, Crashing on Android

Issue - State: closed - Opened by amqdn about 1 month ago - 4 comments
Labels: bug-unconfirmed, critical severity

#9668 - common: ensure token addition to batch does not exceed llama_batch size

Pull Request - State: closed - Opened by matiaslin about 1 month ago - 3 comments
Labels: build, testing, Vulkan, examples, python, devops, server, ggml, merge ready

#9667 - Bug: llama-parallel crashes when adding more tokens to llama_batch than context size

Issue - State: closed - Opened by matiaslin about 1 month ago
Labels: bug-unconfirmed, low severity

#9666 - Bug: Issue building hipBLAS error: call to undeclared function '_mm256_dpbusd_epi32'

Issue - State: open - Opened by Zhaeong about 1 month ago
Labels: bug-unconfirmed, stale, low severity

#9664 - Bug: Termux adreno 618 vulkan support

Issue - State: open - Opened by akac97 about 1 month ago
Labels: bug-unconfirmed, critical severity

#9662 - Dev refactoring

Pull Request - State: closed - Opened by ykhrustalev about 1 month ago - 2 comments
Labels: build, ggml

#9661 - cmake : add option for common library

Pull Request - State: closed - Opened by iboB about 1 month ago
Labels: build

#9659 - Introduce Graph Profiler

Pull Request - State: open - Opened by max-krasnyansky about 1 month ago - 2 comments
Labels: ggml

#9658 - sycl: initial cmake support of SYCL for AMD GPUs

Pull Request - State: open - Opened by Alcpz about 1 month ago - 3 comments
Labels: documentation, SYCL

#9657 - test-backend-ops : use flops for some performance tests

Pull Request - State: closed - Opened by slaren about 1 month ago - 1 comment
Labels: testing

#9656 - Error: llama_model_load: error loading model: failed to open ggml-bagel-2.8b-v0.2-q8_0.gguf

Issue - State: closed - Opened by vineel96 about 1 month ago - 4 comments
Labels: bug-unconfirmed, low severity

#9655 - Docs: Add akx/ollama-dl

Pull Request - State: closed - Opened by akx about 1 month ago

#9652 - Bug: server crashes when embedding model is passed in the -m parameter

Issue - State: open - Opened by mesibo about 1 month ago
Labels: bug-unconfirmed, stale, low severity

#9651 - Feature Request: sgemm.cpp : Q5_0 support

Issue - State: open - Opened by Srihari-mcw about 1 month ago - 3 comments
Labels: enhancement, stale

#9648 - [Draft] Tensor Parallel support to llama.cpp

Pull Request - State: open - Opened by ClarkChin08 about 1 month ago - 2 comments
Labels: ggml, SYCL

#9647 - Resurrect Graph & Op Profiler

Pull Request - State: closed - Opened by max-krasnyansky about 1 month ago - 5 comments
Labels: ggml

#9645 - Feature Request: Molmo 72B vision support

Issue - State: open - Opened by Kreijstal about 1 month ago - 7 comments
Labels: enhancement

#9644 - Bug: IQ3_M is significantly slower than IQ4_XS on AMD, is it expected?

Issue - State: open - Opened by Nekotekina about 1 month ago - 3 comments
Labels: bug-unconfirmed, low severity

#9643 - Llama-3.2 11B Vision Support

Issue - State: open - Opened by yukiarimo about 1 month ago - 31 comments

#9642 - Feature Request: Add support for LLaMA 3.2

Issue - State: closed - Opened by ndavidson19 about 1 month ago
Labels: enhancement

#9641 - Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS

Pull Request - State: closed - Opened by serhii-nakon about 1 month ago - 6 comments
Labels: devops

#9640 - Bug: server (New UI) ChatML templates are wrong

Issue - State: open - Opened by ivanstepanovftw about 1 month ago - 2 comments
Labels: good first issue, server/webui, bug-unconfirmed, medium severity

#9639 - Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro, Mistral Nemo, generic) w/ lazy grammars & minimalist Jinja engine

Pull Request - State: open - Opened by ochafik about 1 month ago - 7 comments
Labels: script, testing, examples, python, server

#9638 - ci : fix docker build number and tag name

Pull Request - State: closed - Opened by ngxson about 1 month ago
Labels: devops

#9637 - Add inverse chat template metadata

Pull Request - State: open - Opened by CISC about 1 month ago
Labels: python

#9636 - Bug: Assertion '__n < this->size()' failed.

Issue - State: open - Opened by Luke100000 about 1 month ago
Labels: bug-unconfirmed, stale, high severity

#9635 - server : add more env vars, improve gen-docs

Pull Request - State: closed - Opened by ngxson about 1 month ago
Labels: examples, server

#9633 - Examples: Add text compression example.

Pull Request - State: open - Opened by stduhpf about 1 month ago - 3 comments
Labels: examples

#9632 - Bug: python: can't open file 'llama.cpp/convert.py': [Errno 2] No such file or directory

Issue - State: open - Opened by AmosBunde about 1 month ago - 1 comment
Labels: bug-unconfirmed, stale, low severity

#9631 - Update convert_hf_to_gguf.py

Pull Request - State: closed - Opened by Ahmad986Ferdaws about 2 months ago
Labels: python

#9630 - Does llama.cpp support input_embeds?

Issue - State: open - Opened by OswaldoBornemann about 2 months ago - 3 comments
Labels: bug-unconfirmed, stale, low severity

#9629 - Bug: ggml_cuda_host_malloc: failed to allocate 1900,00 MiB of pinned memory: invalid argument

Issue - State: closed - Opened by XZVB12 about 2 months ago - 2 comments
Labels: bug-unconfirmed, low severity

#9628 - Bug: Failed to run qwen2-57b-a14b-instruct-fp16.

Issue - State: open - Opened by tang-t21 about 2 months ago - 3 comments
Labels: bug, good first issue, high severity

#9627 - [CANN]: Fix crash when running on multiple cann devices

Pull Request - State: closed - Opened by Dou-Git about 2 months ago - 2 comments
Labels: Ascend NPU

#9623 - Bug: [Hardware: ppc64le] On ppc64le llama.cpp only uses 1 thread by default and not half of all threads as it does on x86

Issue - State: open - Opened by mgiessing about 2 months ago
Labels: bug-unconfirmed, stale, low severity

#9622 - ggml : add AVX512DQ requirement for AVX512 builds

Pull Request - State: closed - Opened by EZForever about 2 months ago

#9619 - make sure params --split and --merge are not specified at same time in gguf-split

Pull Request - State: open - Opened by kylo5aby about 2 months ago - 2 comments
Labels: examples

#9618 - Bug: Got "error: bad arguments" when merging multiple gguf to single one by using llama-gguf-split

Issue - State: closed - Opened by zloss about 2 months ago - 3 comments
Labels: bug-unconfirmed, low severity

#9616 - Add newline after chat example in llama-server

Pull Request - State: closed - Opened by StrangeBytesDev about 2 months ago
Labels: examples, server

#9615 - threads: fix msvc build without openmp

Pull Request - State: closed - Opened by max-krasnyansky about 2 months ago - 2 comments
Labels: ggml

#9613 - Bug: Failed to load llama3.1 405b model

Issue - State: closed - Opened by Nightmir about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity

#9612 - Bug: [SYCL] crash since b-3805

Issue - State: closed - Opened by easyfab about 2 months ago - 43 comments
Labels: bug-unconfirmed, critical severity

#9611 - merge main

Pull Request - State: closed - Opened by Aliebc about 2 months ago
Labels: nix, examples, devops, server

#9610 - log : add CONT level for continuing previous log entry

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: examples, ggml

#9609 - llama : keep track of all EOG tokens in the vocab

Pull Request - State: closed - Opened by ggerganov about 2 months ago - 1 comment

#9608 - Bug: `llama-server` web UI resets the text selection during inference on every token update

Issue - State: open - Opened by mashdragon about 2 months ago
Labels: bug-unconfirmed, stale, low severity

#9607 - server : add --no-context-shift option

Pull Request - State: closed - Opened by ngxson about 2 months ago
Labels: examples, python, server

#9606 - Bug: Qwen2.5-Coder variants do not properly stop in FIM mode

Issue - State: closed - Opened by tristandruyen about 2 months ago - 3 comments
Labels: bug-unconfirmed, medium severity

#9605 - sampling : avoid expensive softmax during greedy sampling

Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: testing, examples

#9604 - sampling : fix off-by-one in tail-free sampling

Pull Request - State: closed - Opened by ggerganov about 2 months ago - 6 comments
Labels: testing

#9603 - keep the minimum `min_keep` value to 1 in sampling

Pull Request - State: open - Opened by kylo5aby about 2 months ago - 1 comment

#9602 - llama : introduce anonymous namespace in llama.cpp

Pull Request - State: open - Opened by danbev about 2 months ago - 1 comment

#9601 - Feature Request: OpenVINO backend support request

Issue - State: open - Opened by aropb about 2 months ago - 2 comments
Labels: enhancement, stale

#9600 - Feature Request: Word Llama

Issue - State: open - Opened by TalonBvV about 2 months ago
Labels: enhancement, stale