Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / ggerganov/llama.cpp issues and pull requests
#9599 - readme: Add offline-ai/cli programmable prompt engine language CLI for llama.cpp server
Pull Request - State: closed - Opened by snowyu about 2 months ago - 3 comments
#9598 - threads: improve ggml_barrier scaling with large number of threads
Pull Request - State: closed - Opened by max-krasnyansky about 2 months ago - 14 comments
Labels: ggml
#9597 - musa: enable VMM support
Pull Request - State: closed - Opened by yeahdongcn about 2 months ago - 3 comments
#9596 - perplexity : remove extra new lines after chunks
Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: examples
#9595 - metal : use F32 prec for K*Q in vec FA
Pull Request - State: closed - Opened by ggerganov about 2 months ago
#9594 - CUDA: Enable FP16_MMA for RDNA3 with rocWMMA (PoC)
Pull Request - State: closed - Opened by Nekotekina about 2 months ago - 6 comments
Labels: Nvidia GPU
#9592 - Add basic function calling example using a llama-cli python wrapper
Pull Request - State: open - Opened by dmahurin about 2 months ago
Labels: examples, python
#9591 - Added link to Bielik model
Pull Request - State: closed - Opened by 32bitmicro about 2 months ago
#9589 - ggml: RWKV_WKV: Fix merge error in #9454
Pull Request - State: closed - Opened by MollySophia about 2 months ago
#9588 - Bug: false sharing in threadpool makes ggml_barrier() needlessly slow
Issue - State: closed - Opened by wtarreau about 2 months ago - 1 comment
Labels: bug-unconfirmed, low severity
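The false sharing reported in #9588 is a general pitfall: when per-thread counters sit on the same cache line, every update by one thread evicts that line from every other core's cache, serializing what should be independent work. Below is a minimal illustrative C++ sketch of the usual remedy, padding each counter to its own cache line; it is not the actual ggml_barrier code, and the type names are invented.

```cpp
#include <atomic>

// Prone to false sharing: adjacent per-thread counters share cache lines,
// so every update by one thread invalidates the line for its neighbours.
// (Illustrative only; not the ggml threadpool code.)
struct counters_bad {
    std::atomic<int> n_done[64];
};

// The usual remedy: give each counter its own cache line (64 bytes is the
// typical x86 line size) so threads stop contending on unrelated data.
struct alignas(64) padded_counter {
    std::atomic<int> n_done{0};
};

struct counters_good {
    padded_counter n_done[64];   // one cache line per slot
};
```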
#9587 - Bug: passing `tfs_z` crashes the server
Issue - State: open - Opened by z80maniac about 2 months ago - 2 comments
Labels: bug-unconfirmed, stale, critical severity
#9586 - nix: update flake.lock
Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: nix
#9585 - Feature Request: Support Jina V3 arch
Issue - State: open - Opened by abhishekbhakat about 2 months ago - 5 comments
Labels: enhancement, stale
#9584 - Add theme Rose Pine
Issue - State: open - Opened by k2662 about 2 months ago - 4 comments
Labels: stale
#9583 - Bug: Templates are swapped for Mistral and Llama 2 in llama-server when using --chat-template
Issue - State: open - Opened by StrangeBytesDev about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity
#9582 - Bug: Vulkan not compile
Issue - State: closed - Opened by akac97 about 2 months ago - 4 comments
Labels: bug-unconfirmed, critical severity
#9581 - CUDA: enable Gemma FA for HIP/Pascal
Pull Request - State: closed - Opened by JohannesGaessler about 2 months ago
Labels: testing, Nvidia GPU
#9580 - Bug: Gemma2 9B FlashAttention is offloaded to CPU on AMD (HIP)
Issue - State: closed - Opened by Nekotekina about 2 months ago - 1 comment
Labels: bug-unconfirmed, medium severity
#9579 - Revert "[SYCL] fallback mmvq"
Pull Request - State: closed - Opened by qnixsynapse about 2 months ago
Labels: ggml, SYCL
#9578 - Feature Request: Add native int8 pure CUDA Core accelerate for pascal series graphics cards(Like:Tesla P40,Tesla P4)
Issue - State: closed - Opened by SakuraRK about 2 months ago - 2 comments
Labels: enhancement
#9577 - [SYCL] add missed dll file in package
Pull Request - State: closed - Opened by NeoZhangJianyu about 2 months ago
Labels: devops
#9575 - ERROR: Can't Compile llama.cpp on Mac OS Sequoia (September 2024 update)
Issue - State: closed - Opened by joseph777111 about 2 months ago - 5 comments
Labels: bug-unconfirmed, high severity
#9574 - llama: remove redundant loop when constructing ubatch
Pull Request - State: closed - Opened by shankarg87 about 2 months ago
Labels: Review Complexity : Low
#9573 - ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG
Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: ggml
#9572 - Bug: Flash attention reduces vulkan performance by ~50%
Issue - State: closed - Opened by tempstudio about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity
#9571 - CUDA: Enable K-shift operation for -ctk q8_0 (limited)
Pull Request - State: closed - Opened by Nekotekina about 2 months ago - 8 comments
Labels: Nvidia GPU
#9570 - quantize : improve type name parsing
Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: examples
#9569 - Bug: Llama-Quantize Not Working with Capital Letters (T^T)
Issue - State: closed - Opened by HatsuneMikuUwU33 about 2 months ago
Labels: bug-unconfirmed, medium severity
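#9570 and #9569 touch the same surface: mapping a user-supplied quantization type name (e.g. "q4_0" vs. "Q4_0") to a type id. A hedged C++ sketch of case-insensitive name matching follows; the helper name and signature are invented for illustration and are not the llama.cpp API.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (name and signature invented for illustration):
// compare a user-supplied quantization type name against a known one
// case-insensitively, so "q4_0" and "Q4_0" resolve to the same type.
static bool type_name_equals(const std::string & a, const std::string & b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```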
#9568 - Bug: ROCM 7900xtx output random garbage with qwen1.5/14B after recent update
Issue - State: open - Opened by sorasoras about 2 months ago - 6 comments
Labels: bug-unconfirmed, stale, critical severity
#9567 - sync : ggml
Pull Request - State: closed - Opened by ggerganov about 2 months ago - 1 comment
Labels: script, testing, Nvidia GPU, Vulkan, ggml, SYCL, Kompute
#9566 - Bug: gguf pypi package corrupts environment
Issue - State: open - Opened by vladmandic about 2 months ago
Labels: bug-unconfirmed, high severity
#9564 - Bug: Release version less accurate than Debug version consistently
Issue - State: closed - Opened by SwamiKannan about 2 months ago - 2 comments
Labels: bug-unconfirmed, low severity
#9563 - Bug: Model isn't loading
Issue - State: open - Opened by iladshyan about 2 months ago - 3 comments
Labels: bug-unconfirmed, stale, high severity
#9562 - CUDA: fix sum.cu compilation for CUDA < 11.7
Pull Request - State: closed - Opened by JohannesGaessler about 2 months ago
Labels: Nvidia GPU, Review Complexity : Low
#9560 - [CANN]Bug: Can't compile ggml/src/CMakeFiles/ggml.dir/ggml-cann/acl_tensor.cpp.o
Issue - State: open - Opened by pangbobi about 2 months ago - 1 comment
Labels: enhancement, Ascend NPU
#9559 - examples : flush log upon ctrl+c
Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: examples
#9558 - Bug: llama-cli does not show the results of the performance test when SIGINT
Issue - State: closed - Opened by ownia about 2 months ago - 3 comments
Labels: bug-unconfirmed, medium severity
#9557 - baby-llama : use unnamed namespace in baby_llama_layer
Pull Request - State: open - Opened by danbev about 2 months ago - 10 comments
Labels: examples
#9556 - Bug: llama cpp server arg LLAMA_ARG_N_GPU_LAYERS doesn't follow the same convention as llama cpp python n_gpu_layers
Issue - State: open - Opened by mvonpohle about 2 months ago - 2 comments
Labels: bug-unconfirmed, low severity
#9555 - Bug: Unreadable output from android example project
Issue - State: open - Opened by xunuohope1107 about 2 months ago - 6 comments
Labels: bug-unconfirmed, high severity
#9554 - Bug: Fail to compile after commit 202084d31d4247764fc6d6d40d2e2bda0c89a73a
Issue - State: closed - Opened by AntonioLucibello about 2 months ago - 5 comments
Labels: bug-unconfirmed, high severity
#9552 - Feature Request: Support GRIN-MoE by Microsoft
Issue - State: open - Opened by GlasslessPizza about 2 months ago
Labels: enhancement
#9551 - Bug: KV quantization fails when using vulkan
Issue - State: open - Opened by jmars about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity
#9550 - Update CUDA graph on scale change plus clear nodes/params
Pull Request - State: closed - Opened by agray3 about 2 months ago
Labels: Nvidia GPU
#9548 - Perplexity input data should not be unescaped
Pull Request - State: closed - Opened by CISC about 2 months ago
Labels: examples
#9546 - Fix load time calculation error in llama_bench
Pull Request - State: closed - Opened by Septa2112 about 2 months ago - 4 comments
Labels: examples
#9545 - Bug: Build fails on i386 systems
Issue - State: open - Opened by yurivict about 2 months ago - 2 comments
Labels: bug-unconfirmed, Vulkan, low severity
#9544 - server: disable context shift
Pull Request - State: closed - Opened by VJHack about 2 months ago - 6 comments
Labels: examples, server
#9543 - Imatrix input data should not be unescaped
Pull Request - State: closed - Opened by CISC about 2 months ago - 2 comments
Labels: examples
#9542 - Update convert_hf_to_gguf.py
Pull Request - State: closed - Opened by blap about 2 months ago - 1 comment
Labels: python
#9541 - add solar pro support
Pull Request - State: open - Opened by mxyng about 2 months ago - 2 comments
Labels: python
#9540 - Bug: [SYCL] silently failed on windows
Issue - State: closed - Opened by easyfab about 2 months ago - 1 comment
Labels: bug-unconfirmed, critical severity
#9538 - ggml : fix n_threads_cur initialization with one thread
Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: ggml
#9535 - Bug: llama-cli generates incoherent output with full gpu offload
Issue - State: closed - Opened by 8XXD8 about 2 months ago - 3 comments
Labels: bug-unconfirmed, high severity
#9534 - llama : use reserve/emplace_back in sampler_sample
Pull Request - State: closed - Opened by danbev about 2 months ago
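The idiom named in #9534 is worth spelling out: reserving a vector's final size up front avoids repeated reallocation while it is filled, and emplace_back constructs each element in place rather than copying a temporary. A minimal C++ sketch of the general pattern follows; the candidate type and collect function are invented for illustration and are not the sampler_sample code.

```cpp
#include <vector>

struct candidate {
    int   id;
    float logit;
    candidate(int id_, float logit_) : id(id_), logit(logit_) {}
};

// General idiom only (names invented; not the actual sampler_sample code):
// reserve once, then construct each element in place.
static std::vector<candidate> collect(int n_vocab, const float * logits) {
    std::vector<candidate> cur;
    cur.reserve(n_vocab);                // single allocation up front
    for (int i = 0; i < n_vocab; ++i) {
        cur.emplace_back(i, logits[i]);  // constructed in place, no copy
    }
    return cur;
}
```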
#9533 - Error compiling using CUDA on Jetson Orin nx
Issue - State: open - Opened by litao-zhx about 2 months ago - 2 comments
#9532 - Implementations for Q4_0_8_8 quantization based functions - AVX512 version of ggml_gemm_q4_0_8x8_q8_0
Pull Request - State: closed - Opened by Srihari-mcw about 2 months ago - 8 comments
Labels: ggml
#9531 - server : clean-up completed tasks from waiting list
Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: examples, server
#9530 - Bug: Lower performance in pre-built binary llama-server, Since llama-b3681-bin-win-cuda-cu12.2.0-x64
Issue - State: closed - Opened by tobchef about 2 months ago - 13 comments
Labels: bug-unconfirmed, medium severity
#9529 - server : fix OpenSSL build by removing invalid `LOG_INFO` references
Pull Request - State: closed - Opened by EZForever about 2 months ago
Labels: examples, server
#9528 - Bug: task ids not removed from waiting_tasks for /v1/chat/completions call
Issue - State: closed - Opened by anagri about 2 months ago - 1 comment
Labels: bug-unconfirmed, medium severity
#9527 - bugfix: structured output response_format does not match openai
Pull Request - State: closed - Opened by VJHack about 2 months ago
Labels: examples, server
#9526 - musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80)
Pull Request - State: closed - Opened by yeahdongcn about 2 months ago
Labels: Nvidia GPU
#9525 - llama: (proposal) propagating the results of `graph_compute` to the user interface
Pull Request - State: open - Opened by Xarbirus about 2 months ago - 9 comments
#9524 - llama-bench: correct argument parsing error message
Pull Request - State: closed - Opened by Xarbirus about 2 months ago
Labels: examples
#9522 - Bug: llama-server structured output response_format does not match openai docs
Issue - State: closed - Opened by Gittingthehubbing about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity
#9520 - scripts : verify py deps at the start of compare
Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: script, python
#9519 - docs: update server streaming mode documentation
Pull Request - State: open - Opened by CentricStorm about 2 months ago
Labels: examples, server
#9517 - Can't load a Q4 model on 12gb vram
Issue - State: closed - Opened by akagohary about 2 months ago - 1 comment
Labels: bug-unconfirmed, low severity
#9516 - Bug: duplicate vulkan devices being detected on windows
Issue - State: open - Opened by tempstudio about 2 months ago
Labels: bug-unconfirmed, low severity
#9514 - Bug: Crash in Release Mode when built with Xcode 16 (& since Xcode 15.3)
Issue - State: closed - Opened by brittlewis12 about 2 months ago - 6 comments
Labels: bug-unconfirmed, critical severity
#9513 - add env variable for parallel
Pull Request - State: closed - Opened by bertwagner about 2 months ago - 2 comments
Labels: examples, server
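A common shape for changes like #9513 is a three-level precedence: an explicit CLI flag wins, otherwise an environment variable is consulted, otherwise a built-in default applies. A hedged C++ sketch of that precedence follows; the variable name LLAMA_ARG_N_PARALLEL and the helper are illustrative assumptions, not necessarily what the PR implements.

```cpp
#include <cstdlib>

// Hedged sketch of flag > environment > default precedence; the variable
// name LLAMA_ARG_N_PARALLEL and this helper are illustrative assumptions,
// not necessarily what #9513 actually does.
static int get_n_parallel(int cli_value /* <= 0 means "not set" */) {
    if (cli_value > 0) {
        return cli_value;                              // explicit CLI flag wins
    }
    if (const char * env = std::getenv("LLAMA_ARG_N_PARALLEL")) {
        return std::atoi(env);                         // environment fallback
    }
    return 1;                                          // built-in default
}
```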
#9512 - llama: public llama_n_head
Pull Request - State: closed - Opened by Xarbirus about 2 months ago
#9511 - Fixed n vocab
Pull Request - State: closed - Opened by Xarbirus about 2 months ago
#9510 - llama : add reranking support
Pull Request - State: closed - Opened by ggerganov about 2 months ago - 41 comments
Labels: examples, python, devops, server, merge ready
#9509 - ggml : move common CPU backend impl to new header
Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: ggml
#9508 - llama.cpp: Add a missing header for cpp23
Pull Request - State: closed - Opened by ykhrustalev about 2 months ago
#9507 - metal : increase GPU duty-cycle during inference
Issue - State: closed - Opened by ggerganov about 2 months ago - 1 comment
Labels: help wanted, performance, Apple Metal
#9505 - Bug: Lower performance in SYCL vs IPEX LLM.
Issue - State: open - Opened by adi-lb-phoenix about 2 months ago - 15 comments
Labels: bug-unconfirmed, medium severity
#9504 - llama : rename n_embed to n_embd in rwkv6_time_mix
Pull Request - State: closed - Opened by danbev about 2 months ago
#9502 - Bug: Last 2 Chunks In Streaming Mode Come Together In Firefox
Issue - State: closed - Opened by CentricStorm about 2 months ago - 3 comments
Labels: bug-unconfirmed, medium severity
#9501 - Bug: llama-bench: split-mode flag doesn't recognize argument 'none'
Issue - State: open - Opened by letter-v about 2 months ago - 1 comment
Labels: bug-unconfirmed, stale, low severity
#9499 - gguf-split : add basic checks
Pull Request - State: closed - Opened by slaren about 2 months ago
Labels: examples
#9498 - Bug: can not merge gguf, gguf_init_from_file: invalid magic characters ''
Issue - State: closed - Opened by bss03arg about 2 months ago - 2 comments
Labels: bug-unconfirmed, medium severity
#9497 - CMake: correct order of sycl flags
Pull Request - State: closed - Opened by Xarbirus about 2 months ago - 2 comments
#9496 - [SYCL] fix cmake broken
Pull Request - State: closed - Opened by airMeng about 2 months ago - 3 comments
Labels: devops
#9495 - added null check for llava decode
Pull Request - State: closed - Opened by l3utterfly about 2 months ago
#9493 - Feature Request: RDMA support for rpc back ends
Issue - State: open - Opened by slavonnet about 2 months ago - 2 comments
Labels: enhancement, stale
#9492 - Bug: llama-server api first query very slow
Issue - State: open - Opened by bosmart about 2 months ago - 11 comments
Labels: bug, medium severity
#9490 - Bug: [SYCL] linker fails with undefined reference to symbol
Issue - State: closed - Opened by qnixsynapse about 2 months ago - 3 comments
Labels: bug-unconfirmed, high severity
#9489 - Bug: andriod compiling bug, with vulkan open
Issue - State: open - Opened by bitxsw93 about 2 months ago - 2 comments
Labels: bug-unconfirmed, stale, medium severity
#9488 - nix: update flake.lock
Pull Request - State: closed - Opened by ggerganov about 2 months ago
Labels: nix
#9487 - sycl+intel build fix
Pull Request - State: closed - Opened by Xarbirus about 2 months ago - 2 comments
#9485 - nvidia uses the LLaMAForCausalLM string in their config.json, example…
Pull Request - State: closed - Opened by csabakecskemeti about 2 months ago
Labels: python
#9484 - main: option to disable context shift
Pull Request - State: closed - Opened by VJHack about 2 months ago - 2 comments
Labels: examples, server
#9483 - Bug: ERROR-hf-to-gguf
Issue - State: closed - Opened by xyangyan about 2 months ago - 1 comment
Labels: bug-unconfirmed
#9482 - Update clip.cpp
Pull Request - State: closed - Opened by Tejaakshaykumar about 2 months ago - 5 comments
Labels: examples
#9481 - [CANN]Feature Request: Support OrangeAIPRO 310b CANN
Issue - State: open - Opened by StudyingLover about 2 months ago
Labels: enhancement, Ascend NPU
#9478 - Bug: There is an issue to execute llama-baby-llama.
Issue - State: closed - Opened by Foreverythin about 2 months ago - 2 comments
Labels: bug-unconfirmed, low severity
#9477 - Bug: logit_bias Persists Across Requests When cache_prompt Is Enabled in llama.cpp Server
Issue - State: closed - Opened by jeanromainroy about 2 months ago - 1 comment
Labels: bug-unconfirmed, medium severity