Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / ggerganov/llama.cpp issues and pull requests
#10091 - Bug: Cannot run larger-than-VRAM models with `GGML_CUDA_ENABLE_UNIFIED_MEMORY`
Issue -
State: open - Opened by vlovich 10 days ago
Labels: bug-unconfirmed, high severity
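For context, `GGML_CUDA_ENABLE_UNIFIED_MEMORY` is an environment variable read by the CUDA backend: setting it to 1 on Linux makes allocations use CUDA unified memory, so a model larger than VRAM can spill into system RAM instead of failing. A minimal usage sketch (the model path and -ngl value are placeholders):

    # Hypothetical run; adjust the model path and offloaded layer count to your setup.
    GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"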
#10090 - Feature Request: Meta releases Layer Skip, an end-to-end solution for accelerating LLMs
Issue -
State: open - Opened by mirek190 10 days ago
Labels: enhancement
#10089 - Bug: SwiftUI example does not work on simulator.
Issue -
State: open - Opened by guinmoon 10 days ago
Labels: bug-unconfirmed, low severity
#10083 - Bug: Floating Point Exceptions turned off by default, hiding fpExceptions
Issue -
State: open - Opened by borisweberdev 10 days ago
Labels: bug-unconfirmed, medium severity
#10080 - Bug: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
Issue -
State: open - Opened by morgen52 10 days ago
- 4 comments
Labels: bug-unconfirmed, high severity
#10078 - Bug: llama-server not logging to file
Issue -
State: open - Opened by PyroGenesis 11 days ago
- 4 comments
Labels: bug-unconfirmed, medium severity
#10077 - Bug: convert_hf_to_gguf.py: error: argument --outtype: invalid choice: 'q4_k_m' (choose from 'f32', 'f16', 'bf16', 'q8_0', 'tq1_0', 'tq2_0', 'auto')
Issue -
State: open - Opened by awesomecoolraj 11 days ago
- 1 comment
Labels: bug-unconfirmed, high severity
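The error text above lists the converter's valid --outtype choices; q4_k_m is not among them because K-quants are produced in a separate step with llama-quantize. A sketch of the usual two-step flow, with placeholder paths:

    # Step 1: convert the HF model to a high-precision GGUF.
    python convert_hf_to_gguf.py ./my-hf-model --outtype f16 --outfile model-f16.gguf
    # Step 2: quantize that file down to Q4_K_M.
    ./llama-quantize model-f16.gguf model-q4_k_m.gguf q4_k_m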
#10075 - Feature Request: Implement "Why Does the Effective Context Length of LLMs Fall Short?"
Issue -
State: open - Opened by arthurwolf 11 days ago
Labels: enhancement
#10074 - Bug: llama-service can only generate garbled text after a request with invalid tokens.
Issue -
State: closed - Opened by morgen52 11 days ago
- 4 comments
Labels: bug-unconfirmed, critical severity
#10072 - Bug: Wrong slots management when receiving multiple concurrent requests.
Issue -
State: open - Opened by morgen52 11 days ago
- 4 comments
Labels: bug-unconfirmed, high severity
#10071 - llama : remove Tail-Free sampling
Pull Request -
State: closed - Opened by ggerganov 11 days ago
- 6 comments
Labels: script, testing, examples, python, server
#10070 - Bug: Setting the `np` config leads to garbled generated tokens.
Issue -
State: open - Opened by morgen52 11 days ago
- 7 comments
Labels: bug-unconfirmed, high severity
#10069 - Implement ggml_v_expf() with a fast approximation on AVX/AVX2/AVX512
Pull Request -
State: closed - Opened by J-Montgomery 11 days ago
- 5 comments
Labels: ggml
#10065 - convert : more detailed convert lora usage docs
Pull Request -
State: open - Opened by richdougherty 13 days ago
Labels: python
#10064 - readme : more lora detail in main example readme
Pull Request -
State: open - Opened by richdougherty 13 days ago
Labels: examples
#10063 - nix: update flake.lock
Pull Request -
State: closed - Opened by ggerganov 13 days ago
Labels: nix
#10062 - fix deepseek deseret regex
Pull Request -
State: closed - Opened by dhiltgen 13 days ago
- 2 comments
#10059 - llama : rename missed batch params/vars to ubatch
Pull Request -
State: open - Opened by danbev 13 days ago
#10058 - Bug: when Vulkan is enabled, the error log message "vulkan-shaders-gen: command not found" occurs
Issue -
State: open - Opened by shuzf 13 days ago
Labels: bug-unconfirmed, critical severity
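For reference, vulkan-shaders-gen is a helper tool that the Vulkan build compiles and then runs to generate shaders, so "command not found" usually indicates a build-order or cross-compilation problem rather than a missing system package. A typical Vulkan build, assuming the Vulkan SDK (including glslc) is installed:

    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release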
#10057 - sampling : add adaptive temperature sampler
Pull Request -
State: closed - Opened by m18coppola 13 days ago
- 5 comments
Labels: testing, examples, server
#10056 - Bug: Server /v1/chat/completions API response's model info is wrong
Issue -
State: open - Opened by RifeWang 13 days ago
- 1 comment
Labels: bug-unconfirmed, medium severity
#10055 - add FP8 support to gguf/llama
Pull Request -
State: open - Opened by Djip007 13 days ago
- 6 comments
Labels: build, script, testing, examples, ggml
#10054 - Bug: Server with multiple slots: Long input + short output -> extreme generation token/s slowdown
Issue -
State: closed - Opened by us58 14 days ago
- 3 comments
Labels: bug-unconfirmed, medium severity
#10053 - Llama server - Update doc for slot states
Pull Request -
State: open - Opened by PyroGenesis 14 days ago
- 3 comments
Labels: examples, server
#10052 - Bug: Can't run even the smallest model due to CUDA OOM without FlashAttention
Issue -
State: closed - Opened by vlovich 14 days ago
- 1 comment
Labels: bug-unconfirmed, medium severity
#10051 - Bug: No LM Runtime found for model format 'gguf'
Issue -
State: closed - Opened by SculptorGoldenMoon 14 days ago
- 1 comment
Labels: bug-unconfirmed, low severity
#10050 - Bug: 15 GiB of CPU RAM permanently leaked on each llama-cli invocation
Issue -
State: closed - Opened by vlovich 14 days ago
- 2 comments
Labels: bug-unconfirmed, high severity
#10049 - Bug: llama-export-lora converts all non-F32 values to F16
Issue -
State: open - Opened by NWalker1208 14 days ago
Labels: bug-unconfirmed, medium severity
#10048 - sampling: add K-Shift sampler
Pull Request -
State: open - Opened by MaggotHATE 14 days ago
- 5 comments
Labels: testing, examples, server
#10047 - Bug: Certain RPC Servers cause major slowdown to Host machine
Issue -
State: open - Opened by GoudaCouda 14 days ago
- 2 comments
Labels: bug-unconfirmed, medium severity
#10045 - kompute: add backend registry / device interfaces and Q4_K shader
Pull Request -
State: open - Opened by slp 14 days ago
- 1 comment
Labels: ggml, Kompute
#10044 - Fix logging from llama-llava-cli
Pull Request -
State: open - Opened by Googulator 14 days ago
- 3 comments
Labels: examples
#10042 - musa: workaround for Guilty Lockup in cleaning src0 in #10032
Pull Request -
State: closed - Opened by yeahdongcn 14 days ago
- 2 comments
#10041 - [SYCL] pass SYCL CI
Pull Request -
State: open - Opened by airMeng 14 days ago
- 9 comments
Labels: testing, ggml, SYCL
#10039 - Execute multiple compute graphs in parallel
Issue -
State: open - Opened by paomiannanjue 14 days ago
#10037 - Bug: Vulkan backend freezes during its execution
Issue -
State: open - Opened by GrainyTV 15 days ago
- 9 comments
Labels: bug-unconfirmed, medium severity
#10035 - Feature Request: Support Aya
Issue -
State: open - Opened by maziyarpanahi 15 days ago
Labels: enhancement
#10034 - Make Kompute error verbose about unsupported types
Pull Request -
State: closed - Opened by ericcurtin 15 days ago
#10033 - metal : support permuted matrix multiplications
Pull Request -
State: closed - Opened by ggerganov 15 days ago
#10032 - CUDA: fix insufficient buffer clearing for MMQ
Pull Request -
State: closed - Opened by JohannesGaessler 15 days ago
Labels: Review Complexity : Low
#10031 - Bug: issue in CUDA flash attention
Issue -
State: closed - Opened by agray3 15 days ago
- 7 comments
Labels: bug-unconfirmed, medium severity
#10030 - server : check that the prompt fits in the slot's context
Pull Request -
State: closed - Opened by ggerganov 15 days ago
Labels: examples, python, server
#10029 - ggml : Implementations for Q4_0_8_8 quantization based functions - RISC-V vector version
Pull Request -
State: open - Opened by xctan 15 days ago
- 1 comment
Labels: ggml
#10028 - Feature Request: Support for DeciLMForCausalLM
Issue -
State: open - Opened by ymcki 15 days ago
- 2 comments
Labels: enhancement
#10027 - Feature Request: Support tools and tool_choice parameter in OpenAI compatible service
Issue -
State: open - Opened by ChanceFlow 16 days ago
- 1 comment
Labels: enhancement
#10026 - llama : refactor model loader with backend registry
Pull Request -
State: open - Opened by slaren 16 days ago
- 17 comments
Labels: script, Nvidia GPU, Vulkan, examples, python, devops, ggml, SYCL, Kompute
#10023 - server : refactor slot input data, move tokenizer to HTTP thread
Pull Request -
State: closed - Opened by ngxson 16 days ago
- 5 comments
Labels: examples, python, server
#10022 - llama: string_split fix
Pull Request -
State: closed - Opened by Xarbirus 16 days ago
- 3 comments
Labels: examples, server
#10021 - CUDA: fix MMQ for non-contiguous src0, add tests
Pull Request -
State: closed - Opened by JohannesGaessler 16 days ago
Labels: testing, Nvidia GPU, Review Complexity : Medium, ggml
#10020 - Feature Request: Support OmniGen (based on phi-3 mini)
Issue -
State: open - Opened by Manni1000 16 days ago
Labels: enhancement
#10019 - Server - Sampling bug fix
Pull Request -
State: closed - Opened by wwoodsTM 16 days ago
Labels: examples, server
#10018 - server : don't overfill the batch during infill
Pull Request -
State: closed - Opened by ggerganov 16 days ago
Labels: examples, server
#10016 - sync : ggml
Pull Request -
State: closed - Opened by ggerganov 16 days ago
Labels: testing, Nvidia GPU, ggml
#10015 - llama : switch KQ multiplication to use F32 precision by default
Pull Request -
State: closed - Opened by ggerganov 16 days ago
#10013 - llama : Add IBM granite template
Pull Request -
State: closed - Opened by arch-btw 16 days ago
- 4 comments
Labels: testing
#10011 - Bug: K cache without FA goes NaN on Llama 3.1.
Issue -
State: closed - Opened by Nexesenex 16 days ago
- 22 comments
Labels: bug-unconfirmed, high severity
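For background, the K cache type can be set independently of flash attention with the --cache-type-k (-ctk) flag; a sketch of the kind of configuration under discussion, with a placeholder model path:

    # Quantized K cache with flash attention left at its default (off in this scenario).
    ./llama-cli -m ./models/model.gguf -ctk q8_0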
#10010 - Extend sgemm.cpp support for Q5_0 models
Pull Request -
State: closed - Opened by Srihari-mcw 16 days ago
- 2 comments
#10009 - Bug: Why does llama-cli choose a GPU with lower performance?
Issue -
State: open - Opened by badog-sing 17 days ago
- 5 comments
Labels: bug-unconfirmed, Apple Metal, medium severity
#10005 - llama : enable FA by default and disable it per-layer
Issue -
State: open - Opened by ggerganov 17 days ago
- 18 comments
Labels: enhancement
#10004 - llama : rename batch.logits to batch.output
Pull Request -
State: open - Opened by danbev 17 days ago
- 1 comment
Labels: breaking change, android, examples, server
#10002 - Bug: No text response when "--log-disable" is set
Issue -
State: open - Opened by jenskastensson 17 days ago
- 6 comments
Labels: bug-unconfirmed, high severity
#9995 - llama.vim : add classic vim support
Pull Request -
State: closed - Opened by m18coppola 17 days ago
- 4 comments
Labels: examples
#9991 - Bug: Unexpected output from Granite 3.0 MoE 1b when all layers on NVIDIA GPU
Issue -
State: closed - Opened by gabe-l-hart 18 days ago
- 12 comments
Labels: bug-unconfirmed, medium severity
#9988 - Bug: Memory Leak in llama-server after exit
Issue -
State: open - Opened by edwin0cheng 18 days ago
- 15 comments
Labels: bug-unconfirmed, medium severity
#9978 - Bug: llama-server crash with `--embeddings`
Issue -
State: closed - Opened by mokeyish 18 days ago
- 13 comments
Labels: bug, critical severity
#9964 - llama.cpp Windows/ROCm builds are broken? Using shared GPU memory instead of dedicated.
Issue -
State: open - Opened by SteelPh0enix 19 days ago
- 3 comments
#9961 - Feature Request: Convert .devops container images to be RHEL-based UBI images rather than Ubuntu based
Issue -
State: open - Opened by ericcurtin 19 days ago
- 1 comment
Labels: enhancement
#9953 - Implementations for Q4_0_8_8 quantization based functions - RISC-V vector version
Pull Request -
State: closed - Opened by xctan 20 days ago
Labels: ggml
#9944 - Bug: Cannot build with C++ > 20
Issue -
State: open - Opened by bdashore3 21 days ago
- 2 comments
Labels: bug-unconfirmed, high severity
#9943 - ggml:metal Add POOL2D op and fix IM2COL in Metal backend for running MobileVLM_V2.
Pull Request -
State: closed - Opened by junhee-yoo 21 days ago
- 1 comment
Labels: testing
#9942 - Add llama_cpp_canister to the README
Pull Request -
State: open - Opened by icppWorld 21 days ago
#9939 - [SYCL] fix mul_mat_vec_q error
Pull Request -
State: open - Opened by NeoZhangJianyu 21 days ago
- 1 comment
Labels: SYCL
#9938 - server: handle n_predict==2 error
Pull Request -
State: open - Opened by kylo5aby 21 days ago
Labels: examples, server
#9937 - Bug: Can't build LLAMA_CURL=ON to embed curl on windows x64 build.
Issue -
State: open - Opened by AnthonyEmertec 21 days ago
Labels: bug-unconfirmed, high severity
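As background, curl support is enabled with the LLAMA_CURL CMake option; on Windows, CMake must additionally be able to locate the curl headers and import library. A hypothetical configure line with placeholder paths:

    cmake -B build -DLLAMA_CURL=ON -DCURL_INCLUDE_DIR=C:/curl/include -DCURL_LIBRARY=C:/curl/lib/libcurl.lib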
#9935 - loader: use a map to find tensor by name from tensor weight
Pull Request -
State: open - Opened by kylo5aby 21 days ago
- 1 comment
#9934 - Bug: Got meaningless output when setting -j {}.
Issue -
State: open - Opened by morgen52 21 days ago
- 8 comments
Labels: bug-unconfirmed, high severity
#9933 - Bug: Unexpected output length (only one token response!) when setting "-n -2 -c 256" for llama-server
Issue -
State: open - Opened by morgen52 21 days ago
- 1 comment
Labels: bug, good first issue, low severity
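For context, -n -2 is documented to mean "generate until the context is filled", so with -c 256 a long response is expected; the report is that only a single token comes back. A minimal reproduction sketch with a placeholder model path:

    # -n -2: keep generating until the -c 256 context window is full.
    ./llama-server -m ./models/model.gguf -c 256 -n -2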
#9932 - Bug: Error when offloading falcon mamba layers on GPU
Issue -
State: open - Opened by vineel96 21 days ago
- 4 comments
Labels: bug-unconfirmed, low severity
#9931 - LLamaCausalLM add support for tokenizer.json
Pull Request -
State: open - Opened by robbiemu 21 days ago
Labels: python
#9930 - ggml : fix possible buffer use after free in sched reserve
Pull Request -
State: open - Opened by slaren 22 days ago
Labels: ggml
#9929 - server : add n_indent parameter for line indentation requirement
Pull Request -
State: closed - Opened by ggerganov 22 days ago
Labels: examples, server
#9928 - Bug: Occasional crashes when a connection has been interrupted before completion of computation
Issue -
State: open - Opened by sliedes 22 days ago
- 5 comments
Labels: bug-unconfirmed, high severity
#9927 - Bug: WARNING: The BPE pre-tokenizer was not recognized!
Issue -
State: open - Opened by smileyboy2019 22 days ago
Labels: bug-unconfirmed, medium severity
#9925 - Bug: invalid argument: --memory-f32
Issue -
State: open - Opened by iperov 22 days ago
- 5 comments
Labels: bug-unconfirmed, medium severity
#9924 - llama : infill sampling handle very long tokens
Pull Request -
State: closed - Opened by ggerganov 22 days ago
#9922 - sample: maintain token count in penalty sampler context
Pull Request -
State: open - Opened by kylo5aby 22 days ago
- 1 comment
#9921 - backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels
Pull Request -
State: open - Opened by chaxu01 22 days ago
- 3 comments
Labels: examples, ggml
#9918 - Add SwiftLlama to the Bindings list
Pull Request -
State: closed - Opened by ShenghaiWang 23 days ago
#9917 - fix: allocating CPU buffer with size `0`
Pull Request -
State: closed - Opened by giladgd 23 days ago
Labels: ggml
#9916 - consolidated.safetensors
Pull Request -
State: open - Opened by CrispStrobe 23 days ago
Labels: python
#9914 - Feature Request: Support for Ministral-8B-Instruct-2410
Issue -
State: open - Opened by arch-btw 23 days ago
- 11 comments
Labels: enhancement
#9913 - Bug: Failing to build using cmake on tag b3912
Issue -
State: open - Opened by Martin-HZK 23 days ago
- 3 comments
Labels: bug-unconfirmed, medium severity