Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / ericlbuehler/candle-vllm issues and pull requests
#88 - Custom benchmark with parameters
Pull Request -
State: closed - Opened by guoqingbao about 1 month ago
#87 - Fix Gemma-2 multiple eos/bos ids
Pull Request -
State: closed - Opened by guoqingbao about 1 month ago
#86 - Support softcapping (Gemma-2 models)
Pull Request -
State: closed - Opened by guoqingbao about 1 month ago
#85 - Restore previous bug fix
Pull Request -
State: closed - Opened by guoqingbao about 1 month ago
#84 - Add support for the Gemma 2 model
Pull Request -
State: closed - Opened by EricLBuehler about 1 month ago
- 5 comments
#83 - Apply clippy
Pull Request -
State: closed - Opened by EricLBuehler about 1 month ago
#82 - No crash when both hidden_act and hidden_activation are set for gemma models
Pull Request -
State: closed - Opened by guoqingbao about 2 months ago
#81 - Ask users to provide a huggingface token if no token is cached or passed to the program.
Pull Request -
State: closed - Opened by guoqingbao about 2 months ago
#80 - Fix bug for non-stream response
Pull Request -
State: closed - Opened by guoqingbao about 2 months ago
#79 - Add model support for gemma 9b
Issue -
State: open - Opened by sigridjineth about 2 months ago
- 16 comments
Labels: enhancement
#78 - Optimize quantized matmul in batch processing & update Q4K results
Pull Request -
State: closed - Opened by guoqingbao about 2 months ago
#77 - Support in-situ quantization
Pull Request -
State: closed - Opened by guoqingbao about 2 months ago
#76 - Running without huggingface token cache raises an error
Issue -
State: closed - Opened by sigridjineth about 2 months ago
- 2 comments
#75 - When using non-stream mode, the client is blocking.
Issue -
State: closed - Opened by wzzju about 2 months ago
- 2 comments
#74 - Parallel token sampling process & reset decoder after each generation
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#73 - Tweak sampling parameters & update batched generation results
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#72 - Fix bug for space token decoding & remove redundant code
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#71 - Fix bug for token decoding & remove token padding
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#70 - Applying the optimization options
Pull Request -
State: closed - Opened by kozistr 2 months ago
- 2 comments
#69 - Support streaming batched chat completion requests
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#68 - Update demo video
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#67 - LLaMa3.1 chat completion
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#66 - More elegant way for handling non-streaming finish signal.
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#65 - Fix bug for non-streaming generation.
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#64 - Fix typo & update ReadMe
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#63 - Switch streaming service to axum & standalone generation thread
Pull Request -
State: closed - Opened by guoqingbao 2 months ago
#62 - Using candle-vllm as crate in rust?
Issue -
State: open - Opened by gkvoelkl 3 months ago
- 1 comment
#61 - Server-side generation breaks down when the client closes the connection or stops the chat.
Issue -
State: closed - Opened by guoqingbao 3 months ago
- 2 comments
Labels: enhancement
#60 - Trim HF token
Pull Request -
State: closed - Opened by EricLBuehler 3 months ago
- 1 comment
#59 - model download failing from HF
Issue -
State: closed - Opened by Ranganaths 3 months ago
- 1 comment
#58 - Fix build
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#57 - Support Yi & StableLM models, change default maximum length of generated tokens for smooth chat.
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#56 - Fix corner case when block table too small
Pull Request -
State: closed - Opened by EricLBuehler 3 months ago
#55 - Fix mistral output repetition with F32 rope and penalty & temperature parameters
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#54 - Fix mistral model & more optional model-specific parameters.
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#53 - Support Phi2 and Mistral models, fix generation remainder, more sampling parameters, etc.
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#52 - Fix bug for previous removal of repeat_kv (when key_value_heads > 1 and < attention_heads)
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#51 - Qwen 2 model broken
Issue -
State: closed - Opened by EricLBuehler 3 months ago
- 3 comments
#50 - Higher precision for rope in Gemma model.
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#49 - Support Gemma model & remove repeat_kv (replaced with broadcast matmu…
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#48 - Error prompt for requested message exceeds model capacity
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#47 - LongRope support for Phi 3
Issue -
State: closed - Opened by EricLBuehler 3 months ago
- 2 comments
#46 - Support qwen2 model, optimize phi3 model, revise model loading strategy
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
- 2 comments
#45 - Unified pipeline for models & support phi3 model
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
#44 - Support chat serving for more models
Issue -
State: open - Opened by guoqingbao 3 months ago
- 7 comments
Labels: enhancement
#43 - Support stream response
Issue -
State: closed - Opened by guoqingbao 3 months ago
- 7 comments
Labels: enhancement
#42 - Support stream chat completion & optimization for decoding stage
Pull Request -
State: closed - Opened by guoqingbao 3 months ago
- 10 comments
#41 - Configurable kvcache & fix repeat chat history
Pull Request -
State: closed - Opened by guoqingbao 4 months ago
- 3 comments
#40 - Optional logprobs & fix llama eos/stop token
Pull Request -
State: closed - Opened by guoqingbao 4 months ago
- 4 comments
#39 - Dusting the project off
Pull Request -
State: closed - Opened by EricLBuehler 4 months ago
#38 - Correct generation with paged attention (fix kernel launch, kvcache, llama pipeline, etc.)
Pull Request -
State: closed - Opened by guoqingbao 4 months ago
- 1 comment
#37 - Fix pipeline generation (kernel launch, kernel compilation, rwlock, paged attention, etc.)
Pull Request -
State: closed - Opened by guoqingbao 4 months ago
- 2 comments
#36 - Doesn't compile
Issue -
State: closed - Opened by ivanbaldo 6 months ago
- 2 comments
#35 - candle-vllm build issue
Issue -
State: closed - Opened by tupleleap 7 months ago
- 1 comment
#34 - Support using arbitrary derivative models
Issue -
State: closed - Opened by ivanbaldo 8 months ago
- 5 comments
#33 - Support Mixtral-8x7B-v0.1
Issue -
State: closed - Opened by ivanbaldo 8 months ago
- 2 comments
Labels: enhancement, tracking
#32 - --repeat-last-n option not mentioned in the usage help
Issue -
State: closed - Opened by ivanbaldo 8 months ago
- 8 comments
Labels: triaged
#31 - Support running without the --hf-token parameter and using ~/.cache/huggingface/token instead
Issue -
State: closed - Opened by ivanbaldo 8 months ago
- 45 comments
Labels: enhancement, triaged
#30 - Fix model IDs
Pull Request -
State: closed - Opened by pcuenca 8 months ago
- 1 comment
#29 - Wrong URL for downloading models
Issue -
State: closed - Opened by ivanbaldo 8 months ago
- 5 comments
#28 - `paged_attention_v1` function
Issue -
State: closed - Opened by EricLBuehler 8 months ago
#27 - `rotary_embedding` function
Issue -
State: closed - Opened by EricLBuehler 8 months ago
#26 - [Request] Constrained Generation
Issue -
State: closed - Opened by scottwey 8 months ago
- 4 comments
Labels: enhancement
#25 - candle-flash-attn linking error with Red Hat based distributions
Issue -
State: closed - Opened by ivanbaldo 8 months ago
- 46 comments
Labels: bug, triaged
#24 - Use rotary embedding CUDA kernel
Pull Request -
State: closed - Opened by EricLBuehler 9 months ago
Labels: enhancement, tracking
#23 - `reshape_and_cache` function
Issue -
State: closed - Opened by EricLBuehler 10 months ago
- 1 comment
#22 - Pass tensor pointers
Issue -
State: closed - Opened by EricLBuehler 10 months ago
- 1 comment
#21 - `swap_blocks` function
Issue -
State: closed - Opened by EricLBuehler 10 months ago
#20 - `copy_blocks` function
Issue -
State: closed - Opened by EricLBuehler 10 months ago
Labels: tracking
#19 - Switch to a Rust-based `cudarc` backend
Pull Request -
State: closed - Opened by EricLBuehler 10 months ago
- 1 comment
Labels: enhancement, tracking
#18 - Barriers to further development
Issue -
State: closed - Opened by EricLBuehler 10 months ago
Labels: urgent, tracking
#17 - Add devcontainer
Pull Request -
State: closed - Opened by sigma-andex 10 months ago
- 3 comments
#16 - Integrate cxx
Pull Request -
State: closed - Opened by sigma-andex 10 months ago
- 1 comment
Labels: enhancement, tracking
#15 - Add working scheduler
Pull Request -
State: closed - Opened by EricLBuehler 10 months ago
Labels: enhancement
#14 - KV Cache and Scheduler tracking issue
Issue -
State: closed - Opened by EricLBuehler 10 months ago
- 1 comment
Labels: tracking
#13 - Add PagedAttention
Pull Request -
State: closed - Opened by EricLBuehler 10 months ago
Labels: enhancement, tracking
#12 - mistral error
Issue -
State: closed - Opened by lambdaofgod 10 months ago
- 4 comments
Labels: bug, triaged, urgent
#11 - PagedAttention tracking issue
Issue -
State: closed - Opened by EricLBuehler 10 months ago
- 4 comments
Labels: tracking
#10 - Pipeline batching tracking issue
Issue -
State: closed - Opened by EricLBuehler 10 months ago
- 3 comments
Labels: tracking
#9 - PagedAttention tracking issue
Issue -
State: closed - Opened by EricLBuehler 10 months ago
Labels: tracking
#8 - Mistral does not load safetensors
Issue -
State: closed - Opened by EricLBuehler 10 months ago
- 1 comment
Labels: bug, triaged
#7 - OpenAI API version
Issue -
State: closed - Opened by lambdaofgod 10 months ago
- 4 comments
Labels: triaged
#6 - KV Cache causes breakage
Issue -
State: closed - Opened by EricLBuehler 10 months ago
- 2 comments
Labels: bug
#5 - added flan-t5 example into test
Pull Request -
State: closed - Opened by bm777 11 months ago
- 3 comments
#4 - Can the architectural design be improved?
Issue -
State: closed - Opened by mokeyish 11 months ago
- 9 comments
Labels: enhancement
#3 - Batching and VLLM-style kv caching missing
Issue -
State: closed - Opened by michaelfeil 11 months ago
- 7 comments
Labels: enhancement
#2 - Support streaming of tokens
Issue -
State: closed - Opened by michaelfeil 11 months ago
- 1 comment
Labels: enhancement
#1 - Readme request
Issue -
State: closed - Opened by bm777 11 months ago
- 4 comments