Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / PygmalionAI/aphrodite-engine issues and pull requests
#1081 - chore: check for torch 2.4.0 when registering custom op
Pull Request -
State: closed - Opened by AlpinDale about 1 month ago
#1080 - core: rename `PromptInputs,inputs` -> `PromptType,prompt`
Pull Request -
State: closed - Opened by AlpinDale about 1 month ago
#1079 - vlm: fix feature size calculation for llava-next models
Pull Request -
State: closed - Opened by AlpinDale about 1 month ago
#1078 - tests: refactor model tests
Pull Request -
State: closed - Opened by AlpinDale about 1 month ago
#1077 - [Feature]: Automatic max-model-len or max-num-seqs
Issue -
State: open - Opened by markouustalu about 1 month ago
- 2 comments
#1076 - fix: unexpected kwarg for the legacy API server
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1075 - fix: validate `n` in the sampling params
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1073 - build: dockerfile for aarch64
Pull Request -
State: open - Opened by AlpinDale about 2 months ago
#1072 - api: support LoRA lineage and base model metadata management
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1071 - rocm: enable multi-step scheduling for rocm
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1070 - fix: Phi3.5 Mini and MoE LoRA inference
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1069 - vlm: add support for molmo vision model
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1068 - build: guard against changes in cuda library name
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1067 - sampler: simplify logits resort in _apply_top_k_top_p
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1066 - rocm: add support for FP8 KV cache in the custom paged attention kernels
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1065 - api: enable MQAphroditeEngine for embedding models
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1064 - fix: encoder-decoder models for beam search
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1063 - api: non-zero exit code if MQ engine startup fails
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1062 - rocm: add more quants, fix _scaled_mm call
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1061 - distributed: bind only to 127.0.0.1 for local-only usage
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1060 - core: support prompt logprobs in multi-step
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1059 - fix: add missing logit index increment in sampling metadata prep
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1058 - tokenizer: allow skip_special_tokens=False for mistral tokenizer
Pull Request -
State: open - Opened by AlpinDale about 2 months ago
#1057 - build: fix compilation for causal_conv1d_fwd kernel signature
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1056 - feat: introduce MQAphroditeEngine
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1055 - mamba: enable continuous batching for mamba kernels
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1054 - fix: granite logit scale in logit computation
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1053 - api: add mistral function calling format to all models loaded with "mistral" format
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1052 - quant: add tensor parallel support for bitsandbytes
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1051 - core: add cuda graph support for encoder-decoder models
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1050 - torch.compile: register all-reduce operations as custom ops
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1049 - chore: remove dead code from triton sampling kernels
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1048 - kernel: asymmetric AQ AZP quantization kernels
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1047 - fix: clean shutdown issues
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1046 - tpu: implement multi-step scheduling
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1045 - torch.compile: fix functionalization
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1044 - model: add support for MiniCPM-3
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1043 - rocm: add custom paged attention kernels for ROCm
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1042 - xpu: bump IPEX to 2.3, support GQA
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1041 - torch.compile: allow adding custom compile backends via plugins
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1040 - fix: skip loading extra bias for Qwen2-VL GPTQ
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1039 - core: factor out input preprocessing into a separate class
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1038 - fix: grouped_topk return type
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1037 - fix: disable chunked prefill and prefix caching for multimodal models
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1036 - fix: multi-step + flashinfer with cuda graphs
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1035 - api: add sampling/engine option to return only deltas or final output
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1034 - model: add support for DeepSeek-V3 model
Pull Request -
State: open - Opened by AlpinDale about 2 months ago
#1033 - multi-step: add support for flashinfer attention backend
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1032 - fix: lazy init _copy_stream
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1031 - vlm: support multiple images for qwen-vl
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1030 - vlm: fix internvl2 inference with various num_patches
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1029 - torch.compile: hide slicing under custom op for inductor
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1028 - chore: use RoPE cache for MRoPE method
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1027 - cpu: raise error if using encoder-decoder models
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1026 - quants: add bitsandbytes support for gemma2 model
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1025 - api: fix logic for deciding if tool parser is used
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1024 - chore: remove engine_use_ray
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1023 - core: dump model runner inputs during crash
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1022 - vlm: add support for Pixtral model
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1021 - tests: refactor speculative decoding tests to remove the async engine
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1020 - chore: move `device` keys to a constant
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1019 - kernel: add meta functions for ops to prevent graph breaks
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1018 - [Usage]: Any tips on troubleshooting Quant-LLM
Issue -
State: open - Opened by GHBigD about 2 months ago
- 1 comment
#1017 - cpu: add support for W8A8 quantization via compressed-tensor
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1016 - cpu: fix issue with sampling kernels
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1015 - vlm: add support for Qwen2-VL model
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1014 - vlm: add support for video modality + llava next video model
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1013 - quants: add support for NVIDIA's ModelOpt checkpoints
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1012 - fix: internvl pipeline parallel
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1011 - chore: skip loading extra bias for qwen2 moe GPTQ
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1010 - build: shallow clone cutlass 3.5.1 tag
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1009 - fix: pass `APHRODITE_ATTENTION_BACKEND` to ray workers
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1008 - fix: ensure multistep lookahead allocation is compatible with cugraph max capture
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1007 - chore: keep chunked prefill enabled with prefix caching
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1006 - chore: remove peft as a requirement
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1005 - spec decode: move ops.advance_step to flash attention backend
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1004 - fix: LoRA support for Cohere and Jamba models
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1003 - tools: fix tool calls to more strictly follow OpenAI format
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1002 - vlm: add multi-input support for LLaVA and InternVL models
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#1001 - [Bug]: Docker latest [FATAL tini (19)] exec /app/aphrodite-engine/docker/entrypoint.sh failed: No such file or directory
Issue -
State: open - Opened by GHBigD about 2 months ago
- 2 comments
Labels: bug
#1000 - core: fix async postprocessor in case of preemption
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#999 - fix: hermes tool call chat template
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#998 - quants: improve awq_triton throughput
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#997 - chore: use `ray[adag]` dep instead of cuda
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#996 - fix: gptq_marlin exception on older GPUs
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#995 - models: add support for QwenVL
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#994 - neuron: add 8bit quantization for Neuron
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#993 - api: implement OpenAI-compatible tools API for Hermes/Mistral models
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#992 - vlm: enable multimodal inputs for the LLM class
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#991 - vlm: fix siglip layernorm and paligemma weight loading
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#990 - vlm: support multiple audios per prompt for Ultravox
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#989 - tpu: use XLA rank for persistent cache path
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#988 - benchmarks: add `--async-engine` arg to throughput benchmark
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#987 - quants: add GPTQ and FBGEMM to AphroditeParameters
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#986 - tpu: fix outputs by correcting the next_token_ids shape
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#985 - chore: rename `task_handler` to `worker`
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#984 - fix: raise exception when accessing logger for disable_log_stats=True case
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago
#983 - core: improve async postproc + multi-step performance
Pull Request -
State: closed - Opened by AlpinDale about 2 months ago