PygmalionAI/aphrodite-engine issues and pull requests

#1081 - chore: check for torch 2.4.0 when registering custom op

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1080 - core: rename `PromptInputs,inputs` -> `PromptType,prompt`

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1079 - vlm: fix feature size calculation for llava-next models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1078 - tests: refactor model tests

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1077 - [Feature]: Automatic max-model-len or max-num-seqs

Issue - State: open - Opened by markouustalu 7 months ago - 2 comments

#1076 - fix: unexpected kwarg for the legacy API server

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1075 - fix: validate `n` in the sampling params

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1073 - build: dockerfile for aarch64

Pull Request - State: open - Opened by AlpinDale 7 months ago

#1072 - api: support LoRA lineage and base model metadata management

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1071 - rocm: enable multi-step scheduling for rocm

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1070 - fix: Phi3.5 Mini and MoE LoRA inference

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1069 - vlm: add support for molmo vision model

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1068 - build: guard against changes in cuda library name

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1067 - sampler: simplify logits resort in _apply_top_k_top_p

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1066 - rocm: add support for FP8 KV cache in the custom paged attention kernels

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1065 - api: enable MQAphroditeEngine for embedding models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1064 - fix: encoder-decoder models for beam search

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1063 - api: non-zero exit code if MQ engine startup fails

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1062 - rocm: add more quants, fix _scaled_mm call

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1061 - distributed: bind only to 127.0.0.1 for local-only usage

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1060 - core: support prompt logprobs in multi-step

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1059 - fix: add missing logit index increment in sampling metadata prep

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1058 - tokenizer: allow skip_special_tokens=False for mistral tokenizer

Pull Request - State: open - Opened by AlpinDale 7 months ago

#1057 - build: fix compilation for causal_conv1d_fwd kernel signature

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1056 - feat: introduce MQAphroditeEngine

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1055 - mamba: enable continuous batching for mamba kernels

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1054 - fix: granite logit scale in logit computation

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1053 - api: add mistral function calling format to all models loaded with "mistral" format

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1052 - quant: add tensor parallel support for bitsandbytes

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1051 - core: add cuda graph support for encoder-decoder models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1050 - torch.compile: register all-reduce operations as custom ops

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1049 - chore: remove dead code from triton sampling kernels

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1048 - kernel: asymmetric AQ AZP quantization kernels

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1047 - fix: clean shutdown issues

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1046 - tpu: implement multi-step scheduling

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1045 - torch.compile: fix functionalization

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1044 - model: add support for MiniCPM-3

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1043 - rocm: add custom paged attention kernels for ROCm

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1042 - xpu: bump IPEX to 2.3, support GQA

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1041 - torch.compile: allow adding custom compile backends via plugins

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1040 - fix: skip loading extra bias for Qwen2-VL GPTQ

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1039 - core: factor out input preprocessing into a separate class

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1038 - fix: grouped_topk return type

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1037 - fix: disable chunked prefill and prefix caching for multimodal models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1036 - fix: multi-step + flashinfer with cuda graphs

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1035 - api: add sampling/engine option to return only deltas or final output

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1034 - model: add support for DeepSeek-V3 model

Pull Request - State: open - Opened by AlpinDale 7 months ago

#1033 - multi-step: add support for flashinfer attention backend

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1032 - fix: lazy init _copy_stream

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1031 - vlm: support multiple images for qwen-vl

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1030 - vlm: fix internvl2 inference with various num_patches

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1029 - torch.compile: hide slicing under custom op for inductor

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1028 - chore: use RoPE cache for MRoPE method

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1027 - cpu: raise error if using encoder-decoder models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1026 - quants: add bitsandbytes support for gemma2 model

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1025 - api: fix logic for deciding if tool parser is used

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1024 - chore: remove engine_use_ray

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1023 - core: dump model runner inputs during crash

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1022 - vlm: add support for Pixtral model

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1021 - tests: refactor speculative decoding tests to remove the async engine

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1020 - chore: move `device` keys to a constant

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1019 - kernel: add meta functions for ops to prevent graph breaks

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1018 - [Usage]: Any tips on troubleshooting Quant-LLM

Issue - State: open - Opened by GHBigD 7 months ago - 1 comment

#1017 - cpu: add support for W8A8 quantization via compressed-tensor

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1016 - cpu: fix issue with sampling kernels

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1015 - vlm: add support for Qwen2-VL model

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1008 - fix: ensure multistep lookahead allocation is compatible with cugraph max capture

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1007 - chore: keep chunked prefill enabled with prefix caching

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1006 - chore: remove peft as a requirement

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1005 - spec decode: move ops.advance_step to flash attention backend

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1004 - fix: LoRA support for Cohere and Jamba models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1003 - tools: fix tool calls to more strictly follow OpenAI format

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1002 - vlm: add multi-input support for LLaVA and InternVL models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#1001 - [Bug]: Docker latest [FATAL tini (19)] exec /app/aphrodite-engine/docker/entrypoint.sh failed: No such file or directory

Issue - State: open - Opened by GHBigD 7 months ago - 2 comments
Labels: bug

#1000 - core: fix async postprocessor in case of preemption

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#999 - fix: hermes tool call chat template

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#998 - quants: improve awq_triton throughput

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#997 - chore: use `ray[adag]` dep instead of cuda

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#996 - fix: gptq_marlin exception on older GPUs

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#995 - models: add support for QwenVL

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#994 - neuron: add 8bit quantization for Neuron

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#993 - api: implement OpenAI-compatible tools API for Hermes/Mistral models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#992 - vlm: enable multimodal inputs for the LLM class

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#991 - vlm: fix siglip layernorm and paligemma weight loading

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#990 - vlm: support multiple audios per prompt for Ultravox

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#989 - tpu: use XLA rank for persistent cache path

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#988 - benchmarks: add `--async-engine` arg to throughput benchmark

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#987 - quants: add GPTQ and FBGEMM to AphroditeParameters

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#986 - tpu: fix outputs by correcting the next_token_ids shape

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#985 - chore: rename `task_handler` to `worker`

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#984 - fix: raise exception when accessing logger for disable_log_stats=True case

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#983 - core: improve async postproc + multi-step performance

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#982 - vlm: fallback to SDPA for ViT models on CPU backend

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#981 - core: slightly improve chunked prefill performance

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#980 - fix: InternLM2 model with Tensor Parallel

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#979 - tpu: align worker index with node boundary

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#978 - models: add support for IBM Granite (PowerLM) models

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#977 - fix: crash when cancelling a request with multi-step

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#976 - fix: modelscope for VLMs

Pull Request - State: closed - Opened by AlpinDale 7 months ago

#975 - tpu: fix TPU type api

Pull Request - State: closed - Opened by AlpinDale 7 months ago
