Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / vectorch-ai/ScaleLLM issues and pull requests

#354 - fix cmake version issue for manylinux image

Pull Request - State: closed - Opened by guocuimi 10 days ago

#353 - added cuda 12.6 build image

Pull Request - State: closed - Opened by guocuimi 10 days ago

#352 - [WIP] Llava support

Pull Request - State: open - Opened by guocuimi 10 days ago

#351 - upgrade pytorch to 2.5.1

Pull Request - State: closed - Opened by guocuimi 10 days ago

#349 - question about awq

Issue - State: closed - Opened by sitabulaixizawaluduo 26 days ago - 3 comments

#348 - will `callback` be protected by GIL

Issue - State: closed - Opened by tp-nan about 1 month ago - 1 comment

#347 - Upgrade pytorch to 2.5.0

Pull Request - State: closed - Opened by guocuimi about 1 month ago

#346 - ci: build cuda 12.4 for scalellm cpp images

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#345 - ci: run package test in docker

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#344 - ci: use venv instead of conda in package test

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#343 - Revert "port cuda changes"

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#342 - ci: update python version for package test

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#341 - upgrade pytorch to 2.4.1

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#340 - ut: add more tests for different warp layout

Pull Request - State: closed - Opened by guocuimi 2 months ago

#339 - misc: attention kernel refactoring

Pull Request - State: closed - Opened by guocuimi 2 months ago

#338 - [misc] read flashinfer kernel code and add comments

Pull Request - State: closed - Opened by guocuimi 2 months ago

#337 - ci: added pip cache to avoid redownloading

Pull Request - State: closed - Opened by guocuimi 3 months ago

#336 - ut: added fp8 kv unittests for flash infer kernel

Pull Request - State: closed - Opened by guocuimi 3 months ago

#335 - refactor: move paged kv related logic into paged_kv_t

Pull Request - State: closed - Opened by guocuimi 3 months ago

#334 - feat: added pass-in alibi slopes support for flash infer kernel

Pull Request - State: closed - Opened by guocuimi 3 months ago

#332 - ut: added unittests for flash infer kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#331 - kernel: port flash infer handler + wrapper logics

Pull Request - State: closed - Opened by guocuimi 3 months ago

#330 - refactor: move flash attn and flash infer into attention folder

Pull Request - State: closed - Opened by guocuimi 3 months ago

#328 - refactor: flatten block tables to 1d tensor

Pull Request - State: closed - Opened by guocuimi 3 months ago

#327 - kernel: added flash infer attention impl

Pull Request - State: closed - Opened by guocuimi 3 months ago

#326 - feat: fix and use marlin kernel for awq by default

Pull Request - State: closed - Opened by guocuimi 3 months ago

#325 - refactor: added static switch for marlin kernel dispatch

Pull Request - State: closed - Opened by guocuimi 3 months ago

#324 - fix: put item into asyncio.Queue in a thread-safe way

Pull Request - State: closed - Opened by guocuimi 3 months ago

#323 - Will the result callback be called in a thread-safe/coroutine-safe way? #322

Issue - State: closed - Opened by tp-nan 3 months ago - 7 comments

#321 - ci: allow build without requiring a physical gpu device

Pull Request - State: closed - Opened by guocuimi 3 months ago

#320 - cmake: make includes private and disable jinja2cpp build

Pull Request - State: closed - Opened by guocuimi 3 months ago

#319 - fix: clean up build warnings: "LOG" redefined

Pull Request - State: closed - Opened by guocuimi 3 months ago

#318 - refactor: clean up build warnings and refactor marlin kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#317 - test: added unittests for marlin kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#316 - build: speed up compilation for marlin kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#315 - feat: added awq marlin qlinear

Pull Request - State: closed - Opened by guocuimi 3 months ago

#314 - kernel: port awq repack kernel

Pull Request - State: closed - Opened by guocuimi 3 months ago

#313 - feat: added fused column parallel linear

Pull Request - State: closed - Opened by guocuimi 3 months ago

#312 - feat: added gptq marlin qlinear layer

Pull Request - State: closed - Opened by guocuimi 4 months ago

#310 - RuntimeError: Timed out

Issue - State: open - Opened by spongxin 4 months ago - 1 comment

#309 - rust: upgrade rust libs to latest version

Pull Request - State: closed - Opened by guocuimi 4 months ago

#308 - Mistral large GPTQ model inference problem

Issue - State: closed - Opened by drdaliang 4 months ago - 3 comments
Labels: investigation needed

#307 - kernel: port gptq marlin kernel and fp8 marlin kernel

Pull Request - State: closed - Opened by guocuimi 4 months ago

#306 - refactor: move models to upper folder

Pull Request - State: closed - Opened by guocuimi 4 months ago

#305 - fix: move eos out of stop token list to honor ignore_eos option

Pull Request - State: closed - Opened by guocuimi 4 months ago

#303 - feat: added marlin qlinear support

Pull Request - State: open - Opened by guocuimi 4 months ago

#302 - test: added unittests for marlin fp16xint4 gemm

Pull Request - State: closed - Opened by guocuimi 4 months ago

#301 - kernel: support kernel test in python via pybind

Pull Request - State: closed - Opened by guocuimi 4 months ago

#300 - model: added gemma2 with softcap and sliding window support

Pull Request - State: closed - Opened by guocuimi 4 months ago

#299 - test: added unittests for attention sliding window

Pull Request - State: closed - Opened by guocuimi 4 months ago

#298 - kernel: port softcap support for flash attention

Pull Request - State: closed - Opened by guocuimi 4 months ago

#297 - ci: fix pytest version to avoid flakiness

Pull Request - State: closed - Opened by guocuimi 4 months ago

#296 - feat: added sliding window support for QWen2

Pull Request - State: closed - Opened by guocuimi 4 months ago

#295 - model: added qwen2 support

Pull Request - State: closed - Opened by guocuimi 4 months ago

#294 - triton: fix build error and add example with unittest

Pull Request - State: closed - Opened by guocuimi 4 months ago

#293 - fix: handle unfinished utf8 bytes for tiktoken tokenizer

Pull Request - State: closed - Opened by guocuimi 4 months ago

#292 - feat: added THUDM/glm-4* support

Pull Request - State: closed - Opened by guocuimi 4 months ago

#290 - ci: added clang-format-ignore file to exclude generated files

Pull Request - State: closed - Opened by guocuimi 4 months ago

#289 - kernel: added triton kernel build support

Pull Request - State: closed - Opened by guocuimi 4 months ago

#288 - debug: added environment collection script.

Pull Request - State: closed - Opened by guocuimi 4 months ago

#287 - kernel: added marlin dense and sparse kernels

Pull Request - State: closed - Opened by guocuimi 4 months ago

#286 - ci: disable pip cache to avoid hash mismatch error

Pull Request - State: closed - Opened by guocuimi 4 months ago

#285 - refactor: remove exllama kernels

Pull Request - State: closed - Opened by guocuimi 4 months ago

#284 - pypi: fix invalid classifier

Pull Request - State: closed - Opened by guocuimi 4 months ago

#275 - [Issue] Qwen-14B-Chat init fail and performance issue.

Issue - State: open - Opened by liutongxuan 5 months ago - 2 comments

#261 - bugfix: fix multiple definition issue.

Pull Request - State: open - Opened by liutongxuan 5 months ago

#246 - [wip] feat: add embeddings support

Pull Request - State: open - Opened by guocuimi 5 months ago

#139 - [kernel] added half2 specialization for layernorm kernel

Pull Request - State: open - Opened by dongxianzhe 7 months ago

#128 - [model] add support for mixtral moe model

Pull Request - State: open - Opened by 936187425 8 months ago

#124 - benchmark test script

Pull Request - State: open - Opened by ShijiaTang 8 months ago - 2 comments

#105 - [workflow] added clang-format workflow

Pull Request - State: closed - Opened by guocuimi 8 months ago

#104 - [workflow] added clang-format for pull_requests

Pull Request - State: closed - Opened by guocuimi 8 months ago

#103 - [model] added support for google Gemma-2b model

Pull Request - State: closed - Opened by 936187425 8 months ago - 1 comment

#102 - [refactor] moved top_k and top_p from sampler to logits process.

Pull Request - State: closed - Opened by guocuimi 8 months ago

#101 - [feat] added speculative engine class without implementation.

Pull Request - State: closed - Opened by guocuimi 8 months ago

#100 - [feat] added engine type to allow LLM and SSM share sequence.

Pull Request - State: closed - Opened by guocuimi 8 months ago

#99 - [refactor] move model output process logic into batch

Pull Request - State: closed - Opened by guocuimi 8 months ago

#98 - [feat] added dynamic split-fuse support in continuous scheduler

Pull Request - State: closed - Opened by guocuimi 8 months ago

#97 - added layernorm benchmark

Pull Request - State: closed - Opened by dongxianzhe 8 months ago

#95 - [fix] added small page size support for flash attention.

Pull Request - State: closed - Opened by guocuimi 8 months ago

#93 - [feat] add max tokens to process to support dynamic split-fuse

Pull Request - State: closed - Opened by guocuimi 8 months ago

#91 - [fix] use https instead of git to avoid permission issue.

Pull Request - State: closed - Opened by guocuimi 8 months ago

#90 - [refactor] move batch related logic into a class

Pull Request - State: closed - Opened by guocuimi 9 months ago

#89 - [feat] added LRU policy into prefix cache.

Pull Request - State: closed - Opened by guocuimi 9 months ago

#88 - [refactor] avoid name conflict with torch::indexing::Slice

Pull Request - State: closed - Opened by guocuimi 9 months ago

#87 - [feat] enable prefix cache in block manager

Pull Request - State: closed - Opened by guocuimi 9 months ago - 1 comment

#86 - [feat] added prefix cache to share kv cache across sequences.

Pull Request - State: closed - Opened by guocuimi 9 months ago

#84 - ScaleLLM Roadmap

Issue - State: open - Opened by guocuimi 9 months ago - 3 comments
Labels: roadmap

#83 - [baichuan2-7b] random core dump in offline batched inference.

Issue - State: closed - Opened by liutongxuan 9 months ago - 2 comments

#82 - [models] fix chatglm model issue.

Pull Request - State: closed - Opened by guocuimi 9 months ago