vectorch-ai/ScaleLLM issues and pull requests

#354 - fix cmake version issue for manylinux image

Pull Request - State: closed - Opened by guocuimi 10 days ago

#353 - added cuda 12.6 build image

Pull Request - State: closed - Opened by guocuimi 10 days ago

#352 - [WIP] Llava support

Pull Request - State: open - Opened by guocuimi 10 days ago

#351 - upgrade pytorch to 2.5.1

Pull Request - State: closed - Opened by guocuimi 10 days ago

#350 - misc: remove legacy logic to support quantization for other types.

Pull Request - State: closed - Opened by guocuimi 23 days ago

#349 - question about awq

Issue - State: closed - Opened by sitabulaixizawaluduo 26 days ago - 3 comments

#348 - will `callback` be protected by GIL

Issue - State: closed - Opened by tp-nan about 1 month ago - 1 comment

#347 - Upgrade pytorch to 2.5.0

Pull Request - State: closed - Opened by guocuimi about 1 month ago

#346 - ci: build cuda 12.4 for scalellm cpp images

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#345 - ci: run package test in docker

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#344 - ci: use venv instead of conda in package test

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#343 - Revert "port cuda changes"

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#342 - ci: update python version for package test

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#341 - upgrade pytorch to 2.4.1

Pull Request - State: closed - Opened by guocuimi about 2 months ago

#340 - ut: add more tests for different warp layout

Pull Request - State: closed - Opened by guocuimi 2 months ago

#339 - misc: attention kernel refactoring

Pull Request - State: closed - Opened by guocuimi 2 months ago

#338 - [misc] read flashinfer kernel code and add comments

Pull Request - State: closed - Opened by guocuimi 2 months ago

#337 - ci: added pip cache to avoid redownloading

Pull Request - State: closed - Opened by guocuimi 3 months ago

#336 - ut: added fp8 kv unittests for flash infer kernel

Pull Request - State: closed - Opened by guocuimi 3 months ago

#335 - refactor: move paged kv related logic into paged_kv_t

Pull Request - State: closed - Opened by guocuimi 3 months ago

#334 - feat: added pass-in alibi slopes support for flash infer kernel

Pull Request - State: closed - Opened by guocuimi 3 months ago

#333 - refactor: replaced last_page_len with kv_indptr for flash infer kernel

Pull Request - State: closed - Opened by guocuimi 3 months ago

#332 - ut: added unittests for flash infer kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#331 - kernel: port flash infer handler + wrapper logics

Pull Request - State: closed - Opened by guocuimi 3 months ago

#330 - refactor: move flash attn and flash infer into attention folder

Pull Request - State: closed - Opened by guocuimi 3 months ago

#329 - kernel: added script to generate instantiation for flashinfer kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#328 - refactor: flatten block tables to 1d tensor

Pull Request - State: closed - Opened by guocuimi 3 months ago

#327 - kernel: added flash infer attention impl

Pull Request - State: closed - Opened by guocuimi 3 months ago

#326 - feat: fix and use marlin kernel for awq by default

Pull Request - State: closed - Opened by guocuimi 3 months ago

#325 - refactor: added static switch for marlin kernel dispatch

Pull Request - State: closed - Opened by guocuimi 3 months ago

#324 - fix: put item into asyncio.Queue in a thread-safe way

Pull Request - State: closed - Opened by guocuimi 3 months ago

#323 - Will the result callback be called in a thread-safe/coroutine-safe way? #322

Issue - State: closed - Opened by tp-nan 3 months ago - 7 comments

#321 - ci: allow build without requiring a physical gpu device

Pull Request - State: closed - Opened by guocuimi 3 months ago

#320 - cmake: make includes private and disable jinja2cpp build

Pull Request - State: closed - Opened by guocuimi 3 months ago

#319 - fix: clean up build warnings: "LOG" redefined

Pull Request - State: closed - Opened by guocuimi 3 months ago

#318 - refactor: clean up build warnings and refactor marlin kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#317 - test: added unittests for marlin kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#316 - build: speed up compilation for marlin kernels

Pull Request - State: closed - Opened by guocuimi 3 months ago

#315 - feat: added awq marlin qlinear

Pull Request - State: closed - Opened by guocuimi 3 months ago

#314 - kernel: port awq repack kernel

Pull Request - State: closed - Opened by guocuimi 3 months ago