Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / vectorch-ai/ScaleLLM issues and pull requests
#354 - fix cmake version issue for manylinux image
Pull Request -
State: closed - Opened by guocuimi 10 days ago
#353 - added cuda 12.6 build image
Pull Request -
State: closed - Opened by guocuimi 10 days ago
#352 - [WIP] Llava support
Pull Request -
State: open - Opened by guocuimi 10 days ago
#351 - upgrade pytorch to 2.5.1
Pull Request -
State: closed - Opened by guocuimi 10 days ago
#350 - misc: remove legacy logic to support quantization for other types.
Pull Request -
State: closed - Opened by guocuimi 23 days ago
#349 - question about awq
Issue -
State: closed - Opened by sitabulaixizawaluduo 26 days ago
- 3 comments
#348 - Will `callback` be protected by the GIL?
Issue -
State: closed - Opened by tp-nan about 1 month ago
- 1 comment
#347 - Upgrade pytorch to 2.5.0
Pull Request -
State: closed - Opened by guocuimi about 1 month ago
#346 - ci: build cuda 12.4 for scalellm cpp images
Pull Request -
State: closed - Opened by guocuimi about 2 months ago
#345 - ci: run package test in docker
Pull Request -
State: closed - Opened by guocuimi about 2 months ago
#344 - ci: use venv instead of conda in package test
Pull Request -
State: closed - Opened by guocuimi about 2 months ago
#343 - Revert "port cuda changes"
Pull Request -
State: closed - Opened by guocuimi about 2 months ago
#342 - ci: update python version for package test
Pull Request -
State: closed - Opened by guocuimi about 2 months ago
#341 - upgrade pytorch to 2.4.1
Pull Request -
State: closed - Opened by guocuimi about 2 months ago
#340 - ut: add more tests for different warp layout
Pull Request -
State: closed - Opened by guocuimi 2 months ago
#339 - misc: attention kernel refactoring
Pull Request -
State: closed - Opened by guocuimi 2 months ago
#338 - [misc] read flashinfer kernel code and add comments
Pull Request -
State: closed - Opened by guocuimi 2 months ago
#337 - ci: added pip cache to avoid redownloading
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#336 - ut: added fp8 kv unittests for flash infer kernel
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#335 - refactor: move paged kv related logic into paged_kv_t
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#334 - feat: added pass-in alibi slopes support for flash infer kernel
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#333 - refactor: replaced last_page_len with kv_indptr for flash infer kernel
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#332 - ut: added unittests for flash infer kernels
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#331 - kernel: port flash infer handler + wrapper logics
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#330 - refactor: move flash attn and flash infer into attention folder
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#329 - kernel: added script to generate instantiation for flashinfer kernels
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#328 - refactor: flatten block tables to 1d tensor
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#327 - kernel: added flash infer attention impl
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#326 - feat: fix and use marlin kernel for awq by default
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#325 - refactor: added static switch for marlin kernel dispatch
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#324 - fix: put item into asyncio.Queue in a thread-safe way
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#323 - Will the result callback be called in a thread-safe/coroutine-safe way? #322
Issue -
State: closed - Opened by tp-nan 3 months ago
- 7 comments
#321 - ci: allow build without requiring a physical gpu device
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#320 - cmake: make includes private and disable jinja2cpp build
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#319 - fix: clean up build warnings: "LOG" redefined
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#318 - refactor: clean up build warnings and refactor marlin kernels
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#317 - test: added unittests for marlin kernels
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#316 - build: speed up compilation for marlin kernels
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#315 - feat: added awq marlin qlinear
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#314 - kernel: port awq repack kernel
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#313 - feat: added fused column parallel linear
Pull Request -
State: closed - Opened by guocuimi 3 months ago
#312 - feat: added gptq marlin qlinear layer
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#311 - refactor: remove the logic loading individual weight from shared partitions
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#310 - RuntimeError: Timed out
Issue -
State: open - Opened by spongxin 4 months ago
- 1 comment
#309 - rust: upgrade rust libs to latest version
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#308 - Mistral large GPTQ model inference problem
Issue -
State: closed - Opened by drdaliang 4 months ago
- 3 comments
Labels: investigation needed
#307 - kernel: port gptq marlin kernel and fp8 marlin kernel
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#306 - refactor: move models to upper folder
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#305 - fix: move eos out of stop token list to honor ignore_eos option
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#304 - The process terminated before reaching the specified max_tokens after setting ignore_eos=True and max_tokens.
Issue -
State: closed - Opened by HowardChenRV 4 months ago
- 3 comments
#303 - feat: added marlin qlinear support
Pull Request -
State: open - Opened by guocuimi 4 months ago
#302 - test: added unittests for marlin fp16xint4 gemm
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#301 - kernel: support kernel test in python via pybind
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#300 - model: added gemma2 with softcap and sliding window support
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#299 - test: added unittests for attention sliding window
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#298 - kernel: port softcap support for flash attention
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#297 - ci: fix pytest version to avoid flakiness
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#296 - feat: added sliding window support for QWen2
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#295 - model: added qwen2 support
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#294 - triton: fix build error and add example with unittest
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#293 - fix: handle unfinished utf8 bytes for tiktoken tokenizer
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#292 - feat: added THUDM/glm-4* support
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#291 - Deployment of glm-4-9b-chat model fails with SentencePiece tokenizer error
Issue -
State: closed - Opened by dengyingxu 4 months ago
- 4 comments
#290 - ci: added clang-format-ignore file to exclude generated files
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#289 - kernel: added triton kernel build support
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#288 - debug: added environment collection script.
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#287 - kernel: added marlin dense and sparse kernels
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#286 - ci: disable pip cache to avoid hash mismatch error
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#285 - refactor: remove exllama kernels
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#284 - pypi: fix invalid classifier
Pull Request -
State: closed - Opened by guocuimi 4 months ago
#275 - [Issue] Qwen-14B-Chat init fail and performance issue.
Issue -
State: open - Opened by liutongxuan 5 months ago
- 2 comments
#261 - bugfix: fix multiple definition issue.
Pull Request -
State: open - Opened by liutongxuan 5 months ago
#246 - [wip] feat: add embeddings support
Pull Request -
State: open - Opened by guocuimi 5 months ago
#139 - [kernel] added half2 specialization for layernorm kernel
Pull Request -
State: open - Opened by dongxianzhe 7 months ago
#128 - [model] add support for mixtral moe model
Pull Request -
State: open - Opened by 936187425 8 months ago
#124 - benchmark test script
Pull Request -
State: open - Opened by ShijiaTang 8 months ago
- 2 comments
#105 - [workflow] added clang-format workflow
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#104 - [workflow] added clang-format for pull_requests
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#103 - [model] added support for google Gemma-2b model
Pull Request -
State: closed - Opened by 936187425 8 months ago
- 1 comment
#102 - [refactor] moved top_k and top_p from sampler to logits process.
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#101 - [feat] added speculative engine class without implementation.
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#100 - [feat] added engine type to allow LLM and SSM share sequence.
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#99 - [refactor] move model output process logic into batch
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#98 - [feat] added dynamic split-fuse support in continuous scheduler
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#97 - added layernorm benchmark
Pull Request -
State: closed - Opened by dongxianzhe 8 months ago
#96 - [fix] adjust kv_cache_pos to give at least one token to generate logits
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#95 - [fix] added small page size support for flash attention.
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#94 - [feat] return prompt string directly in echo mode to avoid decode cost and avoid showing appended prefix tokens.
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#93 - [feat] add max tokens to process to support dynamic split-fuse
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#92 - [fix] replace submodules git path with https path to avoid permission issue.
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#91 - [fix] use https instead of git to avoid permission issue.
Pull Request -
State: closed - Opened by guocuimi 8 months ago
#90 - [refactor] move batch related logic into a class
Pull Request -
State: closed - Opened by guocuimi 9 months ago
#89 - [feat] added LRU policy into prefix cache.
Pull Request -
State: closed - Opened by guocuimi 9 months ago
#88 - [refactor] avoid name conflict with torch::indexing::Slice
Pull Request -
State: closed - Opened by guocuimi 9 months ago
#87 - [feat] enable prefix cache in block manager
Pull Request -
State: closed - Opened by guocuimi 9 months ago
- 1 comment
#86 - [feat] added prefix cache to share kv cache across sequences.
Pull Request -
State: closed - Opened by guocuimi 9 months ago
#85 - [feat] add block id lifecycle management for block sharing scenarios.
Pull Request -
State: closed - Opened by guocuimi 9 months ago
#84 - ScaleLLM Roadmap
Issue -
State: open - Opened by guocuimi 9 months ago
- 3 comments
Labels: roadmap
#83 - [baichuan2-7b] random core dump in offline batched inference.
Issue -
State: closed - Opened by liutongxuan 9 months ago
- 2 comments
#82 - [models] fix chatglm model issue.
Pull Request -
State: closed - Opened by guocuimi 9 months ago