sgl-project/sglang issues and pull requests

#1538 - Move scheduler code from tp_worker.py to scheduler.py

Pull Request - State: closed - Opened by merrymercy about 1 month ago

#1537 - fix ipv6 url when warm up model

Pull Request - State: closed - Opened by cauyxy about 1 month ago - 1 comment

#1536 - [Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim'

Pull Request - State: closed - Opened by mssongit about 1 month ago - 5 comments

#1535 - [Fix] Fix AttributeError in Qwen2.5(huggingface model) LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_module_name'

Pull Request - State: closed - Opened by mssongit about 1 month ago - 1 comment

#1534 - Improve process creation

Pull Request - State: closed - Opened by merrymercy about 1 month ago

#1533 - [Bug] ValueError: The memory capacity is unbalanced

Issue - State: closed - Opened by chuangzhidan about 1 month ago - 2 comments

#1532 - Make detokenizer_manager.py not asyncio

Pull Request - State: closed - Opened by merrymercy about 1 month ago

#1531 - Organize image inputs

Pull Request - State: closed - Opened by hnyls2002 about 1 month ago

#1530 - Multiple minor fixes

Pull Request - State: closed - Opened by merrymercy about 1 month ago

#1529 - [Event] Update meeting link

Pull Request - State: closed - Opened by Ying1123 about 1 month ago

#1528 - Add float8 dynamic quant to torchao_utils

Pull Request - State: closed - Opened by jerryzh168 about 1 month ago

#1527 - [Feature] VLLM 6.0 support

Issue - State: closed - Opened by arunpatala about 1 month ago - 2 comments

#1526 - [Bug] IndexError: list index out of range

Issue - State: closed - Opened by lvxianfeng-git about 1 month ago - 3 comments

#1525 - [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B

Pull Request - State: closed - Opened by Ying1123 about 1 month ago

#1524 - minor: fix config

Pull Request - State: closed - Opened by hnyls2002 about 1 month ago

#1523 - [Feature] add support for llama 3.2

Issue - State: closed - Opened by Stealthwriter about 1 month ago - 7 comments

#1522 - [Bug] Unable to use gptq or awq with torch.compile (8*A40)

Issue - State: open - Opened by smallstepman about 1 month ago - 8 comments

#1521 - [FIX] Catch syntax error of Regex Guide to avoid crash

Pull Request - State: closed - Opened by du00cs about 1 month ago

#1520 - [bugfix]Add modelscope package to avoid docker image without modelscope

Pull Request - State: closed - Opened by KylinMountain about 1 month ago - 4 comments

#1519 - Accuracy reduction of Lora

Issue - State: closed - Opened by yileld about 1 month ago - 2 comments

#1518 - Update Dockerfile

Pull Request - State: closed - Opened by KylinMountain about 1 month ago

#1517 - [Bug] no module modelscope using docker compose to start sglang

Issue - State: closed - Opened by KylinMountain about 1 month ago - 3 comments

#1515 - How to study the code?

Issue - State: closed - Opened by TJ949 about 2 months ago

#1514 - [Feature] _get_pixel_values needs to return tgt_sizes

Issue - State: open - Opened by huangzl18883 about 2 months ago - 1 comment

#1513 - [Fix] Ignore model import error

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1512 - Release v0.3.2

Pull Request - State: closed - Opened by Ying1123 about 2 months ago

#1511 - Revert "kernel: use tensor cores for flashinfer gqa kernels"

Pull Request - State: closed - Opened by Ying1123 about 2 months ago - 3 comments

#1510 - [Fix] Fix clean_up_tokenization_spaces in tokenizer

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1509 - [Bug] tensor parallel run error

Issue - State: closed - Opened by jerryzh168 about 2 months ago - 1 comment

#1508 - Add support for tie_word_embeddings when loading weights + support for SmolLM

Pull Request - State: closed - Opened by TianyiQ about 2 months ago - 2 comments

#1507 - [CI] Update nightly eval

Pull Request - State: closed - Opened by Ying1123 about 2 months ago

#1506 - [Bug] LLaVa-next does not work for single image processing

Issue - State: closed - Opened by ThomasBenzshawel about 2 months ago - 1 comment

#1505 - AWQ performance tracking

Issue - State: open - Opened by zhyncs about 2 months ago - 1 comment
Labels: performance

#1504 - Possible timing side-channels caused by shared prefix

Issue - State: open - Opened by Unik-lif about 2 months ago - 2 comments

#1503 - Simplify bench_latency.py

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1502 - Update test_srt_backend.py

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1501 - [Bug] radixcache stack_overflow

Issue - State: closed - Opened by luzengxiangcn about 2 months ago - 1 comment

#1500 - [CI] Move AMD test to a separate file

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1499 - debug radixcache stack_overflow

Pull Request - State: closed - Opened by luzengxiangcn about 2 months ago - 1 comment

#1498 - [WIP] Spec infer with EAGLE2

Pull Request - State: open - Opened by yukavio about 2 months ago - 41 comments

#1497 - MoE torch compile

Pull Request - State: closed - Opened by ispobock about 2 months ago - 2 comments

#1496 - Fix the overhead due to penalizer in bench_latency

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1495 - Fix RuntimeEndpoint.select method

Pull Request - State: closed - Opened by jeffrey-fong about 2 months ago - 7 comments

#1494 - minor: add mla fp8 test

Pull Request - State: closed - Opened by zhyncs about 2 months ago

#1493 - [Community] Add open collective sponsor link to README

Pull Request - State: closed - Opened by Ying1123 about 2 months ago

#1492 - Update dockerfile to include datamodel_code_generator

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1491 - Add AMD tests to CI

Pull Request - State: closed - Opened by Ying1123 about 2 months ago - 3 comments

#1490 - [API, Feature] Support response prefill for openai API

Pull Request - State: closed - Opened by Ying1123 about 2 months ago

#1489 - Add a unit test for data parallelism

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1488 - Better unit tests for adding a new model

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1487 - Development Roadmap (2024 Q4)

Issue - State: open - Opened by Ying1123 about 2 months ago - 7 comments

#1486 - doc: update backend

Pull Request - State: closed - Opened by zhyncs about 2 months ago

#1485 - [Bug] tp-4 start timeout

Issue - State: closed - Opened by siddhatiwari about 2 months ago - 1 comment

#1484 - Add MLA gsm8k eval

Pull Request - State: closed - Opened by ispobock about 2 months ago

#1483 - chore: bump v0.3.1.post3

Pull Request - State: closed - Opened by zhyncs about 2 months ago

#1482 - Fix triton head num

Pull Request - State: closed - Opened by ispobock about 2 months ago

#1481 - fix incorrect links in documentation

Pull Request - State: closed - Opened by rchen19 about 2 months ago - 1 comment

#1480 - [Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch

Pull Request - State: closed - Opened by liangan1 about 2 months ago - 9 comments

#1479 - [Bug] Deepseek-V2.5 capture cuda graph failed

Issue - State: closed - Opened by halexan about 2 months ago - 2 comments

#1477 - [Bug] The sglang cannot reach the preset concurrency level.

Issue - State: closed - Opened by rangehow about 2 months ago - 5 comments

#1476 - Add OLMoE

Pull Request - State: closed - Opened by Muennighoff about 2 months ago

#1475 - minor: add quant eval compared with base

Pull Request - State: closed - Opened by zhyncs about 2 months ago - 1 comment

#1473 - [Bug] The engine hangs after requesting health_generate 190 times.

Issue - State: closed - Opened by unix1986 about 2 months ago - 6 comments

#1472 - Fix env vars in bench_latency

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1471 - [Performance] Add triton kernels for LoRA

Pull Request - State: closed - Opened by Ying1123 about 2 months ago

#1470 - Release v0.3.1.post2

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1469 - Fix padding in the cuda graph

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1467 - [Bug] illegal memory access encountered

Issue - State: closed - Opened by wonderisland about 2 months ago - 3 comments

#1466 - [Bug] enable-mixed-chunk may cause the regex request get wrong result and output_token_logprobs

Issue - State: closed - Opened by liuteng about 2 months ago

#1465 - Debug schedule optimization

Pull Request - State: closed - Opened by hnyls2002 about 2 months ago

#1464 - fix: creat new dict everytime for putting new frame

Pull Request - State: closed - Opened by Luodian about 2 months ago

#1463 - [Bug] oom,torch.OutOfMemoryError: seems to only use one gpu on A800-80G,available 40g on each card

Issue - State: closed - Opened by chuangzhidan about 2 months ago - 5 comments

#1461 - [WIP] Prometheus Metrics

Pull Request - State: closed - Opened by binarycrayon about 2 months ago - 3 comments

#1460 - [Question]Why is the default value of max_prefill_tokens 16384?

Issue - State: closed - Opened by wjj19950828 about 2 months ago

#1459 - Support double sparsity

Pull Request - State: closed - Opened by andy-yang-1 about 2 months ago - 19 comments
Labels: high priority

#1458 - [Event] Add public meeting invite to README

Pull Request - State: closed - Opened by Ying1123 about 2 months ago

#1457 - Fuse top_k and top_k in the sampler

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1456 - Pr fix max workers

Pull Request - State: closed - Opened by wellhowtosay about 2 months ago - 1 comment

#1455 - [Bug] OOM when runing `bench_serving` with DeepSeekCoder-V2-Lite.

Issue - State: closed - Opened by zh-zheng about 2 months ago - 3 comments

#1454 - Fix oom issues with fp8 for llama

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1453 - [Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419)

Pull Request - State: closed - Opened by HaiShaw about 2 months ago

#1452 - Add bench_server_latency.py

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1451 - Fix schedule bug

Pull Request - State: closed - Opened by hnyls2002 about 2 months ago

#1450 - fix schedule bug

Pull Request - State: closed - Opened by hnyls2002 about 2 months ago

#1449 - Fixed n>1 causing list index out of range with VLM

Pull Request - State: closed - Opened by jasonyux about 2 months ago - 2 comments

#1448 - Fix attention backend

Pull Request - State: closed - Opened by ispobock about 2 months ago

#1447 - Enable MLA by default

Pull Request - State: closed - Opened by ispobock about 2 months ago

#1446 - [Bug] Performance issue on MoE with torch.compile

Issue - State: closed - Opened by ispobock about 2 months ago - 1 comment
Labels: performance

#1445 - Release 0.3.1.post1

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1444 - Add OLMoE model

Pull Request - State: closed - Opened by janimo about 2 months ago

#1443 - [Bug] The latest Sglang docker image cannot start online services

Issue - State: closed - Opened by CedricHwong about 2 months ago - 2 comments

#1442 - Fix torch compile for deepseek-v2

Pull Request - State: closed - Opened by ispobock about 2 months ago - 6 comments

#1441 - Simplify sampler and its error handling

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1440 - Clean up model loader

Pull Request - State: closed - Opened by merrymercy about 2 months ago

#1439 - [Bug] Llama 405B FP8 causes OOM on 16xA40

Issue - State: open - Opened by sumukshashidhar about 2 months ago - 2 comments

#1438 - Add constrained_json_whitespace_pattern to ServerArgs

Pull Request - State: closed - Opened by zifeitong about 2 months ago

#1436 - [Feature] Add initial support for sequence parallelism

Pull Request - State: open - Opened by Ying1123 about 2 months ago

#1435 - [Feature] Expert parallelism support

Issue - State: open - Opened by chongli-uw about 2 months ago - 1 comment
Labels: enhancement

#1434 - [Bug] Nonsense and slow output under high concurrency

Issue - State: closed - Opened by tongyx361 about 2 months ago - 2 comments

#1433 - [Feature] Support LoRA path renaming and add LoRA serving benchmarks

Pull Request - State: closed - Opened by Ying1123 about 2 months ago
