Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / InternLM/lmdeploy issues and pull requests
#2474 - [Bug] The parameter `n=number_outputs` does not work in `response = client.chat.completions.create`
Issue -
State: open - Opened by leoozy 2 days ago
#2473 - Support user-specified data type
Pull Request -
State: open - Opened by lvhan028 4 days ago
Labels: enhancement
#2472 - fix [Bug] TypeError: Got unsupported ScalarType BFloat16
Pull Request -
State: open - Opened by SeitaroShinagawa 4 days ago
#2471 - Ascend NPU support
Issue -
State: open - Opened by zer0py2c 4 days ago
- 2 comments
Labels: awaiting response
#2469 - Add silu mul kernel
Pull Request -
State: open - Opened by grimoire 5 days ago
#2468 - [Bug] 2x4090 with Llama2 70B silently crashes (i.e. without any error message in DEBUG mode) as of v0.6.0a0 and v0.6.0 (but works fine in previous versions)
Issue -
State: open - Opened by josephrocca 6 days ago
- 2 comments
#2467 - [Bug] Error during inference on the Huawei Ascend platform with Ascend 910A GPUs: Get regInfo failed, The binary_info_config.json of socVersion [ascend910] does not support opType [ApplyRotaryPosEmb].
Issue -
State: open - Opened by XYZliang 6 days ago
- 1 comment
Labels: awaiting response
#2466 - Refactor lora
Pull Request -
State: open - Opened by grimoire 6 days ago
#2465 - Support minicpm3-4b
Pull Request -
State: open - Opened by AllentDan 7 days ago
- 4 comments
Labels: enhancement
#2463 - [Feature] How to use awq with my own dataset
Issue -
State: open - Opened by wangzhongren-code 7 days ago
#2462 - [Bug] fails to deploy a model from modelscope using `lmdeploy serve api_server`
Issue -
State: closed - Opened by kaiwang0112006 7 days ago
- 2 comments
#2461 - Fix initialization of runtime_min_p
Pull Request -
State: closed - Opened by irexyc 7 days ago
Labels: Bug:P1
#2460 - fix MultinomialSampling operator builder
Pull Request -
State: closed - Opened by grimoire 7 days ago
Labels: Bug:P2
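PR #2460 above fixes the MultinomialSampling operator builder. As background only (a minimal pure-Python sketch, not lmdeploy's CUDA implementation), multinomial sampling draws a token index with probability proportional to its weight by inverting the cumulative distribution:

```python
from bisect import bisect_right
from itertools import accumulate

def multinomial_sample(probs, u):
    """Draw an index from `probs` given a uniform sample u in [0, 1).

    Builds the cumulative distribution and finds the first bucket
    whose cumulative mass exceeds u (scaled by the total, so the
    weights need not sum exactly to 1).
    """
    cdf = list(accumulate(probs))
    return bisect_right(cdf, u * cdf[-1])
```

In practice `u` would come from a PRNG; passing it explicitly keeps the sketch deterministic.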
#2459 - [Bug] The first response differs from the others with the same GenerationConfig and a given random_seed.
Issue -
State: closed - Opened by zhulinJulia24 7 days ago
- 1 comment
#2458 - [Feature] support s-lora in turbomind backend
Issue -
State: open - Opened by torinchen 8 days ago
- 2 comments
#2457 - [Bug] output not consistent with different max_prefill_token_num for long context input on pytorch engine
Issue -
State: open - Opened by RunningLeon 8 days ago
#2456 - Can't InternVL2 use a custom calibration dataset for AWQ quantization?
Issue -
State: open - Opened by tanguozhu 8 days ago
- 3 comments
#2455 - Will MiniCPM3-4B be supported?
Issue -
State: open - Opened by LIUKAI0815 8 days ago
- 1 comment
#2454 - fix tensors on different devices when deploying MiniCPM-V-2_6 with tensor parallelism
Pull Request -
State: closed - Opened by irexyc 8 days ago
Labels: Bug:P1
#2453 - [Bug] TypeError: Got unsupported ScalarType BFloat16
Issue -
State: open - Opened by SeitaroShinagawa 8 days ago
- 2 comments
#2452 - [Bug] Inference error when deploying minicpm-v2_6 with lmdeploy
Issue -
State: closed - Opened by dfe2342 8 days ago
- 4 comments
#2451 - [Bug] triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 108672, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
Issue -
State: open - Opened by EvoNexusX 8 days ago
- 3 comments
#2450 - [Bug] LongCite-glm4-9b awq quantization error
Issue -
State: open - Opened by maxin9966 8 days ago
#2449 - support Qwen2-VL with pytorch backend
Pull Request -
State: open - Opened by irexyc 8 days ago
- 3 comments
Labels: enhancement
#2448 - [Feature] How can `pipe` output scores?
Issue -
State: open - Opened by KooSung 8 days ago
- 1 comment
Labels: awaiting response
#2447 - add docs about ascend
Pull Request -
State: closed - Opened by yao-fengchen 8 days ago
- 1 comment
#2446 - Fix ascend readme
Pull Request -
State: closed - Opened by jinminxi104 9 days ago
- 1 comment
#2445 - bump version to v0.6.0
Pull Request -
State: closed - Opened by lvhan028 9 days ago
- 1 comment
#2444 - fix llama3 rotary in pytorch engine
Pull Request -
State: closed - Opened by grimoire 9 days ago
Labels: Bug:P1
#2443 - [Bug] How to use w4a16 model in PytorchEngine
Issue -
State: closed - Opened by xzmates 9 days ago
- 2 comments
#2442 - [Bug] CUDA runtime error when running Llama-3.1-70B-Instruct-AWQ-INT4
Issue -
State: open - Opened by rtadewald 9 days ago
- 3 comments
Labels: awaiting response
#2441 - [Docs] How should I run inference via the Python API with a self-trained InternVL v2 model quantized with W8A8?
Issue -
State: closed - Opened by Loneseven 9 days ago
- 3 comments
#2440 - refactor pytorch engine(ascend)
Pull Request -
State: closed - Opened by yao-fengchen 10 days ago
Labels: enhancement
#2439 - [Bug] lmdeploy does not support the regularized lora target module
Issue -
State: open - Opened by orzgugu 10 days ago
- 1 comment
Labels: awaiting response
#2438 - Support pytorch engine kv int4/int8 quantization
Pull Request -
State: open - Opened by AllentDan 10 days ago
- 1 comment
#2437 - [Bug] I trained a VL model based on qwen2 following the PLoRA approach in internlm-xcomposer2; how should this model be deployed with lmdeploy?
Issue -
State: open - Opened by alanayu 10 days ago
- 2 comments
Labels: awaiting response
#2436 - [Feature] Could qwenvl2 be supported?
Issue -
State: open - Opened by Ranking666 11 days ago
- 3 comments
Labels: awaiting response
#2435 - [Bug] `EngineGenerationConfig` is no longer part of initialization on the main branch
Issue -
State: closed - Opened by RandomCoins 11 days ago
- 3 comments
#2434 - automatically set max_batch_size according to the device when it is not specified
Pull Request -
State: closed - Opened by lvhan028 12 days ago
Labels: improvement
#2433 - build nccl in dockerfile for cuda11.8
Pull Request -
State: closed - Opened by RunningLeon 13 days ago
Labels: improvement
#2432 - Is deploying embedding models supported?
Issue -
State: open - Opened by Toblame 14 days ago
- 1 comment
#2431 - [ci] regular update
Pull Request -
State: open - Opened by zhulinJulia24 14 days ago
#2430 - [Bug] Issue with cogvlm2 support
Issue -
State: open - Opened by tdf1995 14 days ago
- 1 comment
#2429 - Benchmarking 4-GPU and 8-GPU api_server deployments of lmdeploy with the benchmark code shows little difference in QPS and throughput
Issue -
State: closed - Opened by sunzx8 14 days ago
- 2 comments
#2428 - Fix some issues encountered by modelscope and community
Pull Request -
State: closed - Opened by irexyc 15 days ago
Labels: Bug:P1
#2427 - in-place logits processing as default
Pull Request -
State: closed - Opened by grimoire 15 days ago
Labels: improvement
#2426 - ignore *.pth when downloading a model from the model hub
Pull Request -
State: closed - Opened by lvhan028 15 days ago
- 1 comment
Labels: improvement
#2425 - [Bug] Deploying a self-LoRA-finetuned internvl2 model uses very high GPU memory, quickly climbing to 60 GB; is this normal?
Issue -
State: open - Opened by yywangfei 15 days ago
- 1 comment
#2424 - [Feature] Profiling GeMM kernel in lmdeploy
Issue -
State: open - Opened by DerrickYLJ 15 days ago
- 1 comment
#2423 - [Feature] when --tp 2
Issue -
State: open - Opened by maxin9966 15 days ago
- 6 comments
Labels: awaiting response
#2422 - [Docs] AWQ / GPTQ section
Issue -
State: open - Opened by Skyseaee 16 days ago
#2421 - build: update ascend dockerfile
Pull Request -
State: closed - Opened by CyCle1024 16 days ago
Labels: improvement
#2420 - support min_p sampling parameter
Pull Request -
State: closed - Opened by irexyc 16 days ago
- 1 comment
Labels: enhancement
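PR #2420 above adds a `min_p` sampling parameter. As commonly defined (an assumption here, not necessarily lmdeploy's exact implementation), min_p keeps only tokens whose probability is at least `min_p` times the probability of the most likely token, then renormalizes. A minimal sketch:

```python
def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * max(probs), then renormalize.

    `probs` is a list of token probabilities; the scaled threshold
    adapts to how peaked the distribution is: a confident model
    prunes aggressively, a flat distribution keeps more candidates.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]
```

For example, with `probs = [0.6, 0.3, 0.1]` and `min_p = 0.5` the threshold is 0.3, so the last token is dropped and the rest are renormalized.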
#2419 - update actions/download-artifact to v4 to fix security issue
Pull Request -
State: closed - Opened by lvhan028 16 days ago
#2418 - [Docs] About KV cache quantization
Issue -
State: closed - Opened by Root970103 16 days ago
- 5 comments
Labels: awaiting response
#2417 - add Ascend get_started
Pull Request -
State: closed - Opened by jinminxi104 16 days ago
Labels: documentation
#2416 - [Bug] Met an error when deploying an AWQ model on H20.
Issue -
State: closed - Opened by medwang1 17 days ago
- 17 comments
#2415 - [Feature] Is there a plan to support the deployment of Qwen2-VL?
Issue -
State: open - Opened by ldknight 17 days ago
- 2 comments
Labels: awaiting response
#2414 - [Bug] CUDA runtime error: operation not supported /lmdeploy/src/turbomind/utils/allocator.h:197
Issue -
State: open - Opened by JingMo 17 days ago
- 12 comments
#2413 - import dlinfer before image encoding
Pull Request -
State: closed - Opened by jinminxi104 18 days ago
- 3 comments
Labels: improvement
#2412 - [Bug] When using DeepSeek-VL-7B, there is an error raised, because its config doesn't have 'hidden_size'.
Issue -
State: open - Opened by zytx121 18 days ago
- 1 comment
#2411 - [Feature] Would you consider add qwen2vl?
Issue -
State: closed - Opened by PredyDaddy 18 days ago
- 2 comments
#2410 - fix inaccessible get_started user guide
Pull Request -
State: closed - Opened by lvhan028 18 days ago
Labels: documentation
#2409 - [Bug] Aborted (core dumped)
Issue -
State: open - Opened by suwenzhuo 18 days ago
- 1 comment
#2408 - [Bug] Multi-GPU deployment of InternVL2-8B fails with Aborted (core dumped)
Issue -
State: closed - Opened by gxlover0625 19 days ago
- 1 comment
#2407 - [Bug] bitsandbytes int8 quantization of an internlm model
Issue -
State: closed - Opened by EvoNexusX 19 days ago
- 14 comments
#2406 - [Feature] When serving a VLM through the OpenAI-compatible server, how can video be parsed?
Issue -
State: closed - Opened by maxin9966 19 days ago
#2405 - [Bug] Error when deploying baichuan2-13b-chat; is it unsupported?
Issue -
State: open - Opened by sxk000 20 days ago
#2403 - rename the ascend dockerfile
Pull Request -
State: closed - Opened by lvhan028 21 days ago
#2402 - Torchrun launching multiple api_server
Pull Request -
State: open - Opened by AllentDan 21 days ago
#2401 - [ci] add daily test's coverage report
Pull Request -
State: closed - Opened by zhulinJulia24 21 days ago
#2400 - [Bug] With 2-GPU internvl2-26b inference, inter-GPU communication fails over PCIe but succeeds over NVLink; why is that?
Issue -
State: open - Opened by chestnut111 21 days ago
- 12 comments
#2399 - [Bug] [TM][ERROR] CUDA runtime error: misaligned address
Issue -
State: open - Opened by sleepwalker2017 21 days ago
- 6 comments
#2398 - Is acceleration of qwen2-audio-instruct supported?
Issue -
State: open - Opened by zhanghanweii 21 days ago
- 4 comments
#2397 - [Feature] Does/Can lmdeploy work with XLA/TPUs
Issue -
State: closed - Opened by radna0 22 days ago
- 1 comment
#2396 - fix: make main process exit properly when tp>1 on ascend backend
Pull Request -
State: closed - Opened by CyCle1024 22 days ago
- 1 comment
Labels: Bug:P1
#2395 - Fix /v1/completions batch order wrong
Pull Request -
State: closed - Opened by AllentDan 22 days ago
Labels: Bug:P1
#2394 - [Feature] Add buffer time for virtual memory
Issue -
State: closed - Opened by NB-Group 22 days ago
- 14 comments
#2393 - [Feature] InternVL2 inference is slower than InternLM-Xcomposer2
Issue -
State: closed - Opened by zhaoning1987 23 days ago
- 2 comments
#2392 - Inquiry
Issue -
State: closed - Opened by xiaoajie738 23 days ago
- 2 comments
#2391 - [Bug] session id is not thread-safe
Issue -
State: closed - Opened by tp-nan 23 days ago
- 1 comment
#2390 - [Bug] InternLM 2.5 function calling
Issue -
State: open - Opened by coffeecode24 23 days ago
- 3 comments
#2389 - Model Parallel
Issue -
State: closed - Opened by beichenzbc 23 days ago
- 2 comments
#2388 - fix cache position for pytorch engine
Pull Request -
State: closed - Opened by RunningLeon 23 days ago
Labels: Bug:P2
#2387 - [Bug] error occurs when resetting a chat session in gradio after an image is uploaded and a response received.
Issue -
State: open - Opened by zhulinJulia24 24 days ago
- 14 comments
#2386 - [Feature] Quick test on Hygon DCU; hoping for support
Issue -
State: open - Opened by luckfu 24 days ago
- 2 comments
#2385 - [Bug] Unable to deploy Phi-3.5-vision-instruct on Windows
Issue -
State: open - Opened by HSIAOKUOWEI 24 days ago
- 4 comments
#2384 - [Bug] 0.6.0 glm4-9b GPTQ still produces endless output
Issue -
State: open - Opened by maxin9966 24 days ago
- 20 comments
#2383 - [Bug] It seems starcoder2-7b is not supported; the error is PatchedStarcoder2Attention.forward() got an unexpected keyword argument 'cache_position'
Issue -
State: closed - Opened by zhulinJulia24 24 days ago
- 1 comment
#2382 - [Bug] run out of tokens error when using llama3-llava-next-8b-hf
Issue -
State: closed - Opened by binzhang01 24 days ago
- 5 comments
#2381 - [Bug] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32
Issue -
State: open - Opened by AmazDeng 24 days ago
- 5 comments
Labels: awaiting response, Stale
#2380 - [Feature] Reuse the text prompt encoding when processing batch jobs
Issue -
State: open - Opened by ZejiaZheng 24 days ago
#2379 - [Bug] model format is "gptq" but group_size is 32. Currently, only 128 is supported
Issue -
State: closed - Opened by maxin9966 24 days ago
- 2 comments