Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / InternLM/lmdeploy issues and pull requests
#2626 - small block_m for sm7.x
Pull Request -
State: closed - Opened by grimoire 20 days ago
Labels: improvement
#2624 - [Bug] When running inference with InternVL models, GPU-CPU transfer time during the image encoding stage is excessive
Issue -
State: open - Opened by Dimensionzw 22 days ago
- 2 comments
#2623 - [Feature] Does Qwen2-VL support W4A16 in turbomind engine?
Issue -
State: closed - Opened by BlueBlueFF 23 days ago
- 1 comment
#2622 - [Bug] qwen2-vl-72b does not work
Issue -
State: open - Opened by bltcn 23 days ago
- 2 comments
#2621 - MoE support for turbomind
Pull Request -
State: closed - Opened by lzhangzz 23 days ago
- 9 comments
Labels: enhancement
#2620 - Copy sglang/bench_serving.py to lmdeploy as serving benchmark script
Pull Request -
State: open - Opened by lvhan028 23 days ago
Labels: improvement
#2619 - refactor for multi backends in dlinfer
Pull Request -
State: closed - Opened by CyCle1024 23 days ago
Labels: improvement
#2618 - Raise an error for the wrong chat template
Pull Request -
State: closed - Opened by AllentDan 23 days ago
Labels: improvement
#2617 - [ci] Refactor dailytest workflow
Pull Request -
State: closed - Opened by zhulinJulia24 23 days ago
#2616 - [Bug] Inference slows down after quantizing the Internvl2-8B model
Issue -
State: open - Opened by guozhiyao 23 days ago
- 2 comments
#2615 - Add distributed context in pytorch engine to support torchrun
Pull Request -
State: closed - Opened by grimoire 23 days ago
Labels: Bug:P1
#2614 - [Feature] need to output prompt logits
Issue -
State: closed - Opened by anaivebird 24 days ago
- 3 comments
#2613 - [ascend] refactor fused_moe on ascend platform
Pull Request -
State: closed - Opened by yao-fengchen 24 days ago
Labels: improvement
#2612 - [ascend] support paged_prefill_attn when batch > 1
Pull Request -
State: closed - Opened by yao-fengchen 24 days ago
Labels: improvement
#2611 - [Bug] When TP = 4 and prefix cache is enabled, no result is generated.
Issue -
State: open - Opened by rbao2018 24 days ago
- 1 comment
#2610 - [Feature] Can qwen2.5 support passing tool_calls?
Issue -
State: closed - Opened by akai-shuuichi 24 days ago
- 4 comments
Labels: awaiting response, Stale
#2609 - Does internvl2 quantization support a custom calib-dataset?
Issue -
State: open - Opened by guozhiyao 25 days ago
- 1 comment
#2608 - [Bug] InternVL2-26B model load extremely slow
Issue -
State: open - Opened by HappyNotHappy 25 days ago
#2607 - Add barrier to prevent TP nccl kernel waiting.
Pull Request -
State: closed - Opened by grimoire 25 days ago
Labels: improvement
#2605 - Support mllama for pytorch engine
Pull Request -
State: closed - Opened by AllentDan 26 days ago
Labels: enhancement
#2604 - [Bug] InternVL2-2B inference is slow; visual feature extraction takes a long time
Issue -
State: open - Opened by fong-git 26 days ago
- 13 comments
#2603 - OOM Issue
Issue -
State: closed - Opened by poppybrown 26 days ago
- 6 comments
Labels: awaiting response, Stale
#2601 - Fix spacing in ascend user guide
Pull Request -
State: closed - Opened by Superskyyy 26 days ago
Labels: documentation
#2600 - [Feature] TurbomindEngine generate LogitsProcessor
Issue -
State: closed - Opened by BlueBlueFF 27 days ago
- 1 comment
#2599 - [Feature] How to adapt a custom multimodal model (not on HF) for accelerated inference
Issue -
State: open - Opened by GZL11 27 days ago
- 2 comments
#2598 - [Bug] Error when using the lmdeploy image on Huawei Ascend 910b3
Issue -
State: closed - Opened by zhouyuustc 27 days ago
- 2 comments
#2597 - Calling an unquantized model with xtuner chat and lmdeploy chat keeps generating answers without stopping
Issue -
State: closed - Opened by liguoyu666 28 days ago
- 3 comments
Labels: awaiting response, Stale
#2596 - Support llama3.2 LLM models in turbomind engine
Pull Request -
State: closed - Opened by lvhan028 29 days ago
Labels: improvement
#2595 - [Feature] Using the w8a8 model for inference, it should be automatically routed to the pytorch backend without adding the backend parameter.
Issue -
State: open - Opened by zhulinJulia24 29 days ago
#2594 - [Doc]: Lock sphinx version
Pull Request -
State: closed - Opened by RunningLeon 29 days ago
#2593 - [Bug] With a service started via lmdeploy serve api_server, multi-threaded calls to the OpenAI endpoint still produce random outputs even with temperature=0 and seed=17002729324219322736
Issue -
State: closed - Opened by tiaotiaosong 29 days ago
- 1 comment
#2592 - [Bug] How can pipeline be told which GPU to use for inference, e.g. cuda:1? The docs don't show how to set this
Issue -
State: closed - Opened by aizhweiwei 29 days ago
- 5 comments
Labels: awaiting response, Stale
#2591 - support cross-cache
Pull Request -
State: closed - Opened by grimoire 29 days ago
#2590 - [Bug] Qwen/Qwen2-VL-7B-Instruct with --tp 2 crashes out of Docker immediately; runs fine without --tp
Issue -
State: closed - Opened by wangaocheng 29 days ago
- 7 comments
Labels: awaiting response, Stale
#2589 - Calling lmdeploy via v1/chat/interactive with interactive_mode=true: the image changes but the question stays the same, yet the answer is always identical. What causes this?
Issue -
State: open - Opened by zhoulin2545210131 30 days ago
- 15 comments
#2588 - fix: make exit_flag verification for ascend more general
Pull Request -
State: closed - Opened by CyCle1024 30 days ago
Labels: Bug:P1
#2587 - feat(ascend): support w4a16
Pull Request -
State: closed - Opened by yao-fengchen 30 days ago
Labels: enhancement
#2586 - For locally deployed LLMs, is CPU deployment unsupported? #217
Issue -
State: closed - Opened by cristianohello 30 days ago
- 1 comment
Labels: awaiting response
#2585 - [Bug] Using lmdeploy on Huawei Ascend (Atlas 800T A2)
Issue -
State: open - Opened by holoodst 30 days ago
- 25 comments
#2584 - [ci] add pytorch kvint testcase into function regression
Pull Request -
State: closed - Opened by zhulinJulia24 30 days ago
- 1 comment
#2583 - Add a workaround for saving internvl2 with latest transformers
Pull Request -
State: closed - Opened by AllentDan 30 days ago
- 1 comment
Labels: improvement
#2582 - [Bug] Running "Qwen2-VL-2B" on two GPUs, one GPU runs at full load right after startup, before any request arrives
Issue -
State: closed - Opened by jianliao 30 days ago
- 10 comments
Labels: awaiting response, Stale
#2581 - support release pipeline
Pull Request -
State: open - Opened by irexyc about 1 month ago
- 1 comment
Labels: improvement
#2580 - [Bug] InternVL 26B generation is very slow when running inference on video
Issue -
State: closed - Opened by Mrgengli about 1 month ago
#2579 - update copyright
Pull Request -
State: closed - Opened by lvhan028 about 1 month ago
#2578 - Update Dockerfile_aarch64_ascend
Pull Request -
State: closed - Opened by wangyuanxiong-hub about 1 month ago
- 8 comments
#2577 - Add instruction for downloading models from openmind hub
Pull Request -
State: closed - Opened by cookieyyds about 1 month ago
Labels: documentation
#2576 - Support glm-4v-9b.
Pull Request -
State: closed - Opened by pdx1989 about 1 month ago
#2570 - cudaGetDeviceCount() Error in docker
Issue -
State: open - Opened by karndeb about 1 month ago
- 1 comment
#2569 - [ci] use local requirements for test workflow
Pull Request -
State: closed - Opened by zhulinJulia24 about 1 month ago
- 1 comment
#2568 - Fix llama3.2-1b inference error by handling tie_word_embedding
Pull Request -
State: closed - Opened by grimoire about 1 month ago
Labels: improvement
#2567 - [Docs] Does the w8a8-triton implementation in lmdeploy have benchmark tests showing actual inference speedups on real LLMs (e.g. llama2, qwen2)?
Issue -
State: open - Opened by brisker about 1 month ago
- 2 comments
#2566 - [Bug] internvl 4B AWQ inference: Engine loop failed with error: module 'triton.language' has no attribute 'inline_asm_elementwise'
Issue -
State: closed - Opened by Mrgengli about 1 month ago
- 13 comments
#2565 - [Bug] Qwen2-VL uses too much GPU memory, causing OOM
Issue -
State: open - Opened by cmpute about 1 month ago
- 8 comments
#2564 - [Bug] Unable to use Ctrl+C to normally end service on the Ascend platform
Issue -
State: closed - Opened by jiajie-yang about 1 month ago
- 3 comments
#2563 - support downloading models from openmind_hub
Pull Request -
State: closed - Opened by cookieyyds about 1 month ago
Labels: enhancement
#2562 - [Feature] Are there any plans to support Molmo?
Issue -
State: open - Opened by sudanl about 1 month ago
- 2 comments
#2561 - [Bug] Serve OpenAI VLM With GLM-4V Doesn't Accept Base64 Encoded Images
Issue -
State: open - Opened by iamthemulti about 1 month ago
- 7 comments
#2560 - set capture mode thread_local
Pull Request -
State: closed - Opened by grimoire about 1 month ago
Labels: Bug:P1
#2559 - set outlines<0.1.0
Pull Request -
State: closed - Opened by AllentDan about 1 month ago
Labels: Bug:P1
#2558 - Add tool role for langchain usage
Pull Request -
State: closed - Opened by AllentDan about 1 month ago
Labels: improvement
#2557 - Started a multimodal LLM technical discussion group; everyone is welcome to join
Issue -
State: closed - Opened by feihuamantian about 1 month ago
- 1 comment
#2556 - [Bug] Running Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 in Docker throws RuntimeError: Unsupported quant method: gptq
Issue -
State: closed - Opened by xukecheng about 1 month ago
- 1 comment
#2555 - [Bug] Failed to deploy InternVL2-1B on V100 with pytorch engine
Issue -
State: closed - Opened by austingg about 1 month ago
- 2 comments
#2554 - [Bug] does TurboMind support Qwen2-VL-2B-Instruct in lmdeploy v0.6.1
Issue -
State: closed - Opened by LinJianping about 1 month ago
- 1 comment
#2553 - optimize paged attention on triton3
Pull Request -
State: closed - Opened by grimoire about 1 month ago
- 1 comment
Labels: improvement
#2552 - [Feature] Support chat completion stream with tool calls
Issue -
State: open - Opened by nbczb1996 about 1 month ago
- 1 comment
#2549 - [Feature] Please add support for Llama 3.2
Issue -
State: open - Opened by cuong-dyania about 1 month ago
#2548 - [Docs] Error when quantizing InternVL2-4B with AWQ
Issue -
State: closed - Opened by Mrgengli about 1 month ago
- 4 comments
Labels: awaiting response, Stale
#2546 - [Bug] qwen2 vl does not support the turbomind engine
Issue -
State: closed - Opened by windar427 about 1 month ago
- 2 comments
#2544 - [Bug] RuntimeError: CUDA error: operation not permitted when stream is capturing
Issue -
State: open - Opened by LinJianping about 1 month ago
- 15 comments
#2543 - [Bug] accelerate package raises 'NoneType' object has no attribute '_parameters'
Issue -
State: closed - Opened by mouweng about 1 month ago
- 1 comment
#2542 - [Bug] Providing tool response back to llm for output generation is broken for llama3.1 8B
Issue -
State: open - Opened by S1LV3RJ1NX about 1 month ago
- 2 comments
#2541 - [Feature] Please support the molmo vision-language model
Issue -
State: open - Opened by win4r about 1 month ago
#2540 - [Feature] Add argument to disable FastAPI docs
Pull Request -
State: closed - Opened by mouweng about 1 month ago
Labels: improvement
#2539 - [Feature] Add argument to disable FastAPI docs
Pull Request -
State: closed - Opened by mouweng about 1 month ago
#2537 - [Bug] After upgrading to 0.6.1, the proxy's api-keys parameter no longer accepts a comma-separated list
Issue -
State: closed - Opened by snachx about 1 month ago
- 4 comments
#2536 - With prefix_cache enabled, how are same-resolution images distinguished at inference time when they hit the cache?
Issue -
State: closed - Opened by zhuchen1109 about 1 month ago
- 1 comment
#2535 - add check for device with cap 7.x
Pull Request -
State: closed - Opened by grimoire about 1 month ago
Labels: improvement
#2534 - [Bug] 910b multi-card inference is very slow
Issue -
State: open - Opened by the-nine-nation about 1 month ago
- 1 comment
#2533 - [Bug] Does the NPU support deploying and running glm4v-9b?
Issue -
State: open - Opened by Sunxiaohu0406 about 1 month ago
- 2 comments
Labels: awaiting response, Stale
#2532 - [Feature] Inference speed comparison with vllm
Issue -
State: closed - Opened by senlice about 1 month ago
- 8 comments
Labels: awaiting response, Stale
#2531 - [Bug] v0.6.1 Qwen2-VL-7B
Issue -
State: closed - Opened by smallflyingpig about 1 month ago
- 3 comments
Labels: awaiting response, Stale
#2530 - [Bug] How to register chat templates for multimodal MLLMs
Issue -
State: closed - Opened by Sunxiaohu0406 about 1 month ago
- 1 comment
#2529 - Question about excessive runtime variance across runs
Issue -
State: open - Opened by lwdnxu about 1 month ago
#2528 - [Bug] lmdeploy + InternVL2-40B-AWQ hangs under a certain number of asynchronous requests
Issue -
State: open - Opened by hkunzhe about 1 month ago
- 4 comments
#2527 - fix vl gradio
Pull Request -
State: closed - Opened by irexyc about 1 month ago
Labels: Bug:P1
#2526 - [Feature] Please support Llama3.2 and Qwen2.5
Issue -
State: closed - Opened by mihara-bot about 1 month ago
- 5 comments
#2524 - [Feature] InternVL2-4B turbomind support
Issue -
State: open - Opened by AIFFFENG about 1 month ago
#2523 - [ci] add oc infer test in stable test
Pull Request -
State: closed - Opened by zhulinJulia24 about 1 month ago
#2522 - [Bug] error when serving glm4-9b-chat-1m
Issue -
State: closed - Opened by YanShuang17 about 1 month ago
- 1 comment
#2521 - optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way
Pull Request -
State: closed - Opened by jiajie-yang about 1 month ago
- 1 comment
Labels: improvement
#2520 - Fix chatglm tokenizer failed when transformers>=4.45.0
Pull Request -
State: closed - Opened by AllentDan about 1 month ago
Labels: improvement
#2519 - support yarn in turbomind backend
Pull Request -
State: closed - Opened by irexyc about 1 month ago
- 1 comment
Labels: enhancement
#2517 - [Feature] Support Llama 3.2 family of models
Issue -
State: closed - Opened by vikrantrathore about 2 months ago
- 3 comments
#2516 - Could support for Tongyi Qianwen (Qwen) 2.5 be added?
Issue -
State: closed - Opened by yangpeng666 about 2 months ago
- 1 comment
Labels: awaiting response
#2515 - [Bug] llama3.1 70B v1/chat/completions error on Huawei Ascend 910B
Issue -
State: open - Opened by nullxjx about 2 months ago
- 4 comments
#2514 - push released docker image to aliyun hub
Pull Request -
State: closed - Opened by lvhan028 about 2 months ago
#2513 - bump version to v0.6.1
Pull Request -
State: closed - Opened by lvhan028 about 2 months ago
- 2 comments