Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / InternLM/lmdeploy issues and pull requests
#2626 - small block_m for sm7.x
Pull Request -
State: closed - Opened by grimoire 20 days ago
Labels: improvement
#2624 - [Bug] When running inference with InternVL models, GPU-CPU transfer time during the image encoding stage is excessive
Issue -
State: open - Opened by Dimensionzw 22 days ago
- 2 comments
#2623 - [Feature] Does Qwen2-VL support W4A16 in turbomind engine?
Issue -
State: closed - Opened by BlueBlueFF 23 days ago
- 1 comment
#2622 - [Bug] qwen2-vl-72b does not work
Issue -
State: open - Opened by bltcn 23 days ago
- 2 comments
#2621 - MoE support for turbomind
Pull Request -
State: closed - Opened by lzhangzz 23 days ago
- 9 comments
Labels: enhancement
#2620 - Copy sglang/bench_serving.py to lmdeploy as serving benchmark script
Pull Request -
State: open - Opened by lvhan028 23 days ago
Labels: improvement
#2619 - refactor for multi backends in dlinfer
Pull Request -
State: closed - Opened by CyCle1024 23 days ago
Labels: improvement
#2618 - Raise an error for the wrong chat template
Pull Request -
State: closed - Opened by AllentDan 23 days ago
Labels: improvement
#2617 - [ci] Refactor dailytest workflow
Pull Request -
State: closed - Opened by zhulinJulia24 23 days ago
#2616 - [Bug] Inference slows down after quantizing the Internvl2-8B model
Issue -
State: open - Opened by guozhiyao 23 days ago
- 2 comments
#2615 - Add distributed context in pytorch engine to support torchrun
Pull Request -
State: closed - Opened by grimoire 23 days ago
Labels: Bug:P1
#2614 - [Feature] need to output prompt logits
Issue -
State: closed - Opened by anaivebird 24 days ago
- 3 comments
#2613 - [ascend] refactor fused_moe on ascend platform
Pull Request -
State: closed - Opened by yao-fengchen 24 days ago
Labels: improvement
#2612 - [ascend] support paged_prefill_attn when batch > 1
Pull Request -
State: closed - Opened by yao-fengchen 24 days ago
Labels: improvement
#2611 - [Bug] When TP = 4 and prefix cache is enabled, no result is generated.
Issue -
State: open - Opened by rbao2018 24 days ago
- 1 comment
#2610 - [Feature] Can qwen2.5 support passing tool_calls?
Issue -
State: closed - Opened by akai-shuuichi 24 days ago
- 4 comments
Labels: awaiting response, Stale
#2609 - Does internvl2 quantization support a custom calib-dataset?
Issue -
State: open - Opened by guozhiyao 25 days ago
- 1 comment
#2608 - [Bug] InternVL2-26B model load extremely slow
Issue -
State: open - Opened by HappyNotHappy 25 days ago
#2607 - Add barrier to prevent TP nccl kernel waiting.
Pull Request -
State: closed - Opened by grimoire 25 days ago
Labels: improvement
#2605 - Support mllama for pytorch engine
Pull Request -
State: closed - Opened by AllentDan 26 days ago
Labels: enhancement
#2604 - [Bug] InternVL2-2B inference is slow; visual feature extraction takes a long time
Issue -
State: open - Opened by fong-git 26 days ago
- 13 comments
#2603 - OOM Issue
Issue -
State: closed - Opened by poppybrown 26 days ago
- 6 comments
Labels: awaiting response, Stale
#2601 - Fix spacing in ascend user guide
Pull Request -
State: closed - Opened by Superskyyy 26 days ago
Labels: documentation
#2600 - [Feature] TurbomindEngine generate LogitsProcessor
Issue -
State: closed - Opened by BlueBlueFF 27 days ago
- 1 comment
#2599 - [Feature] How to adapt a custom multimodal model (not on HF) for accelerated inference
Issue -
State: open - Opened by GZL11 27 days ago
- 2 comments
#2598 - [Bug] Error when using the lmdeploy image on Huawei Ascend 910b3
Issue -
State: closed - Opened by zhouyuustc 27 days ago
- 2 comments
#2597 - Calling an unquantized model with xtuner chat and lmdeploy chat keeps generating answers without stopping
Issue -
State: closed - Opened by liguoyu666 28 days ago
- 3 comments
Labels: awaiting response, Stale
#2596 - Support llama3.2 LLM models in turbomind engine
Pull Request -
State: closed - Opened by lvhan028 29 days ago
Labels: improvement
#2595 - [Feature] Using the w8a8 model for inference, it should be automatically routed to the pytorch backend without adding the backend parameter.
Issue -
State: open - Opened by zhulinJulia24 29 days ago
#2594 - [Doc]: Lock sphinx version
Pull Request -
State: closed - Opened by RunningLeon 29 days ago
#2593 - [Bug] With a service started via lmdeploy serve api_server, multi-threaded calls to the OpenAI endpoint still produce random outputs even with temperature=0 and seed=17002729324219322736
Issue -
State: closed - Opened by tiaotiaosong 29 days ago
- 1 comment
#2592 - [Bug] How can pipeline be told which GPU to use for inference, e.g. cuda:1? The docs don't show how to set this
Issue -
State: closed - Opened by aizhweiwei 29 days ago
- 5 comments
Labels: awaiting response, Stale
#2591 - support cross-cache
Pull Request -
State: closed - Opened by grimoire 29 days ago
#2590 - [Bug] Qwen/Qwen2-VL-7B-Instruct with --tp 2 crashes out of Docker immediately; runs fine without --tp
Issue -
State: closed - Opened by wangaocheng 29 days ago
- 7 comments
Labels: awaiting response, Stale
#2589 - Calling lmdeploy via v1/chat/interactive with interactive_mode=true: the image changes but the question stays the same, yet the answer is always identical. What causes this?
Issue -
State: open - Opened by zhoulin2545210131 30 days ago
- 15 comments
#2588 - fix: make exit_flag verification for ascend more general
Pull Request -
State: closed - Opened by CyCle1024 30 days ago
Labels: Bug:P1
#2587 - feat(ascend): support w4a16
Pull Request -
State: closed - Opened by yao-fengchen 30 days ago
Labels: enhancement
#2586 - For locally deployed LLMs, is CPU deployment unsupported? #217
Issue -
State: closed - Opened by cristianohello 30 days ago
- 1 comment
Labels: awaiting response
#2585 - [Bug] Using lmdeploy on Huawei Ascend (Atlas 800T A2)
Issue -
State: open - Opened by holoodst 30 days ago
- 25 comments
#2584 - [ci] add pytorch kvint testcase into function regression
Pull Request -
State: closed - Opened by zhulinJulia24 30 days ago
- 1 comment
#2583 - Add a workaround for saving internvl2 with latest transformers
Pull Request -
State: closed - Opened by AllentDan 30 days ago
- 1 comment
Labels: improvement
#2582 - [Bug] Running "Qwen2-VL-2B" on two GPUs, one GPU runs at full load right after startup, before any request arrives
Issue -
State: closed - Opened by jianliao 30 days ago
- 10 comments
Labels: awaiting response, Stale
#2581 - support release pipeline
Pull Request -
State: open - Opened by irexyc about 1 month ago
- 1 comment
Labels: improvement
#2580 - [Bug] InternVL 26B generation is very slow when running inference on video
Issue -
State: closed - Opened by Mrgengli about 1 month ago
#2579 - update copyright
Pull Request -
State: closed - Opened by lvhan028 about 1 month ago
#2578 - Update Dockerfile_aarch64_ascend
Pull Request -
State: closed - Opened by wangyuanxiong-hub about 1 month ago
- 8 comments
#2577 - Add instruction for downloading models from openmind hub
Pull Request -
State: closed - Opened by cookieyyds about 1 month ago
Labels: documentation
#2576 - Support glm-4v-9b.
Pull Request -
State: closed - Opened by pdx1989 about 1 month ago
#2570 - cudaGetDeviceCount() Error in docker
Issue -
State: open - Opened by karndeb about 1 month ago
- 1 comment
#2569 - [ci] use local requirements for test workflow
Pull Request -
State: closed - Opened by zhulinJulia24 about 1 month ago
- 1 comment
#2568 - Fix llama3.2-1b inference error by handling tie_word_embedding
Pull Request -
State: closed - Opened by grimoire about 1 month ago
Labels: improvement
#2567 - [Docs] Does the w8a8-triton implementation in lmdeploy have benchmark tests showing actual inference speedups on real LLMs (e.g. llama2, qwen2)?
Issue -
State: open - Opened by brisker about 1 month ago
- 2 comments
#2566 - [Bug] internvl 4B AWQ inference: Engine loop failed with error: module 'triton.language' has no attribute 'inline_asm_elementwise'
Issue -
State: closed - Opened by Mrgengli about 1 month ago
- 13 comments
#2565 - [Bug] Qwen2-VL uses too much GPU memory, causing OOM
Issue -
State: open - Opened by cmpute about 1 month ago
- 8 comments
#2564 - [Bug] Unable to use Ctrl+C to normally end service on the Ascend platform
Issue -
State: closed - Opened by jiajie-yang about 1 month ago
- 3 comments
#2563 - support downloading models from openmind_hub
Pull Request -
State: closed - Opened by cookieyyds about 1 month ago
Labels: enhancement
#2562 - [Feature] Are there any plans to support Molmo?
Issue -
State: open - Opened by sudanl about 1 month ago
- 2 comments
#2561 - [Bug] Serve OpenAI VLM With GLM-4V Doesn't Accept Base64 Encoded Images
Issue -
State: open - Opened by iamthemulti about 1 month ago
- 7 comments
#2560 - set capture mode thread_local
Pull Request -
State: closed - Opened by grimoire about 1 month ago
Labels: Bug:P1
#2559 - set outlines<0.1.0
Pull Request -
State: closed - Opened by AllentDan about 1 month ago
Labels: Bug:P1
#2558 - Add tool role for langchain usage
Pull Request -
State: closed - Opened by AllentDan about 1 month ago
Labels: improvement
#2557 - Started a multimodal LLM technical discussion group; everyone is welcome to join
Issue -
State: closed - Opened by feihuamantian about 1 month ago
- 1 comment
#2556 - [Bug] Running Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 in Docker throws RuntimeError: Unsupported quant method: gptq
Issue -
State: closed - Opened by xukecheng about 1 month ago
- 1 comment
#2555 - [Bug] Failed to deploy InternVL2-1B on V100 with pytorch engine
Issue -
State: closed - Opened by austingg about 1 month ago
- 2 comments
#2554 - [Bug] does TurboMind support Qwen2-VL-2B-Instruct in lmdeploy v0.6.1
Issue -
State: closed - Opened by LinJianping about 1 month ago
- 1 comment
#2553 - optimize paged attention on triton3
Pull Request -
State: closed - Opened by grimoire about 1 month ago
- 1 comment
Labels: improvement
#2552 - [Feature] Support chat completion stream with tool calls
Issue -
State: open - Opened by nbczb1996 about 1 month ago
- 1 comment
#2549 - [Feature] Please add support for Llama 3.2
Issue -
State: open - Opened by cuong-dyania about 1 month ago
#2548 - [Docs] Error when quantizing InternVL2-4B with AWQ
Issue -
State: closed - Opened by Mrgengli about 1 month ago
- 4 comments
Labels: awaiting response, Stale
#2546 - [Bug] qwen2 vl does not support the turbomind engine
Issue -
State: closed - Opened by windar427 about 1 month ago
- 2 comments
#2544 - [Bug] RuntimeError: CUDA error: operation not permitted when stream is capturing
Issue -
State: open - Opened by LinJianping about 1 month ago
- 15 comments
#2543 - [Bug] accelerate package raises 'NoneType' object has no attribute '_parameters'
Issue -
State: closed - Opened by mouweng about 1 month ago
- 1 comment
#2542 - [Bug] Providing tool response back to llm for output generation is broken for llama3.1 8B
Issue -
State: open - Opened by S1LV3RJ1NX about 1 month ago
- 2 comments
#2541 - [Feature] Please support the molmo vision-language model
Issue -
State: open - Opened by win4r about 1 month ago
#2540 - [Feature] Add argument to disable FastAPI docs
Pull Request -
State: closed - Opened by mouweng about 1 month ago
Labels: improvement
#2539 - [Feature] Add argument to disable FastAPI docs
Pull Request -
State: closed - Opened by mouweng about 1 month ago
#2537 - [Bug] After upgrading to 0.6.1, the proxy's api-keys parameter no longer accepts a comma-separated list
Issue -
State: closed - Opened by snachx about 1 month ago
- 4 comments
#2536 - With prefix_cache enabled, how are same-resolution images distinguished at inference time when they hit the cache?
Issue -
State: closed - Opened by zhuchen1109 about 1 month ago
- 1 comment
#2535 - add check for device with cap 7.x
Pull Request -
State: closed - Opened by grimoire about 1 month ago
Labels: improvement
#2534 - [Bug] 910b multi-card inference is very slow
Issue -
State: open - Opened by the-nine-nation about 1 month ago
- 1 comment
#2533 - [Bug] Does the NPU support deploying and running glm4v-9b?
Issue -
State: open - Opened by Sunxiaohu0406 about 1 month ago
- 2 comments
Labels: awaiting response, Stale
#2532 - [Feature] Inference speed comparison with vllm
Issue -
State: closed - Opened by senlice about 1 month ago
- 8 comments
Labels: awaiting response, Stale
#2531 - [Bug] v0.6.1 Qwen2-VL-7B
Issue -
State: closed - Opened by smallflyingpig about 1 month ago
- 3 comments
Labels: awaiting response, Stale
#2530 - [Bug] How to register chat templates for multimodal MLLMs
Issue -
State: closed - Opened by Sunxiaohu0406 about 1 month ago
- 1 comment
#2529 - Question about excessive runtime variance across runs
Issue -
State: open - Opened by lwdnxu about 1 month ago
#2528 - [Bug] lmdeploy + InternVL2-40B-AWQ hangs under a certain number of asynchronous requests
Issue -
State: open - Opened by hkunzhe about 1 month ago
- 4 comments
#2527 - fix vl gradio
Pull Request -
State: closed - Opened by irexyc about 1 month ago
Labels: Bug:P1
#2526 - [Feature] Please support Llama3.2 and Qwen2.5
Issue -
State: closed - Opened by mihara-bot about 1 month ago
- 5 comments
#2524 - [Feature] InternVL2-4B turbomind support
Issue -
State: open - Opened by AIFFFENG about 1 month ago
#2523 - [ci] add oc infer test in stable test
Pull Request -
State: closed - Opened by zhulinJulia24 about 1 month ago
#2522 - [Bug] error when serving glm4-9b-chat-1m
Issue -
State: closed - Opened by YanShuang17 about 1 month ago
- 1 comment
#2521 - optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way
Pull Request -
State: closed - Opened by jiajie-yang about 1 month ago
- 1 comment
Labels: improvement
#2520 - Fix chatglm tokenizer failed when transformers>=4.45.0
Pull Request -
State: closed - Opened by AllentDan about 1 month ago
Labels: improvement
#2519 - support yarn in turbomind backend
Pull Request -
State: closed - Opened by irexyc about 1 month ago
- 1 comment
Labels: enhancement
#2517 - [Feature] Support Llama 3.2 family of models
Issue -
State: closed - Opened by vikrantrathore about 2 months ago
- 3 comments
#2516 - Could support for Tongyi Qianwen (Qwen) 2.5 be added?
Issue -
State: closed - Opened by yangpeng666 about 2 months ago
- 1 comment
Labels: awaiting response
#2515 - [Bug] llama3.1 70B v1/chat/completions error on Huawei Ascend 910B
Issue -
State: open - Opened by nullxjx about 2 months ago
- 4 comments
#2514 - push released docker image to aliyun hub
Pull Request -
State: closed - Opened by lvhan028 about 2 months ago
#2513 - bump version to v0.6.1
Pull Request -
State: closed - Opened by lvhan028 about 2 months ago
- 2 comments