EvolvingLMMs-Lab/lmms-eval issues and pull requests

#762 - Dev/olympiad bench

Pull Request - State: open - Opened by Luodian 15 days ago

#761 - Issues when using llava_vid.py to evaluate the llava-onevision-qwen2-7b-ov model

Issue - State: open - Opened by SplendidYuan 15 days ago

#760 - [fix] Fixed applying process_* twice on resAns for VQAv2

Pull Request - State: closed - Opened by Avelina9X 15 days ago - 1 comment

#759 - [fix] update korean benchmark's post_prompt

Pull Request - State: closed - Opened by jujeongho0 15 days ago - 1 comment

#757 - Remove Claude GitHub workflows for code review

Pull Request - State: open - Opened by Luodian 17 days ago - 1 comment

#756 - [New Benchmark] Request for supporting TimeScope

Pull Request - State: open - Opened by ruili33 17 days ago - 1 comment

#755 - Bug while initializing the evaluation of Qwen2VL-7B

Issue - State: closed - Opened by tianjiedai 19 days ago - 2 comments
Labels: bug

#754 - [Bugfix] Fix handling of encode_video output in vllm.py so each frame’s Base64

Pull Request - State: closed - Opened by LiamLian0727 19 days ago - 1 comment

#753 - Fix handling of encode_video output in vllm.py so each frame’s Base64

Pull Request - State: closed - Opened by LiamLian0727 19 days ago - 1 comment

#752 - Fix handling of encode_video output in vllm.py so each frame’s Base64…

Pull Request - State: closed - Opened by LiamLian0727 20 days ago - 1 comment

#751 - Why do the results of the two evaluation methods differ so much?

Issue - State: open - Opened by mathCrazyy 20 days ago - 6 comments

#750 - The results of evaluation on multi-GPUs versus single-GPU differ from each other

Issue - State: open - Opened by shuangchen2003 21 days ago

#749 - Add claude GitHub actions 1752118403023

Pull Request - State: closed - Opened by Luodian 21 days ago - 1 comment

#748 - when to support lmms-lab/MMMU interleaved_format evaluation?

Issue - State: open - Opened by zhiwenhou1227 21 days ago

#747 - Feature/inference throughput logging

Pull Request - State: closed - Opened by Luodian 21 days ago - 2 comments

#746 - [WIP] Feature/terminal UI integration

Pull Request - State: open - Opened by Luodian 21 days ago - 2 comments

#745 - Add comprehensive test suite for CI/CD

Pull Request - State: closed - Opened by Luodian 21 days ago - 2 comments

#744 - Title: Add Benchmark from "Vision-Language Models Can’t See the Obvious" (ICCV 2025)

Pull Request - State: closed - Opened by dunghuynhandy 21 days ago - 8 comments

#744 - Title: Add Benchmark from "Vision-Language Models Can’t See the Obvious" (ICCV 2025)

Pull Request - State: open - Opened by dunghuynhandy 21 days ago - 7 comments

#743 - Example Script for llava-hf.

Issue - State: closed - Opened by Danielement321 22 days ago

#742 - [New Benchmark] Add Video-TT Benchmark

Pull Request - State: closed - Opened by dongyh20 23 days ago - 1 comment

#741 - Revert "Pass in the 'cache_dir' to use local cache"

Pull Request - State: closed - Opened by kcz358 23 days ago - 1 comment

#740 - fix: add `max_frames_num` to `OpenAICompatible`

Pull Request - State: closed - Opened by loongfeili 23 days ago - 1 comment

#739 - How to download dataset correctly?

Issue - State: closed - Opened by Royalcat2022 24 days ago

#738 - Abnormal GPU Utilization When Evaluating Qwen 2.5 VL 72B

Issue - State: open - Opened by LiamLian0727 25 days ago - 8 comments

#737 - [Bugfix] Add min image resolution requirement for vLLM Qwen-VL models

Pull Request - State: closed - Opened by zch42 25 days ago - 1 comment

#736 - Add min image resolution requirement for vLLM Qwen-VL models

Pull Request - State: closed - Opened by zch42 25 days ago - 1 comment

#735 - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1

Issue - State: open - Opened by anurag-198 28 days ago - 4 comments
Labels: bug

#734 - Fix three bugs in the codebase

Pull Request - State: closed - Opened by Luodian 29 days ago - 3 comments

#733 - [fix] cli_evaluate to properly handle Namespace arguments

Pull Request - State: closed - Opened by Luodian 29 days ago - 3 comments

#732 - how to download the dataset correctly?

Issue - State: open - Opened by Sugar929 30 days ago - 7 comments
Labels: enhancement

#731 - bugfix: missing fields in doc when using --log_samples

Pull Request - State: open - Opened by VincentYCYao about 1 month ago - 6 comments

#730 - cli_evaluate has option to pass args for use within a module but this does not work

Issue - State: closed - Opened by lusxvr about 1 month ago - 1 comment

#729 - Support for MiMo-VL-7B

Issue - State: open - Opened by ChangtiWu about 1 month ago - 2 comments
Labels: bug

#728 - [vLLM] centralize VLLM_WORKER_MULTIPROC_METHOD

Pull Request - State: closed - Opened by kylesayrs about 1 month ago - 1 comment

#727 - add new tasks MMVU and Visual Web Bench

Pull Request - State: closed - Opened by pbcong about 1 month ago - 2 comments

#726 - vLLM Eval Not Working

Issue - State: closed - Opened by chuyishang about 1 month ago

#725 - Add CameraBench_VQA

Pull Request - State: closed - Opened by chancharikmitra about 1 month ago - 1 comment

#724 - [FIX] Resolve MMMU-test submission file generation issue

Pull Request - State: closed - Opened by xyyandxyy about 1 month ago - 3 comments

#723 - [Bug] fix a bug in post processing stage of ScienceQA.

Pull Request - State: closed - Opened by ashun989 about 1 month ago - 1 comment

#722 - Is it possible to add support for llava-hf/llava-onevision-qwen2-7b-ov-hf?

Issue - State: open - Opened by tianyuzong about 1 month ago - 2 comments
Labels: bug, discussion

#721 - [Feat] LMMS-Eval 0.4

Pull Request - State: open - Opened by Luodian about 1 month ago - 4 comments

#720 - [Minor] typo fixed in task_guide.md

Pull Request - State: closed - Opened by JulyanZhu about 1 month ago - 1 comment

#719 - [fix] update korean benchmark's post_prompt

Pull Request - State: closed - Opened by jujeongho0 about 1 month ago - 1 comment

#718 - [Update] Doc to messages and setup Unified Server for GPT as a judge

Pull Request - State: closed - Opened by kcz358 about 1 month ago - 2 comments

#717 - [fix] Refactor Accelerator initialization

Pull Request - State: closed - Opened by Luodian about 1 month ago

#716 - Update sentencepiece dependency and add new parameters to mathvista_t…

Pull Request - State: closed - Opened by Luodian about 1 month ago

#715 - add mmsi-bench (https://arxiv.org/abs/2505.23764)

Pull Request - State: closed - Opened by sihany077 about 1 month ago - 1 comment

#714 - [fix] ensure synchronization not be used without distributed execution

Pull Request - State: closed - Opened by debugdoctor about 2 months ago

#713 - add mmvu task

Pull Request - State: closed - Opened by pbcong about 2 months ago

#712 - [Bug?] Doc fields with "image" not saved to jsonl file when using log_samples=True

Issue - State: open - Opened by YongchengYAO about 2 months ago - 2 comments

#711 - [Bug] MM prompt too long during vllm generation

Issue - State: open - Opened by Zhuofeng-Li about 2 months ago - 2 comments

#710 - Resolving Evaluation Issues with Custom OPENAI_API_BASE in Model Assessment

Issue - State: closed - Opened by gjgjh about 2 months ago - 1 comment

#709 - Dev/tomato

Pull Request - State: closed - Opened by Devininthelab about 2 months ago

#708 - Fix test_llava for pytest

Pull Request - State: closed - Opened by Luodian about 2 months ago
Labels: codex

#707 - Fix duplicate return statement

Pull Request - State: closed - Opened by Luodian about 2 months ago
Labels: codex

#706 - Fix test_llava for pytest

Pull Request - State: closed - Opened by Luodian about 2 months ago
Labels: codex

#705 - [Fix] Enable the ignored API_URL in the MathVista evaluation.

Pull Request - State: closed - Opened by MoyusiteruIori about 2 months ago

#704 - [Fix] Minor fix on some warning messages

Pull Request - State: closed - Opened by kcz358 about 2 months ago

#703 - [BUG] name 'get_task_dict' is not defined when using --tasks list_with_num command

Issue - State: open - Opened by xyyandxyy about 2 months ago - 4 comments

#702 - Adds VideoMathQA - Task Designed to Evaluate Mathematical Reasoning in Real-World Educational Videos

Pull Request - State: closed - Opened by hanoonaR about 2 months ago

#701 - How to evaluate generation tasks in MLVU?

Issue - State: open - Opened by KevinZeng08 about 2 months ago - 3 comments

#700 - [FIX] Add macro metric to task xlrs-lite

Pull Request - State: closed - Opened by nanocm about 2 months ago

#699 - [Fix] Fix evaluator crash with accelerate backend when num_processes=1

Pull Request - State: closed - Opened by miikatoi about 2 months ago

#698 - Performance gap between python and vLLM backends using the same Qwen2-VL-2B-Instruct model on ChartQA task

Issue - State: open - Opened by mikittt 2 months ago - 3 comments

#697 - pip 0.3.4

Pull Request - State: closed - Opened by pufanyi 2 months ago

#696 - refcoco dataset with qwen2vl

Issue - State: closed - Opened by Ezra-Yu 2 months ago - 1 comment

#695 - ValueError on distributed_executor_backend 'accelerate' when using 1 process with qwen2.5-vl

Issue - State: closed - Opened by miikatoi 2 months ago - 2 comments

#694 - [TASK & FIX] add task VideoEval-Pro and fix tar file concat

Pull Request - State: closed - Opened by iamtonymwt 2 months ago

#693 - [FIX] Fix parameter name in qwen25vl.sh

Pull Request - State: closed - Opened by MasterBeeee 2 months ago

#692 - [Main Update] Doc to messages feature support and Split simple and chat mode

Pull Request - State: closed - Opened by kcz358 2 months ago - 1 comment

#691 - Added direction for locally cached dataset in task_guide.md

Pull Request - State: closed - Opened by JulyanZhu 2 months ago

#690 - Pass in the 'cache_dir' to use local cache

Pull Request - State: closed - Opened by JulyanZhu 2 months ago - 1 comment

#689 - When using video dataset tasks, 'cache_dir' isn't set to use local cache

Issue - State: closed - Opened by JulyanZhu 2 months ago - 4 comments

#688 - Development Roadmap (LMMs-Eval 0.4)

Issue - State: open - Opened by Luodian 2 months ago - 1 comment
Labels: documentation, discussion

#687 - [fix] Fix task listing in CLI evaluation by updating to use 'all_tasks' instead of 'list_all_tasks' for improved clarity.

Pull Request - State: closed - Opened by Luodian 2 months ago

#686 - fix --tasks list output logging

Pull Request - State: closed - Opened by pbcong 2 months ago

#685 - lmms_eval Task not find

Issue - State: closed - Opened by 846529069 2 months ago

#684 - [Task] Add new task: XLRS-Bench-lite

Pull Request - State: closed - Opened by nanocm 2 months ago - 2 comments

#683 - [Task] V*-Bench (Visual Star Benchmark)

Pull Request - State: closed - Opened by Luodian 2 months ago - 1 comment

#682 - Support multiple inference instance run on a single GPU

Issue - State: open - Opened by nanocm 2 months ago - 1 comment

#681 - [WIP/dev] integrate llm as judge

Pull Request - State: closed - Opened by Luodian 2 months ago

#680 - support distributed executor backend - torchrun

Pull Request - State: closed - Opened by kaiyuyue 2 months ago

#679 - support distributed executor backend - torchrun

Pull Request - State: closed - Opened by kaiyuyue 2 months ago

#678 - [fix] add reminder for `interleave_visual` for Qwen2.5-VL, update version control.

Pull Request - State: closed - Opened by Luodian 2 months ago

#677 - Is it normal that the Qwen-2.5-VL result in the MME Benchmark test is significantly lower than InternVL3?

Issue - State: open - Opened by PrLeung 2 months ago - 8 comments

#676 - delete unused test_parse.py file

Pull Request - State: closed - Opened by pbcong 3 months ago

#675 - Update README.md

Pull Request - State: closed - Opened by pufanyi 3 months ago

#674 - Implementation of Video Frame Sampling of Qwen2.5VL

Issue - State: open - Opened by EugeneLiu01 3 months ago - 1 comment

#673 - question about score evaluation

Issue - State: open - Opened by es2ilver 3 months ago - 1 comment

#672 - [BUG] Batch Size 16 evaluation

Issue - State: closed - Opened by OliverGrace 3 months ago

#671 - [Fix] Correct rating logic for VITATECS benchmark

Pull Request - State: closed - Opened by erfanbsoula 3 months ago

#670 - Why does my process hang if I am using multiple GPUs

Issue - State: open - Opened by harshg99 3 months ago - 1 comment

#669 - Fix VQAv2 utils.py

Pull Request - State: closed - Opened by OliverGrace 3 months ago

#668 - [fix] modify the GPT evaluation model

Pull Request - State: closed - Opened by jujeongho0 3 months ago

#667 - [fix] modify the GPT evaluation model

Pull Request - State: closed - Opened by jujeongho0 3 months ago

#666 - Fix issue with killing process in sglang

Pull Request - State: closed - Opened by ravi03071991 3 months ago

#665 - Fixes Metadata Reading from Released PLM Checkpoints

Pull Request - State: closed - Opened by mmaaz60 3 months ago

#664 - [New Model] Are there any plans to add Gemma 3 to the harness?

Issue - State: open - Opened by antocodes 3 months ago - 1 comment

#663 - SIGTERM Error during videomme and mlvu evaluation

Issue - State: open - Opened by MYMY-young 3 months ago - 3 comments
