GitHub / EvolvingLMMs-Lab/lmms-eval issues and pull requests
#762 - Dev/olympiad bench
Pull Request -
State: open - Opened by Luodian 15 days ago
#761 - Issues when using llava_vid.py to evaluate the llava-onevision-qwen2-7b-ov model
Issue -
State: open - Opened by SplendidYuan 15 days ago
#760 - [fix] Fixed applying process_* twice on resAns for VQAv2
Pull Request -
State: closed - Opened by Avelina9X 15 days ago
- 1 comment
#759 - [fix] update korean benchmark's post_prompt
Pull Request -
State: closed - Opened by jujeongho0 15 days ago
- 1 comment
#757 - Remove Claude GitHub workflows for code review
Pull Request -
State: open - Opened by Luodian 17 days ago
- 1 comment
#756 - [New Benchmark] Request for supporting TimeScope
Pull Request -
State: open - Opened by ruili33 17 days ago
- 1 comment
#755 - Bug while initializing the evaluation of Qwen2VL-7B
Issue -
State: closed - Opened by tianjiedai 19 days ago
- 2 comments
Labels: bug
#754 - [Bugfix] Fix handling of encode_video output in vllm.py so each frame’s Base64
Pull Request -
State: closed - Opened by LiamLian0727 19 days ago
- 1 comment
#753 - Fix handling of encode_video output in vllm.py so each frame’s Base64
Pull Request -
State: closed - Opened by LiamLian0727 19 days ago
- 1 comment
#752 - Fix handling of encode_video output in vllm.py so each frame’s Base64…
Pull Request -
State: closed - Opened by LiamLian0727 20 days ago
- 1 comment
#751 - Why do the two evaluation methods produce very different results?
Issue -
State: open - Opened by mathCrazyy 20 days ago
- 6 comments
#750 - The results of evaluation on multi-GPUs versus single-GPU differ from each other
Issue -
State: open - Opened by shuangchen2003 21 days ago
#749 - Add claude GitHub actions 1752118403023
Pull Request -
State: closed - Opened by Luodian 21 days ago
- 1 comment
#748 - when to support lmms-lab/MMMU interleaved_format evaluation?
Issue -
State: open - Opened by zhiwenhou1227 21 days ago
#747 - Feature/inference throughput logging
Pull Request -
State: closed - Opened by Luodian 21 days ago
- 2 comments
#746 - [WIP] Feature/terminal UI integration
Pull Request -
State: open - Opened by Luodian 21 days ago
- 2 comments
#745 - Add comprehensive test suite for CI/CD
Pull Request -
State: closed - Opened by Luodian 21 days ago
- 2 comments
#744 - Title: Add Benchmark from "Vision-Language Models Can’t See the Obvious" (ICCV 2025)
Pull Request -
State: closed - Opened by dunghuynhandy 21 days ago
- 8 comments
#744 - Title: Add Benchmark from "Vision-Language Models Can’t See the Obvious" (ICCV 2025)
Pull Request -
State: open - Opened by dunghuynhandy 21 days ago
- 7 comments
#743 - Example Script for llava-hf.
Issue -
State: closed - Opened by Danielement321 22 days ago
#742 - [New Benchmark] Add Video-TT Benchmark
Pull Request -
State: closed - Opened by dongyh20 23 days ago
- 1 comment
#741 - Revert "Pass in the 'cache_dir' to use local cache"
Pull Request -
State: closed - Opened by kcz358 23 days ago
- 1 comment
#740 - fix: add `max_frames_num` to `OpenAICompatible`
Pull Request -
State: closed - Opened by loongfeili 23 days ago
- 1 comment
#739 - How to download dataset correctly?
Issue -
State: closed - Opened by Royalcat2022 24 days ago
#738 - Abnormal GPU Utilization When Evaluating Qwen 2.5 VL 72B
Issue -
State: open - Opened by LiamLian0727 25 days ago
- 8 comments
#737 - [Bugfix] Add min image resolution requirement for vLLM Qwen-VL models
Pull Request -
State: closed - Opened by zch42 25 days ago
- 1 comment
#736 - Add min image resolution requirement for vLLM Qwen-VL models
Pull Request -
State: closed - Opened by zch42 25 days ago
- 1 comment
#735 - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1
Issue -
State: open - Opened by anurag-198 28 days ago
- 4 comments
Labels: bug
#734 - Fix three bugs in the codebase
Pull Request -
State: closed - Opened by Luodian 29 days ago
- 3 comments
#733 - [fix] cli_evaluate to properly handle Namespace arguments
Pull Request -
State: closed - Opened by Luodian 29 days ago
- 3 comments
#732 - how to download the dataset correctly?
Issue -
State: open - Opened by Sugar929 30 days ago
- 7 comments
Labels: enhancement
#731 - bugfix: missing fields in doc when using --log_samples
Pull Request -
State: open - Opened by VincentYCYao about 1 month ago
- 6 comments
#730 - cli_evaluate has option to pass args for use within a module but this does not work
Issue -
State: closed - Opened by lusxvr about 1 month ago
- 1 comment
#729 - Support for MiMo-VL-7B
Issue -
State: open - Opened by ChangtiWu about 1 month ago
- 2 comments
Labels: bug
#728 - [vLLM] centralize VLLM_WORKER_MULTIPROC_METHOD
Pull Request -
State: closed - Opened by kylesayrs about 1 month ago
- 1 comment
#727 - add new tasks MMVU and Visual Web Bench
Pull Request -
State: closed - Opened by pbcong about 1 month ago
- 2 comments
#726 - vLLM Eval Not Working
Issue -
State: closed - Opened by chuyishang about 1 month ago
#725 - Add CameraBench_VQA
Pull Request -
State: closed - Opened by chancharikmitra about 1 month ago
- 1 comment
#724 - [FIX] Resolve MMMU-test submission file generation issue
Pull Request -
State: closed - Opened by xyyandxyy about 1 month ago
- 3 comments
#723 - [Bug] fix a bug in post processing stage of ScienceQA.
Pull Request -
State: closed - Opened by ashun989 about 1 month ago
- 1 comment
#722 - Is it possible to add support for llava-hf/llava-onevision-qwen2-7b-ov-hf?
Issue -
State: open - Opened by tianyuzong about 1 month ago
- 2 comments
Labels: bug, discussion
#721 - [Feat] LMMS-Eval 0.4
Pull Request -
State: open - Opened by Luodian about 1 month ago
- 4 comments
#720 - [Minor] typo fixed in task_guide.md
Pull Request -
State: closed - Opened by JulyanZhu about 1 month ago
- 1 comment
#719 - [fix] update korean benchmark's post_prompt
Pull Request -
State: closed - Opened by jujeongho0 about 1 month ago
- 1 comment
#718 - [Update] Doc to messages and setup Unified Server for GPT as a judge
Pull Request -
State: closed - Opened by kcz358 about 1 month ago
- 2 comments
#717 - [fix] Refactor Accelerator initialization
Pull Request -
State: closed - Opened by Luodian about 1 month ago
#716 - Update sentencepiece dependency and add new parameters to mathvista_t…
Pull Request -
State: closed - Opened by Luodian about 1 month ago
#715 - add mmsi-bench (https://arxiv.org/abs/2505.23764)
Pull Request -
State: closed - Opened by sihany077 about 1 month ago
- 1 comment
#714 - [fix] ensure synchronization not be used without distributed execution
Pull Request -
State: closed - Opened by debugdoctor about 2 months ago
#713 - add mmvu task
Pull Request -
State: closed - Opened by pbcong about 2 months ago
#712 - [Bug?] Doc fields with "image" not saved to jsonl file when using log_samples=True
Issue -
State: open - Opened by YongchengYAO about 2 months ago
- 2 comments
#711 - [Bug] MM prompt too long during vllm generation
Issue -
State: open - Opened by Zhuofeng-Li about 2 months ago
- 2 comments
#710 - Resolving Evaluation Issues with Custom OPENAI_API_BASE in Model Assessment
Issue -
State: closed - Opened by gjgjh about 2 months ago
- 1 comment
#709 - Dev/tomato
Pull Request -
State: closed - Opened by Devininthelab about 2 months ago
#708 - Fix test_llava for pytest
Pull Request -
State: closed - Opened by Luodian about 2 months ago
Labels: codex
#707 - Fix duplicate return statement
Pull Request -
State: closed - Opened by Luodian about 2 months ago
Labels: codex
#706 - Fix test_llava for pytest
Pull Request -
State: closed - Opened by Luodian about 2 months ago
Labels: codex
#705 - [Fix] Enable the ignored API_URL in the MathVista evaluation.
Pull Request -
State: closed - Opened by MoyusiteruIori about 2 months ago
#704 - [Fix] Minor fix on some warning messages
Pull Request -
State: closed - Opened by kcz358 about 2 months ago
#703 - [BUG] name 'get_task_dict' is not defined when using --tasks list_with_num command
Issue -
State: open - Opened by xyyandxyy about 2 months ago
- 4 comments
#702 - Adds VideoMathQA - Task Designed to Evaluate Mathematical Reasoning in Real-World Educational Videos
Pull Request -
State: closed - Opened by hanoonaR about 2 months ago
#701 - How to evaluate generation tasks in MLVU?
Issue -
State: open - Opened by KevinZeng08 about 2 months ago
- 3 comments
#700 - [FIX] Add macro metric to task xlrs-lite
Pull Request -
State: closed - Opened by nanocm about 2 months ago
#699 - [Fix] Fix evaluator crash with accelerate backend when num_processes=1
Pull Request -
State: closed - Opened by miikatoi about 2 months ago
#698 - Performance gap between python and vLLM backends using the same Qwen2-VL-2B-Instruct model on ChartQA task
Issue -
State: open - Opened by mikittt 2 months ago
- 3 comments
#697 - pip 0.3.4
Pull Request -
State: closed - Opened by pufanyi 2 months ago
#696 - refcoco dataset with qwen2vl
Issue -
State: closed - Opened by Ezra-Yu 2 months ago
- 1 comment
#695 - ValueError on distributed_executor_backend 'accelerate' when using 1 process with qwen2.5-vl
Issue -
State: closed - Opened by miikatoi 2 months ago
- 2 comments
#694 - [TASK & FIX] add task VideoEval-Pro and fix tar file concat
Pull Request -
State: closed - Opened by iamtonymwt 2 months ago
#693 - [FIX] Fix parameter name in qwen25vl.sh
Pull Request -
State: closed - Opened by MasterBeeee 2 months ago
#692 - [Main Update] Doc to messages feature support and Split simple and chat mode
Pull Request -
State: closed - Opened by kcz358 2 months ago
- 1 comment
#691 - Added direction for locally cached dataset in task_guide.md
Pull Request -
State: closed - Opened by JulyanZhu 2 months ago
#690 - Pass in the 'cache_dir' to use local cache
Pull Request -
State: closed - Opened by JulyanZhu 2 months ago
- 1 comment
#689 - When using video dataset tasks, 'cache_dir' isn't set to use local cache
Issue -
State: closed - Opened by JulyanZhu 2 months ago
- 4 comments
#688 - Development Roadmap (LMMs-Eval 0.4)
Issue -
State: open - Opened by Luodian 2 months ago
- 1 comment
Labels: documentation, discussion
#687 - [fix] Fix task listing in CLI evaluation by updating to use 'all_tasks' instead of 'list_all_tasks' for improved clarity.
Pull Request -
State: closed - Opened by Luodian 2 months ago
#686 - fix --tasks list output logging
Pull Request -
State: closed - Opened by pbcong 2 months ago
#685 - lmms_eval Task not find
Issue -
State: closed - Opened by 846529069 2 months ago
#684 - [Task] Add new task: XLRS-Bench-lite
Pull Request -
State: closed - Opened by nanocm 2 months ago
- 2 comments
#683 - [Task] V*-Bench (Visual Star Benchmark)
Pull Request -
State: closed - Opened by Luodian 2 months ago
- 1 comment
#682 - Support multiple inference instance run on a single GPU
Issue -
State: open - Opened by nanocm 2 months ago
- 1 comment
#681 - [WIP/dev] integrate llm as judge
Pull Request -
State: closed - Opened by Luodian 2 months ago
#680 - support distributed executor backend - torchrun
Pull Request -
State: closed - Opened by kaiyuyue 2 months ago
#679 - support distributed executor backend - torchrun
Pull Request -
State: closed - Opened by kaiyuyue 2 months ago
#678 - [fix] add reminder for `interleave_visual` for Qwen2.5-VL, update version control.
Pull Request -
State: closed - Opened by Luodian 2 months ago
#677 - Is it normal that the Qwen-2.5-VL result in the MME Benchmark test is significantly lower than InternVL3?
Issue -
State: open - Opened by PrLeung 2 months ago
- 8 comments
#676 - delete unused test_parse.py file
Pull Request -
State: closed - Opened by pbcong 3 months ago
#675 - Update README.md
Pull Request -
State: closed - Opened by pufanyi 3 months ago
#674 - Implementation of Video Frame Sampling of Qwen2.5VL
Issue -
State: open - Opened by EugeneLiu01 3 months ago
- 1 comment
#673 - question about score evaluation
Issue -
State: open - Opened by es2ilver 3 months ago
- 1 comment
#672 - [BUG] Batch Size 16 evaluation
Issue -
State: closed - Opened by OliverGrace 3 months ago
#671 - [Fix] Correct rating logic for VITATECS benchmark
Pull Request -
State: closed - Opened by erfanbsoula 3 months ago
#670 - Why does my process hang if I am using multiple GPUs
Issue -
State: open - Opened by harshg99 3 months ago
- 1 comment
#669 - Fix VQAv2 utils.py
Pull Request -
State: closed - Opened by OliverGrace 3 months ago
#668 - [fix] modify the GPT evaluation model
Pull Request -
State: closed - Opened by jujeongho0 3 months ago
#667 - [fix] modify the GPT evaluation model
Pull Request -
State: closed - Opened by jujeongho0 3 months ago
#666 - Fix issue with killing process in sglang
Pull Request -
State: closed - Opened by ravi03071991 3 months ago
#665 - Fixes Metadata Reading from Released PLM Checkpoints
Pull Request -
State: closed - Opened by mmaaz60 3 months ago
#664 - [New Model] Are there any plans to add Gemma 3 to the harness?
Issue -
State: open - Opened by antocodes 3 months ago
- 1 comment
#663 - SIGTERM Error during videomme and mlvu evaluation
Issue -
State: open - Opened by MYMY-young 3 months ago
- 3 comments