EleutherAI/lm-evaluation-harness issues and pull requests

#2365 - Extracting vLLM metrics

Issue - State: open - Opened by vsmolyakov 1 day ago

#2364 - Add Unitxt Multimodality Support

Pull Request - State: open - Opened by elronbandel 1 day ago

#2363 - Unitxt Multi Modality Support

Pull Request - State: closed - Opened by elronbandel 1 day ago

#2362 - Which filter value should be used among the accuracy test results?

Issue - State: open - Opened by KKwanhee 4 days ago

#2361 - boolq trust remote code

Issue - State: open - Opened by IvanSedykh 4 days ago

#2360 - [multimodal] llava-1.5-7b-hf doesn't work on `mmmu_val`

Issue - State: open - Opened by BabyChouSr 5 days ago - 4 comments
Labels: bug

#2359 - fix `cost_estimate` script

Pull Request - State: open - Opened by baberabb 5 days ago

#2358 - Improve `docs/model_guide.md` with skeleton template code + description of utils like `Collator` and `Reorderer`

Issue - State: open - Opened by haileyschoelkopf 5 days ago
Labels: documentation, feature request

#2357 - Add metabench task to LM Evaluation Harness

Pull Request - State: open - Opened by kozzy97 5 days ago - 1 comment

#2356 - Add a test for `scripts/write_out.py` and other `scripts/` utils

Issue - State: open - Opened by haileyschoelkopf 5 days ago

#2355 - --tasks mmlu

Issue - State: closed - Opened by belle9217 5 days ago - 1 comment
Labels: asking questions

#2354 - Evaluation of MMLU tasks using a fined tuned Gemma 2 model

Issue - State: open - Opened by chamath-eka 5 days ago

#2353 - HF: switch conditional checks to `self.backend` from `AUTO_MODEL_CLASS`

Pull Request - State: open - Opened by baberabb 6 days ago

#2352 - Setting limit_mm_per_prompt for vllm_vlm fails argument parser

Issue - State: open - Opened by mgoin 6 days ago
Labels: bug

#2351 - squad v2: load metric with `evaluate`

Pull Request - State: closed - Opened by baberabb 6 days ago

#2350 - fix writeout script

Pull Request - State: closed - Opened by baberabb 6 days ago

#2349 - Support pipeline parallel with OpenVINO models

Pull Request - State: open - Opened by sstrehlk 6 days ago - 1 comment

#2348 - squadv2 task occurred "AttributeError: module 'datasets' has no attribute 'load_metric'"

Issue - State: closed - Opened by chengpong1127 6 days ago
Labels: bug

#2347 - The base model and chat model have no difference when using generate_until, loglikelihood, loglikelihood_rolling,right?

Issue - State: open - Opened by belle9217 6 days ago - 1 comment
Labels: asking questions

#2346 - Unexpected space character

Issue - State: open - Opened by eldarkurtic 6 days ago - 2 comments

#2345 - tasks RACE only high not "middle"

Issue - State: open - Opened by Choi-jun9803 6 days ago

#2344 - Reproduce QWen 2.5-14B-Instruct and LLaMa-3.1-8B-Instruct Results

Issue - State: open - Opened by ruleGreen 6 days ago - 1 comment

#2343 - gpt2 evaluation

Issue - State: open - Opened by sorobedio 6 days ago

#2342 - AttributeError: 'dict' object has no attribute 'has_test_docs'

Issue - State: closed - Opened by Sshubam 7 days ago

#2341 - Merge New Tasks

Pull Request - State: closed - Opened by ToluClassics 7 days ago

#2340 - Added metric aggregation for leaderboard tasks.

Pull Request - State: closed - Opened by Am1n3e 7 days ago - 4 comments

#2339 - Fixed dummy model

Pull Request - State: closed - Opened by Am1n3e 7 days ago - 1 comment

#2338 - Locally reproducible HF-Leaderboard evals

Issue - State: open - Opened by eldarkurtic 7 days ago - 2 comments
Labels: asking questions

#2337 - Robustness Task

Pull Request - State: closed - Opened by rimashahbazyan 7 days ago - 1 comment

#2336 - Add a note for missing dependencies

Pull Request - State: closed - Opened by eldarkurtic 7 days ago - 1 comment

#2335 - Dynamical prompt with extremely promising results #RIPrompt

Issue - State: open - Opened by anthonyrisinger 7 days ago - 1 comment

#2334 - mmlu-pro: add newlines to task descriptions (not leaderboard)

Pull Request - State: closed - Opened by baberabb 8 days ago

#2333 - add newlines to mmlu_pro task descriptions (not leaderboard)

Pull Request - State: closed - Opened by baberabb 8 days ago

#2332 - change glianorex to test split

Pull Request - State: closed - Opened by baberabb 8 days ago - 1 comment

#2331 - Confusion over the model outputs

Issue - State: open - Opened by tranlm 8 days ago

#2330 - Failed to add a new metric

Issue - State: open - Opened by Ofir408 8 days ago

#2329 - `glianorex_en` task does not work

Issue - State: closed - Opened by casper-hansen 8 days ago - 1 comment
Labels: bug

#2328 - Hashing error when setting random seed for vllm model

Issue - State: open - Opened by yizhongw 9 days ago - 1 comment
Labels: asking questions

#2327 - openai: better error messages; fix greedy matching

Pull Request - State: closed - Opened by baberabb 11 days ago - 1 comment

#2326 - Support for Using Multiple Choice Datasets with GPT-4o Model via OpenAI API

Issue - State: closed - Opened by Laplace888 11 days ago - 3 comments
Labels: asking questions

#2325 - Fix float limit override

Pull Request - State: open - Opened by cjluo-omniml 11 days ago - 3 comments

#2324 - Bug in the float limit handling

Issue - State: open - Opened by cjluo-omniml 11 days ago - 6 comments
Labels: feature request

#2323 - Error for AGIEval when using fewshot

Issue - State: open - Opened by BaohaoLiao 12 days ago - 1 comment
Labels: bug, validation

#2322 - Which version to use

Issue - State: open - Opened by sorobedio 12 days ago - 9 comments
Labels: validation

#2321 - Mathvista

Pull Request - State: open - Opened by baberabb 12 days ago

#2320 - change group to tags in task `eus_exams` task configs

Pull Request - State: closed - Opened by baberabb 12 days ago

#2319 - how to get lm_eval version 4.2

Issue - State: closed - Opened by sorobedio 13 days ago - 1 comment

#2318 - Evaluation of MMLU tasks using the OpenAI API

Issue - State: closed - Opened by Laplace888 13 days ago - 3 comments
Labels: asking questions

#2317 - Multiple generations (sequential) per question

Issue - State: open - Opened by IntrepidEnki 13 days ago - 1 comment
Labels: feature request, asking questions

#2316 - GSM8K Problem On Colab With Finetuned Phi3.5 mini model

Issue - State: closed - Opened by SongTonyLi 14 days ago - 3 comments
Labels: asking questions

#2315 - remove comma

Pull Request - State: closed - Opened by baberabb 14 days ago

#2314 - Update neuron backend

Pull Request - State: closed - Opened by dacorvo 14 days ago - 4 comments

#2313 - Comma breaks repr for write-out

Issue - State: closed - Opened by giuliolovisotto 14 days ago - 1 comment
Labels: bug

#2312 - mmlu translated professionally by OpenAI

Pull Request - State: open - Opened by giuliolovisotto 14 days ago - 1 comment

#2311 - add batch_size to `get_sample_size`

Pull Request - State: closed - Opened by baberabb 14 days ago

#2310 - AttributeError: 'GPT2TokenizerFast' object has no attribute 'default_chat_template'. Did you mean: 'get_chat_template'?

Issue - State: closed - Opened by IsraelAbebe 14 days ago - 3 comments

#2309 - Scrolls branch

Pull Request - State: open - Opened by blitzionic 14 days ago - 2 comments

#2308 - Chat templates

Issue - State: closed - Opened by IsraelAbebe 15 days ago

#2307 - avoid timeout errors with high concurrency in api_model

Pull Request - State: open - Opened by dtrawins 15 days ago - 3 comments

#2306 - Running multiple processes on a shared outlines cache database

Issue - State: open - Opened by e-tornike 15 days ago - 2 comments

#2305 - New Task: `openai_mmmlu` professionaly translated by OpenAI as part of o1 release

Issue - State: open - Opened by giuliolovisotto 15 days ago - 1 comment
Labels: feature request

#2304 - Fix missing key in custom task loading.

Pull Request - State: open - Opened by giuliolovisotto 15 days ago

#2303 - Missing key in dictionary when loading tasks.

Issue - State: open - Opened by giuliolovisotto 15 days ago
Labels: bug

#2302 - Configuring Azure OPENAI

Issue - State: open - Opened by sudhanshu-myl 15 days ago - 3 comments
Labels: asking questions

#2301 - Fail to reproduce the perplexity of Llama-2 7B on wikitext

Issue - State: open - Opened by Yonghao-Tan 16 days ago - 10 comments

#2300 - add new truncation strategy

Pull Request - State: open - Opened by artemorloff 16 days ago - 3 comments

#2299 - fix some bugs of mmlu

Pull Request - State: closed - Opened by eyuansu62 17 days ago - 3 comments

#2298 - fix some bugs of mmlu (flan_cot_fewshot and flan_n_shot)

Pull Request - State: closed - Opened by eyuansu62 17 days ago

#2297 - Update README.md

Pull Request - State: closed - Opened by SYusupov 17 days ago - 3 comments

#2296 - Low GPU Utilization During Multi-GPU evaluation - Efficiency Optimization

Issue - State: open - Opened by yang3121099 17 days ago - 1 comment
Labels: asking questions

#2295 - the log is end,the gpu is not calculate,but is storing,the result is not getting,is it normal?

Issue - State: closed - Opened by belle9217 18 days ago - 1 comment
Labels: asking questions

#2294 - Worse evaluation performance with PEFT adaptors

Issue - State: open - Opened by YananLi18 18 days ago - 1 comment

#2293 - RuntimeError: CUDA error: device-side assert triggered

Issue - State: open - Opened by milliemaoo 19 days ago - 2 comments

#2292 - Using multi-GPU with accelerate is not working

Issue - State: closed - Opened by commmet-ahn 20 days ago - 2 comments

#2291 - Infer time by use library's external api is much longer than script

Issue - State: open - Opened by lonleyodd 20 days ago
Labels: bug

#2290 - Couldn't parse .yaml file for configuration

Issue - State: open - Opened by ArchitJain1201 20 days ago

#2289 - A little typing issue

Issue - State: open - Opened by yuti01 20 days ago

#2288 - Treat tags in python tasks the same as yaml tasks

Pull Request - State: closed - Opened by giuliolovisotto 21 days ago - 1 comment

#2287 - Issue with openai completions API - related to logprobs

Issue - State: closed - Opened by dmakhervaks 21 days ago - 3 comments
Labels: bug

#2286 - What's going on with swde or squadv2 tasks ?

Issue - State: closed - Opened by ahatamiz 22 days ago - 2 comments

#2285 - Can we connect to Vertex AI model

Issue - State: open - Opened by patilpriyadarshini 22 days ago

#2284 - External API - same results different models

Issue - State: closed - Opened by deema-A 24 days ago - 2 comments

#2283 - Added TurkishMMLU to LM Evaluation Harness

Pull Request - State: closed - Opened by ArdaYueksel 25 days ago - 2 comments

#2282 - add mmlu readme

Pull Request - State: closed - Opened by baberabb 25 days ago

#2281 - Multi-node MMLU support ?

Issue - State: closed - Opened by ahatamiz 25 days ago - 2 comments
Labels: asking questions

#2280 - Bump version to v0.4.4 ; Fixes to TMMLUplus

Pull Request - State: closed - Opened by haileyschoelkopf 26 days ago

#2279 - zero accuracy on `mmlu_generative`

Issue - State: open - Opened by Luodian 26 days ago - 5 comments
Labels: bug

#2278 - May be parse LAST numbers in GSM8K "flexible-extract" filter?

Issue - State: closed - Opened by Pupy101 27 days ago - 2 comments
Labels: asking questions

#2277 - Free space allocated for LM in the memory after evaluation finishes

Pull Request - State: closed - Opened by ahmedamrelhefnawy 28 days ago - 3 comments

#2276 - Do the version of CMMLU and MMLU make any differences?

Issue - State: closed - Opened by yaolu-zjut 28 days ago

#2275 - Teleia group task

Pull Request - State: closed - Opened by gonz-mart 28 days ago - 1 comment
