EleutherAI/lm-evaluation-harness issues and pull requests

#2074 - Fix TypeError in samplers.py by converting int to str

Pull Request - State: closed - Opened by uni2237 3 months ago - 2 comments

#2073 - merge

Pull Request - State: closed - Opened by notrichardren 3 months ago - 2 comments

#2072 - Implementing Anthropic's discrimination evaluation

Issue - State: open - Opened by notrichardren 3 months ago - 3 comments

#2071 - Adding a metric and an aggregation requires knowledge of input

Issue - State: closed - Opened by notrichardren 3 months ago - 1 comment

#2070 - max_new_tokens and max_length conflict

Issue - State: open - Opened by meg-huggingface 3 months ago

#2069 - Evaluate Gemma with Chat Template

Issue - State: open - Opened by pyf98 3 months ago - 3 comments

#2068 - TinyBenchmark/TinyMMLU broken?

Issue - State: closed - Opened by skramer-dev 3 months ago - 5 comments

#2067 - Update package

Pull Request - State: closed - Opened by celiolarcher 3 months ago - 1 comment

#2066 - LLM leader board setting for mmlu.

Issue - State: closed - Opened by dsj96 3 months ago - 1 comment

#2065 - package version conflict while launching leaderboard2 eval

Issue - State: closed - Opened by dhiaEddineRhaiem 3 months ago - 2 comments

#2064 - Error Running New Open LLM Leaderboard Tasks

Issue - State: closed - Opened by annekethvij 3 months ago - 3 comments

#2063 - Running evaluation on Gemma-2 27B model

Issue - State: open - Opened by zeynepgulhanuslu 3 months ago

#2062 - Inconsistent format of `doc_to_text` in the task.yaml files?

Issue - State: closed - Opened by andrew0411 3 months ago - 2 comments

#2061 - Can I see all raw inputs to models and raw outputs from models?

Issue - State: closed - Opened by zsaladin 3 months ago - 1 comment

#2060 - Edits to GroupConfig PR

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#2059 - Limiting scipy integration

Issue - State: closed - Opened by nathan-weinberg 3 months ago - 7 comments

#2058 - Chat template fix

Pull Request - State: open - Opened by NathanHB 3 months ago - 3 comments

#2057 - Fix chat templating

Pull Request - State: closed - Opened by NathanHB 3 months ago

#2056 - Dp and mp support

Pull Request - State: closed - Opened by NathanHB 3 months ago - 8 comments

#2055 - [Draft] Exploring Multimodality - HF Modeling

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#2054 - x

Pull Request - State: closed - Opened by payzunb 3 months ago - 1 comment

#2053 - multimodal support prototype

Pull Request - State: closed - Opened by lintangsutawika 3 months ago

#2052 - [ vLLM ] Fix `add_bos_token` Propagation

Pull Request - State: closed - Opened by robertgshaw2-neuralmagic 3 months ago - 4 comments

#2051 - Allow gating EvaluationTracker HF Hub results; customizability

Pull Request - State: closed - Opened by NathanHB 3 months ago

#2050 - Auto batch size fix

Pull Request - State: open - Opened by NathanHB 3 months ago

#2049 - Gemma-2 also needs default `add_bos_token=True`

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#2048 - Fix strip whitespace filter

Pull Request - State: closed - Opened by NathanHB 3 months ago - 1 comment

#2047 - Adds Open LLM Leaderboard Tasks

Pull Request - State: closed - Opened by NathanHB 3 months ago - 8 comments

#2046 - Update package version to v0.4.3

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#2045 - Bundle `exact_match` HF Evaluate metric with install, don't call evaluate.load() on import

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#2044 - Multimodal support

Pull Request - State: closed - Opened by lintangsutawika 3 months ago

#2043 - `lm_eval --tasks list` return nothing?

Issue - State: closed - Opened by fahadh4ilyas 3 months ago - 3 comments

#2042 - Irokobench: Benchmark Dataset for African languages

Pull Request - State: closed - Opened by JessicaOjo 3 months ago - 3 comments

#2041 - fix wandb logger module import in example

Pull Request - State: closed - Opened by ToluClassics 3 months ago - 2 comments

#2040 - Per-sample perplexity of a continuation?

Issue - State: closed - Opened by YilunZhou 3 months ago - 2 comments

#2039 - [Draft] Exploring multimodality

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#2038 - Fail gracefully upon tokenizer logging failure (#2035)

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago - 2 comments

#2037 - fix cache

Pull Request - State: closed - Opened by baberabb 3 months ago

#2036 - Fix strip whitespace filter

Pull Request - State: closed - Opened by NathanHB 3 months ago

#2035 - The problem of generate responses with my own trained model

Issue - State: closed - Opened by marvelcell 3 months ago

#2034 - Add chat template to `vllm`

Pull Request - State: closed - Opened by baberabb 3 months ago - 4 comments

#2033 - Using chat template with vllm engine

Issue - State: closed - Opened by mohit-rag 3 months ago

#2032 - Add new dataset MMLU-SR tasks

Pull Request - State: closed - Opened by SkySuperCat 3 months ago - 6 comments

#2031 - swahili_ARC_Challenge

Pull Request - State: open - Opened by msamwelmollel 3 months ago - 3 comments

#2030 - Use `shell=False` in `subprocess` Function Calls

Pull Request - State: open - Opened by pixeeai 3 months ago - 1 comment

#2029 - Update `trust_remote_code` for Hellaswag

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#2028 - vllm backend failed

Issue - State: closed - Opened by chunniunai220ml 3 months ago - 9 comments

#2027 - --log_samples not saving all inference output

Issue - State: closed - Opened by zitgit 3 months ago - 1 comment

#2026 - Test Open LLM Leaderboard 2

Issue - State: closed - Opened by matouk98 3 months ago - 10 comments
Labels: asking questions

#2025 - Duplicate `sample` entries

Issue - State: open - Opened by baberabb 3 months ago

#2024 - Fix `trust_remote_code`-related test failures

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago - 1 comment

#2023 - adds leaderboard tasks

Pull Request - State: closed - Opened by NathanHB 3 months ago

#2022 - [add] multiple-choice-question versions of fld benchmark

Pull Request - State: open - Opened by MorishT 3 months ago - 1 comment

#2021 - YAML config was updated, but the project still remains the same as before

Issue - State: closed - Opened by 2018211801 3 months ago - 3 comments

#2020 - Add Redlite tasks for safety benchmarking

Pull Request - State: open - Opened by inno-simon 3 months ago - 2 comments

#2019 - Add MMLU-ru based on MERA

Pull Request - State: closed - Opened by SpirinEgor 3 months ago - 1 comment

#2018 - Does it support Triton server？

Issue - State: closed - Opened by AndyZZt 3 months ago - 1 comment
Labels: asking questions

#2017 - [Not For Merge] Enable chat-template for vLLM

Pull Request - State: closed - Opened by akjindal53244 3 months ago - 2 comments

#2016 - Running on custom model, getting 'TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

Issue - State: closed - Opened by Fchaubard 3 months ago - 1 comment

#2015 - Hotfix breaking import

Pull Request - State: closed - Opened by StellaAthena 3 months ago

#2014 - Supporting Multimodality

Issue - State: open - Opened by lintangsutawika 3 months ago - 4 comments

#2013 - Fix regexp parsing for bbh_cot_fewshot

Pull Request - State: open - Opened by arkapal3 3 months ago - 1 comment

#2012 - Compatibility with Models from PyReft Library

Issue - State: open - Opened by crux82 3 months ago - 6 comments

#2011 - Remove `LM` dependency from `build_all_requests`

Pull Request - State: closed - Opened by baberabb 3 months ago

#2010 - Added MedConceptsQA Benchmark

Pull Request - State: closed - Opened by Ofir408 3 months ago - 1 comment

#2009 - Remove `LM` dependency from `build_all_requests`

Pull Request - State: closed - Opened by baberabb 3 months ago

#2008 - Refactor API models

Pull Request - State: closed - Opened by baberabb 3 months ago - 1 comment

#2007 - Wrong calculation of score when there are ties?

Issue - State: open - Opened by apohllo 3 months ago - 2 comments

#2006 - Error Correction: Eliminate undefined parameter in function call

Pull Request - State: closed - Opened by zhabuye 3 months ago - 2 comments

#2005 - mmlu evaluation fail

Issue - State: closed - Opened by jxiw 3 months ago - 2 comments

#2004 - make pytorch an optional dependency

Pull Request - State: open - Opened by dlwh 3 months ago - 2 comments

#2003 - Fixes scrolls task bug with few_shot examples

Pull Request - State: closed - Opened by xksteven 3 months ago - 7 comments

#2002 - Implementing lessons from OLMES

Issue - State: open - Opened by lintangsutawika 3 months ago

#2001 - Add HuggingFace Text-Generation-Interface Support

Issue - State: open - Opened by taoari 3 months ago

#2000 - Incorrect Multilingual arc implementation

Issue - State: open - Opened by hynky1999 3 months ago

#1999 - Handle Empty openai response

Pull Request - State: open - Opened by ciaranby 3 months ago

#1998 - Fix Datasets `--trust_remote_code`

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago - 3 comments

#1997 - Fix partial caching of openai models

Pull Request - State: open - Opened by ciaranby 3 months ago - 6 comments

#1996 - Add Gigachat model

Pull Request - State: open - Opened by seldereyy 3 months ago

#1995 - Log `fewshot_as_multiturn` in results files

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago

#1994 - Fix naming error; 'include: _paloma_template' -> 'include: paloma.yaml'

Pull Request - State: closed - Opened by LucWeber 3 months ago - 2 comments

#1993 - Fix Paloma Template yaml

Pull Request - State: closed - Opened by haileyschoelkopf 3 months ago - 1 comment

#1992 - Add HumanEval

Pull Request - State: open - Opened by hjlee1371 3 months ago - 3 comments

#1991 - added yaml and util file

Pull Request - State: closed - Opened by satyamshukl 3 months ago - 3 comments

#1990 - Fix self assignment in neuron_optimum.py

Pull Request - State: closed - Opened by LSinev 3 months ago - 2 comments

#1989 - [Fix] Replace generic exception classes with a more specific ones

Pull Request - State: open - Opened by LSinev 3 months ago - 2 comments

#1988 - main

Pull Request - State: closed - Opened by msamwelmollel 3 months ago - 5 comments

#1987 - Added ArabicMMLU

Pull Request - State: closed - Opened by Yazeed7 3 months ago - 5 comments

#1986 - Added ArabicMMLU

Pull Request - State: closed - Opened by Yazeed7 3 months ago - 1 comment

#1985 - `piqa` task need add trust_remote_code true in piqa.yml

Issue - State: closed - Opened by changwangss 3 months ago

#1984 - Long time testing Qwen2-72B

Issue - State: open - Opened by djstrong 3 months ago - 2 comments
Labels: bug

#1983 - add trust_remote_code for piqa

Pull Request - State: closed - Opened by changwangss 3 months ago - 1 comment

#1982 - Update interface.md

Pull Request - State: closed - Opened by johnwee1 3 months ago

#1981 - Add Task: CBT

Pull Request - State: closed - Opened by ookkeeeee 4 months ago - 2 comments

#1980 - How to enable trust_remote_code when encountered programmatically via get_task_dict?

Issue - State: closed - Opened by Jack-Khuu 4 months ago - 3 comments

#1979 - add persianmmlu benchmark for assessing Persian Language understanding

Pull Request - State: open - Opened by MrzEsma 4 months ago - 2 comments

#1978 - Add a way to instantiate from HF.AutoModel (again)

Issue - State: closed - Opened by dmitrii-palisaderesearch 4 months ago - 4 comments
