EleutherAI/lm-evaluation-harness issues and pull requests

#2028 - vllm backend failed

Issue - State: closed - Opened by chunniunai220ml 1 day ago - 3 comments

#2027 - --log_samples not saving all inference output

Issue - State: open - Opened by zitgit 1 day ago

#2026 - Test Open LLM Leaderboard 2

Issue - State: open - Opened by matouk98 1 day ago - 2 comments

#2025 - Duplicate `sample` entries

Issue - State: open - Opened by baberabb 1 day ago

#2024 - Fix `trust_remote_code`-related test failures

Pull Request - State: closed - Opened by haileyschoelkopf 2 days ago

#2023 - adds leaderboard tasks

Pull Request - State: closed - Opened by NathanHB 2 days ago

#2022 - [add] multiple-choice-question versions of fld benchmark

Pull Request - State: open - Opened by MorishT 2 days ago - 1 comment

#2021 - YAML config was updated, but the project still remains the same as before

Issue - State: closed - Opened by 2018211801 2 days ago - 3 comments

#2020 - Add Redlite tasks for safety benchmarking

Pull Request - State: open - Opened by inno-simon 3 days ago - 1 comment

#2019 - Add MMLU-ru based on MERA

Pull Request - State: closed - Opened by SpirinEgor 3 days ago - 1 comment

#2018 - Does it support Triton server?

Issue - State: closed - Opened by AndyZZt 3 days ago - 1 comment
Labels: asking questions

#2017 - [Not For Merge] Enable chat-template for vLLM

Pull Request - State: open - Opened by akjindal53244 4 days ago - 1 comment

#2016 - Running on custom model, getting 'TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType''

Issue - State: closed - Opened by Fchaubard 4 days ago

#2015 - Hotfix breaking import

Pull Request - State: closed - Opened by StellaAthena 4 days ago

#2014 - Supporting Multimodality

Issue - State: open - Opened by lintangsutawika 4 days ago

#2013 - Fix regexp parsing for bbh_cot_fewshot

Pull Request - State: open - Opened by arkapal3 4 days ago - 1 comment

#2012 - Compatibility with Models from PyReft Library

Issue - State: open - Opened by crux82 5 days ago - 6 comments

#2011 - Remove `LM` dependency from `build_all_requests`

Pull Request - State: closed - Opened by baberabb 6 days ago

#2010 - Added MedConceptsQA Benchmark

Pull Request - State: open - Opened by Ofir408 6 days ago - 1 comment

#2009 - Remove `LM` dependency from `build_all_requests`

Pull Request - State: closed - Opened by baberabb 6 days ago

#2008 - Refactor API models

Pull Request - State: open - Opened by baberabb 6 days ago

#2007 - Wrong calculation of score when there are ties?

Issue - State: open - Opened by apohllo 7 days ago - 2 comments

#2006 - Error Correction: Eliminate undefined parameter in function call

Pull Request - State: open - Opened by zhabuye 8 days ago - 2 comments

#2005 - mmlu evaluation fail

Issue - State: closed - Opened by jxiw 8 days ago - 2 comments

#2004 - make pytorch an optional dependency

Pull Request - State: open - Opened by dlwh 8 days ago - 2 comments

#2003 - Fixes scrolls task bug with few_shot examples

Pull Request - State: open - Opened by xksteven 8 days ago - 7 comments

#2002 - Implementing lessons from OLMES

Issue - State: open - Opened by lintangsutawika 8 days ago

#2001 - Add HuggingFace Text-Generation-Interface Support

Issue - State: open - Opened by taoari 9 days ago

#2000 - Incorrect Multilingual arc implementation

Issue - State: open - Opened by hynky1999 9 days ago

#1999 - Handle Empty openai response

Pull Request - State: open - Opened by ciaranby 9 days ago

#1998 - Fix Datasets `--trust_remote_code`

Pull Request - State: closed - Opened by haileyschoelkopf 9 days ago - 2 comments

#1997 - Fix partial caching of openai models

Pull Request - State: open - Opened by ciaranby 9 days ago - 1 comment

#1996 - Add Gigachat model

Pull Request - State: open - Opened by seldereyy 9 days ago

#1995 - Log `fewshot_as_multiturn` in results files

Pull Request - State: closed - Opened by haileyschoelkopf 9 days ago

#1994 - Fix naming error; 'include: _paloma_template' -> 'include: paloma.yaml'

Pull Request - State: closed - Opened by LucWeber 9 days ago - 2 comments

#1993 - Fix Paloma Template yaml

Pull Request - State: closed - Opened by haileyschoelkopf 9 days ago - 1 comment

#1992 - Add HumanEval

Pull Request - State: open - Opened by hjlee1371 9 days ago - 1 comment

#1991 - added yaml and util file

Pull Request - State: closed - Opened by satyamshukl 9 days ago - 3 comments

#1990 - Fix self assignment in neuron_optimum.py

Pull Request - State: closed - Opened by LSinev 10 days ago - 2 comments

#1989 - [Fix] Replace generic exception classes with more specific ones

Pull Request - State: open - Opened by LSinev 10 days ago - 1 comment

#1988 - main

Pull Request - State: open - Opened by msamwelmollel 10 days ago - 4 comments

#1987 - Added ArabicMMLU

Pull Request - State: closed - Opened by Yazeed7 10 days ago - 5 comments

#1986 - Added ArabicMMLU

Pull Request - State: closed - Opened by Yazeed7 10 days ago - 1 comment

#1985 - `piqa` task needs trust_remote_code: true added in piqa.yml

Issue - State: closed - Opened by changwangss 10 days ago

#1984 - Long time testing Qwen2-72B

Issue - State: open - Opened by djstrong 10 days ago - 1 comment
Labels: bug

#1983 - add trust_remote_code for piqa

Pull Request - State: closed - Opened by changwangss 10 days ago - 1 comment

#1982 - Update interface.md

Pull Request - State: closed - Opened by johnwee1 10 days ago

#1981 - Add Task: CBT

Pull Request - State: closed - Opened by ookkeeeee 11 days ago - 2 comments

#1980 - How to enable trust_remote_code when encountered programmatically via get_task_dict?

Issue - State: closed - Opened by Jack-Khuu 11 days ago - 3 comments

#1979 - add persianmmlu benchmark for assessing Persian Language understanding

Pull Request - State: open - Opened by MrzEsma 11 days ago - 2 comments

#1978 - Add a way to instantiate from HF.AutoModel (again)

Issue - State: closed - Opened by dmitrii-palisaderesearch 11 days ago - 2 comments

#1977 - add persianmmlu benchmark for assessing Persian Language understanding

Pull Request - State: closed - Opened by MrzEsma 11 days ago - 1 comment

#1976 - What is the output_type in the metric for?

Issue - State: open - Opened by dennisrall 11 days ago - 1 comment

#1975 - Fix local completion huggingface tokenizer

Pull Request - State: open - Opened by okdshin 11 days ago - 1 comment

#1974 - added bias and stereotype classification tasks

Pull Request - State: closed - Opened by aditya20t 11 days ago - 1 comment

#1973 - Add GigaChat API

Pull Request - State: closed - Opened by seldereyy 11 days ago - 1 comment

#1972 - incomplete task list

Issue - State: closed - Opened by hlzhang109 12 days ago - 2 comments

#1971 - Unbelievably long time when hosting the gguf model?

Issue - State: open - Opened by hzgdeerHo 12 days ago - 2 comments

#1970 - mela

Pull Request - State: open - Opened by Geralt-Targaryen 12 days ago - 2 comments

#1969 - Fix OpenAI API discrepancies

Pull Request - State: open - Opened by chimezie 14 days ago

#1968 - Updates to fix OpenAI API compliance

Pull Request - State: closed - Opened by chimezie 14 days ago

#1967 - OpenAI completions model not using OpenAI Completion API properly to extract LogProbs

Issue - State: open - Opened by chimezie 14 days ago - 2 comments

#1966 - TemplateLM#_encode_pair() only works for HF transformers auto-models

Issue - State: closed - Opened by Birch-san 14 days ago - 1 comment

#1965 - Error while installing

Issue - State: closed - Opened by surya-narayanan 14 days ago - 1 comment

#1964 - Add BertaQA dataset tasks

Pull Request - State: closed - Opened by juletx 15 days ago - 1 comment

#1963 - How to use a vllm hosted model?

Issue - State: open - Opened by darsh-essential 15 days ago - 1 comment

#1962 - Error when chat template is not a string

Issue - State: open - Opened by djstrong 15 days ago

#1961 - Mmlu Pro

Pull Request - State: open - Opened by ysjprojects 15 days ago - 6 comments

#1960 - Multi-gpu evaluation with external library usage.

Issue - State: closed - Opened by xinghaow99 15 days ago - 1 comment

#1959 - Making torch dep optional?

Issue - State: open - Opened by dlwh 16 days ago - 4 comments

#1958 - Wandb logger can't handle groups with heterogeneous metrics

Issue - State: open - Opened by dmitrii-palisaderesearch 16 days ago - 11 comments

#1957 - Cannot load model 'local-chat-completions' and 'local-completions'

Issue - State: closed - Opened by awesom112 16 days ago

#1956 - fix: add directory filter to os.walk to ignore 'ipynb_checkpoints'

Pull Request - State: closed - Opened by johnwee1 16 days ago - 11 comments

#1955 - Fix a tiny typo in `docs/interface.md`

Pull Request - State: closed - Opened by sadra-barikbin 16 days ago

#1954 - Fix task.py and evaluator.py

Pull Request - State: closed - Opened by zhabuye 16 days ago - 1 comment

#1953 - Keep getting error: 'VLLM' object has no attribute 'AUTO_MODEL_CLASS'

Issue - State: closed - Opened by andrew0411 17 days ago - 6 comments

#1952 - .ipynb_checkpoints causes eval harness to fail

Issue - State: closed - Opened by johnwee1 17 days ago

#1951 - Plans for a new release?

Issue - State: open - Opened by nathan-weinberg 17 days ago - 4 comments

#1950 - LMJudge

Pull Request - State: open - Opened by baberabb 17 days ago - 4 comments

#1949 - Check compatibility of `local-completions` with VLLM (returns logits) for `multiple_choice` tasks

Issue - State: open - Opened by haileyschoelkopf 17 days ago
Labels: bug

#1948 - Remove AMMLU Due to Translation

Pull Request - State: closed - Opened by haileyschoelkopf 17 days ago - 2 comments

#1947 - Add MMLU-Pro Dataset

Issue - State: open - Opened by haileyschoelkopf 17 days ago
Labels: help wanted, feature request, good first issue

#1946 - Alghafa benchmark

Pull Request - State: open - Opened by khalil-Hennara 17 days ago - 7 comments

#1945 - The output of ceval is not in the same format as the official version?

Issue - State: open - Opened by ChuanhongLi 17 days ago - 1 comment

#1944 - Results are weird for Qwen2-1.5B

Issue - State: closed - Opened by SefaZeng 17 days ago - 6 comments

#1943 - Allow running hugging face models with both data parallelism and model parallelism at once

Pull Request - State: closed - Opened by clefourrier 18 days ago

#1942 - Fixed the [issue #1757](https://github.com/EleutherAI/lm-evaluation-harness/issues/1757) by editing the `yaml` files.

Pull Request - State: closed - Opened by sci-m-wang 18 days ago - 2 comments

#1941 - Save `fewshot_as_multiturn` argument in `results.json`

Issue - State: closed - Opened by djstrong 18 days ago - 1 comment

#1940 - Add the Arabic version with refactor to Arabic pica to be in alghafa …

Pull Request - State: closed - Opened by khalil-Hennara 18 days ago

#1939 - Fix a tiny typo in `main.py`

Pull Request - State: closed - Opened by sadra-barikbin 19 days ago - 1 comment

#1938 - Regarding decontamination

Issue - State: open - Opened by dsdanielpark 19 days ago

#1937 - Format of Personal Defined Dataset for Evaluation

Issue - State: closed - Opened by OscarC9912 20 days ago - 1 comment

#1936 - High Number of Tokens for openai-completions Models

Issue - State: open - Opened by selinaxiao 21 days ago

#1935 - Parallel GPU evaluation using simple_evaluate /evaluate functions? #1934

Issue - State: closed - Opened by PalaashAgrawal 21 days ago - 1 comment

#1934 - Parallel GPU evaluation using simple_evaluate /evaluate functions?

Issue - State: closed - Opened by Naitik1502 21 days ago

#1933 - Easier unitxt tasks loading and removal of unitxt library dependency

Pull Request - State: open - Opened by elronbandel 22 days ago - 5 comments

#1932 - --trust_remote_code does it actually do anything?

Issue - State: closed - Opened by devzzzero 22 days ago - 8 comments
Labels: bug

#1931 - [add] fld logical formula task

Pull Request - State: closed - Opened by MorishT 22 days ago - 1 comment
