EleutherAI/lm-evaluation-harness issues and pull requests

#1076 - Adding HaluEval to the list of tasks

Pull Request - State: closed - Opened by pminervini 10 months ago - 4 comments

#1075 - BBH, gsm8k benchmark accuracy mismatch with paper

Issue - State: closed - Opened by hills-code 10 months ago - 9 comments

#1074 - Update _cot_fewshot_template_yaml

Pull Request - State: closed - Opened by lintangsutawika 10 months ago

#1073 - .

Issue - State: closed - Opened by DrewGalbraith 10 months ago

#1072 - Is there a current way to run lm-eval against a self-hosted inference server?

Issue - State: closed - Opened by sfriedowitz 10 months ago - 3 comments
Labels: help wanted, feature request

#1071 - FileNotFoundError: Couldn't find a module script at exact_match.py. Module 'exact_match' doesn't exist on the Hugging Face Hub either.

Issue - State: closed - Opened by xinghuang2050 10 months ago - 18 comments
Labels: bug

#1070 - Evaluation on Scrolls Tasks Error

Issue - State: closed - Opened by AdityaKulshrestha 10 months ago - 2 comments
Labels: bug

#1069 - Updates to `hf` model type modeling code

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago - 2 comments

#1068 - Support for model instance in `HFLM.pretrained` argument

Issue - State: closed - Opened by gugarosa 10 months ago - 4 comments
Labels: bug

#1067 - Eval Harness Refactor Help

Issue - State: closed - Opened by StellaAthena 10 months ago - 4 comments

#1066 - Updating docs hyperlinks

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1065 - Confirming links in docs work (WIP)

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1064 - Set actual version to v0.4.0

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1063 - Fiddling with READMEs, Reenable CI tests on `main`

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1062 - remove commented planned samplers in `lm_eval/api/samplers.py`

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1061 - Announce v0.4.0 in README

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1060 - [Refactor] Fix fewshot cot mmlu descriptions

Pull Request - State: closed - Opened by lintangsutawika 10 months ago

#1059 - Indexing Bugfix in huggingface.py

Pull Request - State: closed - Opened by roy-sc 10 months ago - 4 comments

#1058 - AttributeError: can't set attribute 'pad_token'

Issue - State: closed - Opened by APiaoG 10 months ago - 1 comment

#1057 - Does lm-evaluation-harness support AWQ quantized model testing?

Issue - State: closed - Opened by Enjia 10 months ago - 3 comments

#1056 - [New Feature] Addressing Data Contamination in Evaluation Benchmarks

Issue - State: closed - Opened by liyucheng09 10 months ago - 2 comments

#1055 - `mmlu_flan_cot_fewshot` is not properly formatted?

Issue - State: closed - Opened by pengzhenghao 10 months ago - 2 comments

#1054 - How to implement zero-shot cot (calling model twice?)

Issue - State: closed - Opened by pengzhenghao 10 months ago - 3 comments

#1053 - assert len(continuation_enc) error in _loglikelihood_tokens for certain (but not all) tasks?

Issue - State: open - Opened by lhl 10 months ago - 8 comments

#1052 - Added no-softmax entries to MODEL_REGISTRY

Pull Request - State: open - Opened by denizyuret 10 months ago

#1051 - [Refactor] Update docs ToC

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1050 - A new DROP benchmark is needed

Issue - State: open - Opened by StellaAthena 10 months ago - 18 comments
Labels: opinions wanted

#1049 - Update README.md

Pull Request - State: closed - Opened by StellaAthena 10 months ago

#1048 - [Refactor] Additions to example notebook

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1047 - Miscellaneous documentation updates

Pull Request - State: closed - Opened by StellaAthena 10 months ago - 1 comment

#1046 - [Refactor] Update README.md

Pull Request - State: closed - Opened by lintangsutawika 10 months ago

#1045 - Avoid creating model_cache for OVModelForCausalLM

Pull Request - State: closed - Opened by andreyanufr 10 months ago - 3 comments

#1044 - How to specify evaluation times using different seeds?

Issue - State: closed - Opened by MarshtompCS 10 months ago - 9 comments

#1043 - using "A:" replace "A: "

Issue - State: closed - Opened by milliemaoo 10 months ago - 1 comment

#1042 - Warning on gsm8k

Issue - State: closed - Opened by liranringel 10 months ago - 6 comments

#1041 - Adding nq_open to task_table.md

Pull Request - State: closed - Opened by pminervini 10 months ago

#1040 - style(README): alert markdown GooseAI link

Pull Request - State: closed - Opened by guspan-tanadi 10 months ago - 1 comment

#1039 - Added the --no_softmax option

Pull Request - State: closed - Opened by denizyuret 10 months ago - 7 comments

#1038 - fixes for sampler

Pull Request - State: closed - Opened by baberabb 10 months ago

#1037 - [refactor] mps requirement

Pull Request - State: closed - Opened by baberabb 10 months ago - 1 comment

#1036 - [Refactor] Fixes to sampler

Pull Request - State: closed - Opened by lintangsutawika 10 months ago

#1035 - [Refactor] vllm data parallel

Pull Request - State: closed - Opened by baberabb 10 months ago - 7 comments

#1034 - scrolls pyrouge import error

Issue - State: open - Opened by sshleifer 10 months ago - 3 comments

#1033 - [Refactor] Urgent fix

Pull Request - State: closed - Opened by lintangsutawika 10 months ago

#1032 - Rename bigbench.yml to default.yml

Pull Request - State: closed - Opened by StellaAthena 10 months ago

#1031 - [Refactor] Versioning

Pull Request - State: closed - Opened by lintangsutawika 10 months ago - 2 comments

#1030 - Social iqa

Pull Request - State: closed - Opened by StellaAthena 10 months ago

#1029 - [Refactor] BBH fixup

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago - 7 comments

#1028 - [big-refactor] Adding Flash Attention 2 to HF Model

Issue - State: closed - Opened by orendar 10 months ago - 2 comments
Labels: feature request

#1027 - [New Task] SIQA

Issue - State: closed - Opened by haileyschoelkopf 10 months ago
Labels: help wanted, feature request, good first issue

#1026 - [New Task] CommonsenseQA

Issue - State: closed - Opened by haileyschoelkopf 10 months ago - 4 comments
Labels: help wanted, feature request, good first issue

#1025 - [Refactor] add notebook for overview

Pull Request - State: closed - Opened by lintangsutawika 10 months ago - 3 comments

#1024 - [Refactor] Use correct HF model type for MBart-like models

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago - 1 comment

#1023 - Some questions on the DROP evaluations

Issue - State: closed - Opened by lpc-eol 10 months ago - 3 comments

#1022 - [big-refactor] Wrong AutoModel Assignment for MBart

Issue - State: closed - Opened by mcemilg 10 months ago - 6 comments
Labels: bug

#1021 - Error occur when evaluating local model: transformer sentencepiece piece id out of range

Issue - State: closed - Opened by AnqiZhou226 10 months ago - 3 comments

#1020 - [Refactor] Update README

Pull Request - State: closed - Opened by baberabb 10 months ago

#1019 - Is there a way to check if each sample in the dataset is correct or incorrect?

Issue - State: closed - Opened by sean0042 10 months ago - 2 comments

#1018 - [Refactor] Remove `examples/` folder

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1017 - The tokenizer add_special_tokens parameter for t5 model lambada task

Issue - State: open - Opened by daisyden 10 months ago - 11 comments

#1016 - can we pass specific configs for specific tasks while running multiple benchmarks?

Issue - State: closed - Opened by sayan1101 10 months ago - 2 comments
Labels: feature request

#1015 - Issue "No module named 'lm_eval'"

Issue - State: closed - Opened by DreF174 10 months ago - 4 comments
Labels: bug

#1014 - Add DeepSparseLM

Pull Request - State: closed - Opened by mgoin 10 months ago - 2 comments

#1013 - [New Task] COLLIE

Issue - State: open - Opened by haileyschoelkopf 10 months ago
Labels: help wanted, feature request, good first issue

#1012 - [New Task Request] IFEval / Instruction-Following Eval

Issue - State: closed - Opened by haileyschoelkopf 10 months ago
Labels: help wanted, feature request, good first issue

#1011 - [Refactor] vllm support

Pull Request - State: closed - Opened by baberabb 10 months ago - 7 comments

#1010 - [New Task] Implement GPQA dataset

Issue - State: closed - Opened by haileyschoelkopf 10 months ago - 1 comment
Labels: help wanted, feature request, good first issue

#1009 - [Refactor] Improve Handling of Stop-Sequences for HF Batched Generation

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1008 - [Refactor] Openai completions

Pull Request - State: closed - Opened by lintangsutawika 10 months ago

#1007 - Update main-branch README

Pull Request - State: closed - Opened by haileyschoelkopf 10 months ago

#1006 - Stability Upstream translated task

Issue - State: open - Opened by StellaAthena 11 months ago - 1 comment
Labels: feature request

#1005 - Fix indent in lm_eval/tasks/bigbench.py

Pull Request - State: closed - Opened by Andrei-Aksionov 11 months ago - 1 comment

#1004 - Adds Python 3.8 Compatibility

Pull Request - State: closed - Opened by StellaAthena 11 months ago

#1003 - [big-refactor] Accelerate launch FSDP Runtime Error

Issue - State: closed - Opened by fengzi258 11 months ago - 1 comment
Labels: bug

#1002 - [Refactor] Bugfixes

Pull Request - State: closed - Opened by haileyschoelkopf 11 months ago

#1001 - [Refactor] will check if group_name is None

Pull Request - State: closed - Opened by lintangsutawika 11 months ago

#1000 - How to interpret TruthfulQA_mc write out file

Issue - State: closed - Opened by Luckyluuuc 11 months ago

#999 - [Refactor] Squad misc

Pull Request - State: closed - Opened by lintangsutawika 11 months ago

#998 - [Refactor] group name error with MMLU

Issue - State: closed - Opened by tmabraham 11 months ago - 1 comment

#997 - [Refactor] Fix CI tests

Pull Request - State: closed - Opened by haileyschoelkopf 11 months ago

#996 - [Refactor] Minor cleanup on base `Task` subclasses

Pull Request - State: closed - Opened by haileyschoelkopf 11 months ago

#995 - Model is not a local folder and is not a valid identifier.

Issue - State: closed - Opened by Abhista414 11 months ago - 1 comment

#994 - Added support of OpenVINO inference

Pull Request - State: closed - Opened by AlexKoff88 11 months ago - 6 comments

#993 - How to interpret generated results for truthful_qa test

Issue - State: open - Opened by Joetib 11 months ago - 3 comments

#992 - TypeError: HFLM.__init__() got an unexpected keyword argument 'use_accelerate'

Issue - State: closed - Opened by shaunstoltz 11 months ago - 6 comments

#991 - it seems like a bug in winogrande.py

Issue - State: closed - Opened by gaoteng-git 11 months ago - 1 comment

#990 - feat: add option to upload results to Zeno

Pull Request - State: closed - Opened by Sparkier 11 months ago - 6 comments
Labels: feature request

#989 - llama 2 70b gptq use too much cpu memory

Issue - State: closed - Opened by fancyerii 11 months ago - 3 comments
Labels: bug

#988 - [Refactor] BigBench

Issue - State: closed - Opened by orendar 11 months ago - 9 comments
Labels: bug

#987 - [Refactor] Alias fix

Pull Request - State: closed - Opened by lintangsutawika 11 months ago

#986 - WIP: Add MedMcqa Task to lm-evaluation

Pull Request - State: closed - Opened by issamYahiaoui 11 months ago - 1 comment

#985 - [Refactor] Num_fewshot process

Pull Request - State: closed - Opened by lintangsutawika 11 months ago - 2 comments

#984 - evaluate model from local machine

Issue - State: closed - Opened by umarbeknasimov 11 months ago - 5 comments

#983 - How to see intermediate output?

Issue - State: closed - Opened by Ezra-Yu 11 months ago - 1 comment
Labels: documentation

#982 - SquadV2 results are not reproducible for Llama2-7B

Issue - State: closed - Opened by gupta-abhay 11 months ago - 11 comments

#981 - [Refactor] fixes for alternative MMLU tasks.

Pull Request - State: closed - Opened by lintangsutawika 11 months ago

#980 - Average score metric isn't normalized whatsoever

Issue - State: closed - Opened by kalomaze 11 months ago - 1 comment

#979 - add description on task/group alias

Pull Request - State: closed - Opened by lintangsutawika 11 months ago

#978 - Some questions on the DROP and WinoGrande Harness implementations

Issue - State: closed - Opened by clefourrier 11 months ago - 9 comments
Labels: help wanted, good first issue, validation

#977 - Fix unnatural tokenizations if possible

Pull Request - State: closed - Opened by KlaudiaTH 11 months ago - 1 comment
