EleutherAI/lm-evaluation-harness issues and pull requests

#1974 - added bias and stereotype classification tasks

Pull Request - State: closed - Opened by aditya20t 4 months ago - 1 comment

#1973 - Add GigaChat API

Pull Request - State: closed - Opened by seldereyy 4 months ago - 3 comments

#1972 - incomplete task list

Issue - State: closed - Opened by hlzhang109 4 months ago - 2 comments

#1971 - Ubelievable long time when host the gguf mode ?

Issue - State: open - Opened by hzgdeerHo 4 months ago - 2 comments

#1970 - mela

Pull Request - State: closed - Opened by Geralt-Targaryen 4 months ago - 5 comments

#1969 - Fix OpenAI API discrepancies

Pull Request - State: closed - Opened by chimezie 4 months ago

#1968 - Updates to fix OpenAI API compliance

Pull Request - State: closed - Opened by chimezie 4 months ago

#1967 - OpenAI completions model not using OpenAI Completion API properly to extract LogProbs

Issue - State: open - Opened by chimezie 4 months ago - 2 comments

#1966 - TemplateLM#_encode_pair() only works for HF transformers auto-models

Issue - State: closed - Opened by Birch-san 4 months ago - 1 comment

#1965 - Error while installing

Issue - State: closed - Opened by surya-narayanan 4 months ago - 1 comment

#1964 - Add BertaQA dataset tasks

Pull Request - State: closed - Opened by juletx 4 months ago - 1 comment

#1963 - How to use a vllm hosted model?

Issue - State: open - Opened by darsh-essential 4 months ago - 1 comment

#1962 - Error when chat template is not a string

Issue - State: open - Opened by djstrong 4 months ago - 1 comment

#1961 - Mmlu Pro

Pull Request - State: closed - Opened by ysjprojects 4 months ago - 14 comments

#1960 - Multi-gpu evaluation with external library usage.

Issue - State: closed - Opened by xinghaow99 4 months ago - 1 comment

#1959 - Making torch dep optional?

Issue - State: open - Opened by dlwh 4 months ago - 4 comments

#1958 - Wandb logger can't handle groups with heterogenous metrics

Issue - State: open - Opened by dmitrii-palisaderesearch 4 months ago - 11 comments

#1957 - Cannot load model 'local-chat-completions' and 'local-completions'

Issue - State: closed - Opened by awesom112 4 months ago

#1956 - fix: add directory filter to os.walk to ignore 'ipynb_checkpoints'

Pull Request - State: closed - Opened by johnwee1 4 months ago - 11 comments

#1955 - Fix a tiny typo in `docs/interface.md`

Pull Request - State: closed - Opened by sadra-barikbin 4 months ago

#1954 - Fix task.py and evaluator.py

Pull Request - State: closed - Opened by zhabuye 4 months ago - 1 comment

#1953 - Keep getting error: 'VLLM' object has no attribute 'AUTO_MODEL_CLASS'

Issue - State: closed - Opened by andrew0411 4 months ago - 8 comments

#1952 - .ipynb_checkpoints causes eval harness to fail

Issue - State: closed - Opened by johnwee1 4 months ago

#1951 - Plans for a new release?

Issue - State: closed - Opened by nathan-weinberg 4 months ago - 5 comments

#1950 - LMJudge

Pull Request - State: closed - Opened by baberabb 4 months ago - 5 comments

#1949 - Check compatibility of `local-completions` with VLLM (returns logits) for `multiple_choice` tasks

Issue - State: open - Opened by haileyschoelkopf 4 months ago
Labels: bug

#1948 - Remove AMMLU Due to Translation

Pull Request - State: closed - Opened by haileyschoelkopf 4 months ago - 2 comments

#1947 - Add MMLU-Pro Dataset

Issue - State: open - Opened by haileyschoelkopf 4 months ago
Labels: help wanted, feature request, good first issue

#1946 - Alghafa benchmark

Pull Request - State: open - Opened by khalil-Hennara 4 months ago - 7 comments

#1945 - The output of ceval is not as the same format at the official version?

Issue - State: open - Opened by ChuanhongLi 4 months ago - 1 comment

#1944 - Results is weird for Qwen2-1.5B

Issue - State: closed - Opened by SefaZeng 4 months ago - 6 comments

#1943 - Allow running hugging face models with both data parallelism and model parallelism at once

Pull Request - State: closed - Opened by clefourrier 4 months ago

#1942 - Fixed the [issue #1757](https://github.com/EleutherAI/lm-evaluation-harness/issues/1757) by editing the `yaml` files.

Pull Request - State: closed - Opened by sci-m-wang 4 months ago - 2 comments

#1941 - Save `fewshot_as_multiturn` argument in `results.json`

Issue - State: closed - Opened by djstrong 4 months ago - 1 comment

#1940 - Add the Arabic version with refactor to Arabic pica to be in alghafa …

Pull Request - State: closed - Opened by khalil-Hennara 4 months ago

#1939 - Fix a tiny typo in `main.py`

Pull Request - State: closed - Opened by sadra-barikbin 4 months ago - 1 comment

#1938 - Regarding decontamination

Issue - State: open - Opened by dsdanielpark 4 months ago

#1937 - Format of Personal Defined Dataset for Evaluation

Issue - State: closed - Opened by OscarC9912 4 months ago - 1 comment

#1936 - High Number of Tokens for openai-completions Models

Issue - State: open - Opened by selinaxiao 4 months ago

#1935 - Parallel GPU evaluation using simple_evaluate /evaluate functions? #1934

Issue - State: closed - Opened by PalaashAgrawal 4 months ago - 1 comment

#1934 - Parallel GPU evaluation using simple_evaluate /evaluate functions?

Issue - State: closed - Opened by Naitik1502 4 months ago

#1933 - Easier unitxt tasks loading and removal of unitxt library dependancy

Pull Request - State: closed - Opened by elronbandel 4 months ago - 8 comments

#1932 - --trust_remote_code does it actually do anything?

Issue - State: closed - Opened by devzzzero 4 months ago - 8 comments
Labels: bug

#1931 - [add] fld logical formula task

Pull Request - State: closed - Opened by MorishT 4 months ago - 1 comment

#1930 - `samples` is newline delimited

Pull Request - State: closed - Opened by baberabb 4 months ago

#1929 - Prettify lm_eval --tasks list

Pull Request - State: closed - Opened by anthony-dipofi 4 months ago - 2 comments

#1928 - [New Task] Add Paloma benchmark

Pull Request - State: closed - Opened by zafstojano 4 months ago - 5 comments

#1927 - Modify pre-commit hook to check merge conflicts accidentally committed

Pull Request - State: closed - Opened by LSinev 4 months ago

#1926 - Results filenames handling fix

Pull Request - State: closed - Opened by KonradSzafer 4 months ago - 3 comments

#1925 - --hf_hub_log_args causes IndexError

Issue - State: closed - Opened by johnwee1 4 months ago - 2 comments

#1924 - Update brier_score to be bounded [1,0]

Pull Request - State: closed - Opened by xksteven 4 months ago - 2 comments

#1923 - OOM Issue

Issue - State: closed - Opened by zhentingqi 4 months ago - 5 comments

#1922 - Multiprompt

Pull Request - State: open - Opened by lintangsutawika 4 months ago

#1921 - Confusion matrix metric

Pull Request - State: open - Opened by minaremeli 4 months ago - 10 comments

#1920 - build commit_id=b281b09, I cannot find lm-eval command.

Issue - State: closed - Opened by jieheroli 4 months ago - 1 comment

#1919 - change openai completions params to fit API documentation

Pull Request - State: open - Opened by artemorloff 4 months ago

#1918 - output_path may break postprocessing

Issue - State: open - Opened by artemorloff 4 months ago - 3 comments

#1917 - Add The Arabic version of the PICA benchmark

Pull Request - State: closed - Opened by khalil-Hennara 4 months ago

#1916 - Test output table layout consistency

Pull Request - State: closed - Opened by zafstojano 4 months ago - 1 comment

#1915 - Add New Benchmark

Issue - State: closed - Opened by khalil-Hennara 4 months ago - 2 comments

#1914 - Fix fewshot seed only set when overriding num_fewshot

Pull Request - State: closed - Opened by LSinev 4 months ago

#1913 - Update basque-glue

Pull Request - State: closed - Opened by zhabuye 4 months ago

#1912 - Implement NoticIA

Pull Request - State: closed - Opened by ikergarcia1996 4 months ago

#1911 - accuracy precision

Issue - State: closed - Opened by lernerjenny 4 months ago - 3 comments

#1910 - Add TensorRT-LLM support

Issue - State: open - Opened by taewan2002 4 months ago - 1 comment
Labels: feature request

#1909 - Fix social_iqa answer choices

Pull Request - State: closed - Opened by haileyschoelkopf 4 months ago

#1908 - social_iqa choices do not use actual answers

Issue - State: closed - Opened by ozgurcelik 4 months ago - 2 comments

#1907 - Evaluation for MegatronT5 Model

Issue - State: closed - Opened by wangyanbao666 4 months ago - 4 comments

#1906 - Fewshot seed only set when overriding num_fewshot

Issue - State: closed - Opened by stoical07 4 months ago - 1 comment
Labels: bug

#1905 - Try to make existing tests run little bit faster

Pull Request - State: closed - Opened by LSinev 4 months ago - 1 comment

#1904 - Load sentencepiece tokenizer for evaluation

Issue - State: closed - Opened by ayushsml 4 months ago - 2 comments

#1903 - OpenaiCompletionsLM invokes the completions API with max_tokens set to 0

Issue - State: open - Opened by chimezie 4 months ago - 1 comment

#1902 - mlx Model (loglikelihood & generate_until)

Pull Request - State: open - Opened by chimezie 4 months ago - 8 comments

#1901 - Complete task list from pr 1727

Pull Request - State: closed - Opened by anthony-dipofi 4 months ago - 5 comments

#1900 - add arc_challenge_mt

Pull Request - State: closed - Opened by jonabur 4 months ago - 5 comments

#1899 - model_comparator.py broken

Issue - State: open - Opened by johnwee1 4 months ago

#1898 - Add dataset card when pushing to HF hub

Pull Request - State: closed - Opened by KonradSzafer 4 months ago - 3 comments

#1897 - Add new Lambada translations

Pull Request - State: closed - Opened by zafstojano 4 months ago - 6 comments

#1896 - llama3-base gsm8k score

Issue - State: closed - Opened by rangehow 4 months ago - 2 comments

#1895 - Making hardcoded few shots compatible with the chat template mechanism

Pull Request - State: closed - Opened by clefourrier 4 months ago - 5 comments

#1894 - vLLM causing GPU memory leak with data_parallel_size=3

Issue - State: closed - Opened by johnwee1 4 months ago - 2 comments

#1893 - `higher_is_better` tickers in output table

Pull Request - State: closed - Opened by zafstojano 4 months ago - 1 comment

#1892 - GPU memory very high and unbalanced when testing Gemma

Issue - State: closed - Opened by smartliuhw 4 months ago - 2 comments

#1891 - TypeError of scrolls_narrativeqa

Issue - State: open - Opened by hicleo 4 months ago

#1890 - Updated vllm imports in vllm_causallms.py

Pull Request - State: closed - Opened by mgoin 4 months ago

#1889 - ImportError: cannot import name 'HfApi' from 'huggingface_hub'

Issue - State: closed - Opened by baberabb 4 months ago - 2 comments

#1888 - Aligning Prompts and choices of LogiQA task

Pull Request - State: closed - Opened by abzb1 4 months ago - 1 comment

#1887 - Mismatch Between Prompt Format and Expected Choices in LogiQA Dataset

Issue - State: closed - Opened by abzb1 4 months ago - 3 comments

#1886 - [HFLM]Add support for Ascend NPU

Pull Request - State: closed - Opened by statelesshz 4 months ago - 4 comments

#1885 - Multiple issues Encountered During Tasks Verification

Issue - State: open - Opened by zhabuye 4 months ago - 21 comments

#1884 - can we add C4 and PTB tasks for PpL?

Issue - State: open - Opened by 123wujiao 4 months ago - 1 comment
Labels: feature request

#1883 - Add Regression Testing

Issue - State: open - Opened by haileyschoelkopf 4 months ago - 4 comments
Labels: help wanted, feature request, good first issue

#1882 - eval with Alpaca template

Issue - State: closed - Opened by oneonlee 4 months ago - 1 comment

#1881 - test_docs of scrolls dataset

Issue - State: closed - Opened by huweim 4 months ago - 1 comment

#1880 - [HFLM]Use Accelerate's API to reduce hard-coded CUDA code

Pull Request - State: closed - Opened by statelesshz 4 months ago - 2 comments
