Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / EleutherAI/lm-evaluation-harness issues and pull requests
#1842 - How to use Zeno
Issue -
State: open - Opened by DavidAdamczyk about 2 months ago
- 1 comment
#1841 - Inconsistent evaluation results with Chat Template
Issue -
State: open - Opened by shiweijiezero about 2 months ago
- 3 comments
#1840 - Evaluate encoder-decoder-models
Issue -
State: closed - Opened by Bachstelze about 2 months ago
- 1 comment
#1839 - AssertionError: aggregation named 'mean' conflicts with existing registered aggregation!
Issue -
State: open - Opened by hunter2009pf about 2 months ago
#1838 - Fix links in README guiding to another branch
Pull Request -
State: closed - Opened by LSinev about 2 months ago
- 1 comment
#1837 - Bug: wrong `until` default value for chat based model
Issue -
State: open - Opened by YilunZhou about 2 months ago
- 2 comments
#1836 - sha256 for datasets or samples
Issue -
State: closed - Opened by artemorloff about 2 months ago
- 1 comment
#1835 - MPS backend out of memory evaluating fine-tuned Mixtral-8x7B-Instruct-v0.1 on a machine with 100+ GB
Issue -
State: closed - Opened by chimezie about 2 months ago
- 2 comments
#1834 - Afrimmlu
Pull Request -
State: closed - Opened by IsraelAbebe about 2 months ago
- 1 comment
#1833 - Evaluation results of llama2 with lm-evaluation-harness using wikitext-2
Issue -
State: open - Opened by l2002924700 about 2 months ago
- 1 comment
#1832 - Adding LLaVa support
Pull Request -
State: open - Opened by ashvinnihalani about 2 months ago
- 5 comments
#1831 - Using Language Models as Evaluators
Issue -
State: open - Opened by lintangsutawika about 2 months ago
- 5 comments
Labels: feature request
#1830 - Errors when loading exact_match.py
Issue -
State: open - Opened by twxin about 2 months ago
- 2 comments
#1829 - eval gsm8k from local dataset folder with the bug info "ValueError: BuilderConfig 'main' not found."
Issue -
State: open - Opened by Jp-17 about 2 months ago
#1828 - Fix: support PEFT/LoRA with added tokens
Pull Request -
State: closed - Opened by mapmeld about 2 months ago
#1827 - Add More Tests
Issue -
State: open - Opened by haileyschoelkopf about 2 months ago
Labels: feature request
#1826 - I get this error whenever I try to run an eval: ImportError: cannot import name 'HfApi' from 'huggingface_hub'
Issue -
State: closed - Opened by menhguin about 2 months ago
- 15 comments
#1825 - Afrixnli Updates
Pull Request -
State: closed - Opened by JessicaOjo about 2 months ago
- 1 comment
#1824 - Avoid slow testing due to network issues.
Issue -
State: open - Opened by pixeli99 about 2 months ago
- 2 comments
#1823 - Getting error on lm-evaluation for merged models deployed on HF
Issue -
State: closed - Opened by tolgakurtuluss about 2 months ago
- 3 comments
#1822 - The input format for XNLI seems wired?
Issue -
State: open - Opened by SefaZeng about 2 months ago
- 2 comments
#1821 - TypeError: 'NoneType' object is not iterable when using cache and loglikelihood_rolling
Issue -
State: open - Opened by mdocekal about 2 months ago
#1820 - SyntaxError when import lm_eval
Issue -
State: closed - Opened by mxjmtxrm about 2 months ago
- 3 comments
#1819 - Error when evaluating math.
Issue -
State: closed - Opened by SefaZeng about 2 months ago
- 2 comments
#1818 - when MMLU eval, num_few_shot=5, more GPU overhead
Issue -
State: closed - Opened by chunniunai220ml about 2 months ago
- 1 comment
#1817 - Task description newline characters removed by Jinja templating, affecting generated requests and performance
Issue -
State: open - Opened by ma0li about 2 months ago
- 1 comment
#1816 - Multi-round evaluation for chat models
Issue -
State: open - Opened by YilunZhou about 2 months ago
- 1 comment
#1815 - Financial PhraseBank (FPB) Eval Metric
Pull Request -
State: open - Opened by bcicc about 2 months ago
#1814 - Multi Label Classification
Issue -
State: open - Opened by IsraelAbebe about 2 months ago
#1813 - how to evaluate on boolq? incorrect results
Issue -
State: closed - Opened by sidhantls about 2 months ago
- 4 comments
#1812 - Support Mamba based models for evaluation tasks
Issue -
State: closed - Opened by NamburiSrinath about 2 months ago
- 1 comment
#1811 - Out-Of-Memory Error for same batch size but different dataset
Issue -
State: closed - Opened by richardzhuang0412 about 2 months ago
- 7 comments
#1810 - Fix cost_estimate.py
Pull Request -
State: open - Opened by xksteven about 2 months ago
- 5 comments
#1809 - how to run all the bigbench tasks at once?
Issue -
State: open - Opened by kbmlcoding about 2 months ago
#1808 - Gemini 1.5/Ultra support
Issue -
State: open - Opened by notrichardren about 2 months ago
#1807 - interface doc update
Pull Request -
State: closed - Opened by KonradSzafer about 2 months ago
- 2 comments
#1806 - Update flag `--hf_hub_log_args` in interface documentation
Pull Request -
State: closed - Opened by sepiatone about 2 months ago
#1805 - Is caching large evaluation dataset like MMLU supported?
Issue -
State: closed - Opened by richardzhuang0412 about 2 months ago
- 1 comment
#1804 - How to evaluate a large model like llama-65B?
Issue -
State: closed - Opened by fayuge about 2 months ago
- 2 comments
#1803 - Copal task
Pull Request -
State: closed - Opened by Erland366 about 2 months ago
- 3 comments
#1802 - Hugging Face: Open LLM Leaderboard: how do I reproduce results for details_gpt2 repository
Issue -
State: closed - Opened by CoconutJJ about 2 months ago
- 2 comments
#1801 - Exclude all current tasks
Issue -
State: closed - Opened by YilunZhou about 2 months ago
- 2 comments
Labels: feature request
#1800 - Fix `--gen_kwargs` and VLLM (`temperature` not respected)
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
Labels: bug
#1799 - llama3 baseline reproduction problem
Issue -
State: closed - Opened by fmm170 about 2 months ago
- 5 comments
Labels: asking questions
#1798 - link to the example output on the hub
Pull Request -
State: closed - Opened by KonradSzafer about 2 months ago
#1797 - Add NPU support for huggingface.py
Issue -
State: closed - Opened by jiaqiw09 about 2 months ago
- 2 comments
#1796 - Make `scripts.write_out` error out when no splits match
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
#1795 - Math or minerva_math not generating any samples via scripts.write_out
Issue -
State: closed - Opened by xksteven about 2 months ago
- 1 comment
#1794 - Vllm get tokenizer
Pull Request -
State: open - Opened by AguirreNicolas about 2 months ago
- 1 comment
#1793 - Re-add Hendrycks MATH (no sympy checking, no Minerva hardcoded prompt) variant
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
- 1 comment
#1792 - Update `--tasks list` option in interface documentation
Pull Request -
State: closed - Opened by sepiatone about 2 months ago
- 1 comment
#1791 - Logging Updates (Alphabetize table printouts, fix eval tracker bug) (#1774)
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
Labels: bug, feature request
#1790 - Fix `batch_size=auto` for HF Seq2Seq models (#1765)
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
Labels: bug
#1789 - Fix for bootstrap_iters = 0 case (#1715)
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
- 1 comment
Labels: bug
#1788 - Support loading slices of a split from a dataset
Issue -
State: open - Opened by alexrs about 2 months ago
#1787 - add NPU support for huggingface.py
Pull Request -
State: closed - Opened by jiaqiw09 about 2 months ago
- 4 comments
#1786 - Add Ascend NPU for huggingface.py
Pull Request -
State: closed - Opened by jiaqiw09 about 2 months ago
- 1 comment
#1785 - limit fix
Pull Request -
State: closed - Opened by KonradSzafer about 2 months ago
- 2 comments
#1784 - Fix bug in setting until kwarg in openai completions
Pull Request -
State: closed - Opened by ciaranby about 2 months ago
#1783 - openai.InternalServerError: the model generated invalid Unicode output
Issue -
State: open - Opened by djstrong about 2 months ago
#1782 - Error when limit is not specified (possibly issue with requirements?)
Issue -
State: closed - Opened by hammoudhasan about 2 months ago
- 2 comments
#1781 - Data preprocess is slow for mmlu
Issue -
State: closed - Opened by ThisisBillhe about 2 months ago
- 1 comment
Labels: asking questions
#1780 - fix limit bug when limit is None
Pull Request -
State: closed - Opened by djstrong about 2 months ago
- 3 comments
#1779 - remove echo parameter in OpenAI completions API
Pull Request -
State: closed - Opened by djstrong about 2 months ago
- 1 comment
#1778 - error in eval-tracker : 'Namespace' object has no attribute 'push_results_to_hub'
Issue -
State: closed - Opened by abgoswam about 2 months ago
- 1 comment
#1777 - eval tracker args fix
Pull Request -
State: closed - Opened by KonradSzafer about 2 months ago
#1776 - Fix README: change `----hf_hub_log_args` to `--hf_hub_log_args`
Pull Request -
State: closed - Opened by MuhammadBinUsman03 about 2 months ago
- 1 comment
#1775 - Fix Caching Tests ; Remove `pretrained=gpt2` default
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
- 1 comment
#1774 - Sorting task output alphabetically
Issue -
State: closed - Opened by ad8e about 2 months ago
- 2 comments
#1773 - Adding some tasks
Pull Request -
State: closed - Opened by clefourrier about 2 months ago
- 1 comment
#1772 - How to filter to see only generate_until: lm-eval --tasks list
Issue -
State: open - Opened by chigkim about 2 months ago
#1771 - Same results - different models
Issue -
State: closed - Opened by aleksoren about 2 months ago
- 4 comments
#1770 - Support OpenAI's Batch API
Issue -
State: open - Opened by djstrong about 2 months ago
- 1 comment
#1769 - remove duplicated `num_fewshot: 0`
Pull Request -
State: closed - Opened by chujiezheng 2 months ago
- 1 comment
#1768 - IndexError: list index out of range when running benchmark on gguf model
Issue -
State: open - Opened by fherrmannsdoerfer 2 months ago
- 2 comments
#1767 - Cannot have both a group list and task list
Issue -
State: open - Opened by steven-basart 2 months ago
- 5 comments
Labels: bug, asking questions
#1766 - evaluation tracker implementation
Pull Request -
State: closed - Opened by KonradSzafer 2 months ago
- 11 comments
#1765 - Seq2Seq Models with Batch Size `auto`
Issue -
State: closed - Opened by KurtMica 2 months ago
#1764 - New commits for final PR + Edit to lm-eval-overview Notebook
Pull Request -
State: closed - Opened by marilevay 2 months ago
- 2 comments
#1763 - Include inference time in results
Pull Request -
State: closed - Opened by giorgossideris 2 months ago
- 4 comments
#1762 - Bug in yaml parsing
Issue -
State: open - Opened by jordane95 2 months ago
#1761 - Does this support the model to use generate functions to eval not likelihood?
Issue -
State: open - Opened by Juhywcy 2 months ago
#1760 - Fix m_arc choices
Pull Request -
State: closed - Opened by jordane95 2 months ago
- 2 comments
#1759 - Output constrained support
Issue -
State: open - Opened by Mihaiii 2 months ago
#1758 - Pile 10k new task
Pull Request -
State: closed - Opened by mukobi 2 months ago
- 3 comments
#1757 - HellaSwag with UnicodeDecodeError
Issue -
State: open - Opened by Hua-rookie 2 months ago
- 13 comments
#1756 - vllm lora support
Pull Request -
State: closed - Opened by bcicc 2 months ago
- 1 comment
#1755 - No inference time is returned in results
Issue -
State: closed - Opened by giorgossideris 2 months ago
- 3 comments
#1754 - New Task Request: LegalBench
Issue -
State: open - Opened by haileyschoelkopf 2 months ago
- 2 comments
Labels: help wanted, feature request, good first issue
#1753 - Create task `dharma2` - a small (300 qs) & wide (many topics) dataset
Pull Request -
State: open - Opened by UmerHA 2 months ago
- 6 comments
#1752 - Pytorch profiling Error In Megatron-DeepSpeed/tasks/eval_harness/evaluate.py
Issue -
State: closed - Opened by jrt-20 2 months ago
- 1 comment
#1751 - Accuracy gap between single GPU and multiple GPUs
Issue -
State: open - Opened by HsuWanTing 2 months ago
- 3 comments
#1750 - Add filter registry decorator
Pull Request -
State: closed - Opened by lozhn 2 months ago
- 2 comments
#1749 - Fix Parameter Propagation for Tasks that have `include`
Pull Request -
State: closed - Opened by lintangsutawika 2 months ago
#1748 - Add tasks for performance on long context lengths
Issue -
State: open - Opened by nairbv 2 months ago
- 1 comment
Labels: feature request
#1747 - [Feature Request] Metrics that require knowledge of input.
Issue -
State: open - Opened by ciaranby 2 months ago
#1746 - [Feature Request] pre-built Docker image support
Issue -
State: open - Opened by zsaladin 2 months ago
- 3 comments
Labels: help wanted, feature request
#1745 - add task for mmlu evaluation in arc multiple choice format
Pull Request -
State: closed - Opened by jonabur 2 months ago
- 5 comments
#1744 - Error when running lm_eval with piqa task with EleutherAI/gpt-j-6b
Issue -
State: closed - Opened by JingyangXiang 2 months ago
- 3 comments
#1743 - accelerate doesn't work with auto:(>1)
Issue -
State: open - Opened by ozgurcelik 2 months ago
- 4 comments