Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / EleutherAI/lm-evaluation-harness issues and pull requests
#2246 - Implemented Polyglotoxicityprompts
Pull Request -
State: closed - Opened by jjbuschhoff about 1 month ago
- 1 comment
#2245 - fix group args of mmlu and mmlu_pro
Pull Request -
State: closed - Opened by eyuansu62 about 1 month ago
- 1 comment
#2244 - Fix typos in multiple places
Pull Request -
State: closed - Opened by LSinev about 1 month ago
- 1 comment
#2243 - Multimodal prototyping
Pull Request -
State: closed - Opened by lintangsutawika about 1 month ago
#2242 - OpenAICompletionsAPI does not implement the chat_template function, causing a TypeError when evaluating.
Issue -
State: closed - Opened by yonatano about 1 month ago
- 1 comment
Labels: bug
#2241 - fix mmlu_pro typo
Pull Request -
State: closed - Opened by baberabb about 1 month ago
#2240 - IndexError: index out of bounds for multiple tasks of mmlu_pro
Issue -
State: closed - Opened by lxning about 1 month ago
#2239 - Fix logging when resizing embedding layer in peft mode
Pull Request -
State: closed - Opened by WPoelman about 1 month ago
- 2 comments
#2238 - fix the regex string in mmlu_pro template
Pull Request -
State: closed - Opened by lxning about 1 month ago
- 1 comment
#2237 - mmlu_pro regex in template does not work
Issue -
State: closed - Opened by lxning about 1 month ago
- 7 comments
#2236 - Created new task for testing Llama on Asdiv
Pull Request -
State: closed - Opened by Cameron7195 about 1 month ago
- 2 comments
#2235 - default chat template method fix
Pull Request -
State: closed - Opened by KonradSzafer about 1 month ago
- 6 comments
#2234 - Cannot load local `mmlu` dataset
Issue -
State: closed - Opened by AIR-hl about 1 month ago
- 2 comments
#2233 - TODOs for Implementing LLM-as-a-Judge in Eval-Harness (Work in Progress)
Issue -
State: open - Opened by SeungoneKim about 1 month ago
#2232 - Add Open Arabic LLM Leaderboard Benchmarks (Full and Light Version)
Pull Request -
State: closed - Opened by Malikeh97 about 1 month ago
- 9 comments
#2231 - apply_chat_template got 'str' object is not callable
Issue -
State: closed - Opened by lxning about 1 month ago
- 15 comments
#2230 - Update mmmu
Pull Request -
State: closed - Opened by lintangsutawika about 1 month ago
#2229 - Update CODEOWNERS
Pull Request -
State: closed - Opened by haileyschoelkopf about 1 month ago
- 1 comment
#2228 - Lingoly README update
Pull Request -
State: closed - Opened by am-bean about 1 month ago
- 1 comment
#2227 - Fix Zeno Visualizer
Pull Request -
State: closed - Opened by namtranase about 1 month ago
- 2 comments
#2226 - Can I incorporate my own benchmark as a new task within lm-evaluation-harness?
Issue -
State: closed - Opened by lioooncoder about 1 month ago
- 2 comments
#2225 - Why no results for closed-sourced models?
Issue -
State: closed - Opened by mrconter1 about 1 month ago
- 1 comment
#2224 - Fix the zeno_visualizer.py to work with Phi3 Model
Issue -
State: closed - Opened by namtranase about 1 month ago
- 1 comment
#2223 - add dolomite-engine support for lm-eval-harness
Pull Request -
State: closed - Opened by mayank31398 about 1 month ago
- 3 comments
#2222 - trust_remote_code error in simple evaluate for hellaswag
Issue -
State: closed - Opened by sidhantls about 1 month ago
- 1 comment
#2221 - test_version_stable.py does not exist?
Issue -
State: open - Opened by dorsa-zeinali about 1 month ago
- 1 comment
#2220 - Classification into categories with generate-until and metrics
Issue -
State: closed - Opened by DavidAdamczyk about 1 month ago
- 1 comment
Labels: asking questions
#2219 - fix the leaderboard doc to reflect the tasks
Pull Request -
State: closed - Opened by NathanHB about 1 month ago
#2218 - Update IFEval dataset to official one
Pull Request -
State: closed - Opened by lewtun about 1 month ago
- 1 comment
#2217 - Add GPTQModel support for inferencing GPTQ models
Pull Request -
State: open - Opened by Qubitium about 1 month ago
- 1 comment
#2216 - Update yaml to adapt to belebele dataset changes
Pull Request -
State: closed - Opened by Uminosachi about 1 month ago
- 1 comment
#2215 - Created a new task for gsm8k which corresponds to the Llama cot settings…
Pull Request -
State: closed - Opened by Cameron7195 about 1 month ago
- 3 comments
#2214 - Extremely SLOW, even slower than training
Issue -
State: closed - Opened by AaronZLT about 2 months ago
- 2 comments
Labels: asking questions
#2213 - Invalid response for loglikelihood for GGUF model
Issue -
State: open - Opened by jaslatendresse about 2 months ago
- 1 comment
#2212 - MATH is_equiv() function can not parser some ground truth
Issue -
State: open - Opened by wukaixingxp about 2 months ago
#2211 - The results in the new and old versions differ from one another.
Issue -
State: closed - Opened by ShahadSZ about 2 months ago
- 3 comments
#2210 - nltk pickle
Issue -
State: closed - Opened by bezir about 2 months ago
- 1 comment
#2209 - add option for custom aggregation
Pull Request -
State: open - Opened by lintangsutawika about 2 months ago
- 2 comments
#2208 - Add KoCommonGEN v2 benchmark
Pull Request -
State: open - Opened by metterian about 2 months ago
- 2 comments
#2207 - CoverBench
Pull Request -
State: open - Opened by ysjprojects about 2 months ago
- 1 comment
#2206 - Update README.md
Pull Request -
State: closed - Opened by ysjprojects about 2 months ago
#2205 - Error of `continuation_logprobs_dicts` is `None` when running with `vllm` on multi-choice tasks
Issue -
State: closed - Opened by tongyx361 about 2 months ago
- 1 comment
#2204 - Missing Tasks in Leaderboard
Issue -
State: closed - Opened by xiaoyuxin1002 about 2 months ago
- 2 comments
#2203 - Logging
Pull Request -
State: open - Opened by lintangsutawika about 2 months ago
#2202 - [rank1]: huggingface_hub.utils._errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url:
Issue -
State: open - Opened by kmehant about 2 months ago
Labels: bug
#2201 - UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte
Issue -
State: open - Opened by YcChou about 2 months ago
- 3 comments
#2200 - Question about IFEval on LeaderBoard
Issue -
State: closed - Opened by matouk98 about 2 months ago
- 1 comment
Labels: asking questions
#2199 - AttributeError: module 'torch' has no attribute 'uint16'
Issue -
State: closed - Opened by Shinning-Zhou about 2 months ago
- 3 comments
#2198 - New task: Lingoly
Pull Request -
State: closed - Opened by am-bean about 2 months ago
- 8 comments
#2197 - AttributeError: module 'lm_eval.tasks' has no attribute 'ALL_TASKS'
Issue -
State: closed - Opened by Shinning-Zhou about 2 months ago
- 2 comments
#2196 - mmlu_pro fewshot_config
Issue -
State: closed - Opened by liewziqin about 2 months ago
- 1 comment
#2195 - Feature request: `4.4.0` Pypi release with `leaderboard`
Issue -
State: closed - Opened by younesbelkada about 2 months ago
- 2 comments
#2194 - It's hard to look at the code of this repository. Garbage.
Issue -
State: closed - Opened by Runningwater2357 about 2 months ago
- 1 comment
#2193 - How to evaluate very large models (>= 70b) ?
Issue -
State: closed - Opened by eldarkurtic about 2 months ago
- 2 comments
#2192 - CoQA evaluation
Issue -
State: open - Opened by liewziqin about 2 months ago
#2191 - gsm_plus minor fix
Pull Request -
State: closed - Opened by ysjprojects about 2 months ago
#2190 - Adding MoralChoice as a benchmark
Issue -
State: open - Opened by notrichardren about 2 months ago
#2189 - Adding RMS Calibration Error as a Metric
Issue -
State: open - Opened by notrichardren about 2 months ago
#2188 - Adding AdvGLUE as an evaluation
Issue -
State: open - Opened by notrichardren about 2 months ago
#2187 - Fix `loglikelihood_rolling` caching ( #1821 )
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
#2186 - Small README tweaks
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
#2185 - OOM for gpqa on A100
Issue -
State: closed - Opened by ZhichaoWang970201 about 2 months ago
- 5 comments
#2184 - Fix `revision` kwarg dtype in edge-cases
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
#2183 - Better Arg Typechecking / Validation
Issue -
State: open - Opened by haileyschoelkopf about 2 months ago
- 2 comments
Labels: help wanted, feature request, good first issue
#2182 - Issue with `state-spaces/transformerpp-2.7b` when generating
Issue -
State: closed - Opened by jhuang265 about 2 months ago
- 3 comments
#2181 - IrokoBench: Fix incorrect group assignments
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
#2180 - Add long context evaluation benchmarks such as LongBench and LEval.
Issue -
State: open - Opened by txchen-USTC about 2 months ago
- 2 comments
Labels: help wanted, feature request
#2179 - Support for multi-turn conversation benchmark
Issue -
State: closed - Opened by eyuansu62 about 2 months ago
- 2 comments
#2178 - TypeError: argument 'ids': 'NoneType' object cannot be converted to 'Sequence'
Issue -
State: open - Opened by Maryam142 about 2 months ago
- 3 comments
#2177 - when executes the OPT 6.7B model evaluation, the problem TypeError: 'NoneType' object is not iterable occur
Issue -
State: open - Opened by yanchenmochen about 2 months ago
- 10 comments
#2176 - add roadmap document
Pull Request -
State: open - Opened by lintangsutawika about 2 months ago
- 4 comments
#2175 - call model instead of forward
Issue -
State: closed - Opened by ad8e about 2 months ago
- 5 comments
#2174 - [hotfix] API: messages were created twice
Pull Request -
State: closed - Opened by baberabb about 2 months ago
#2173 - Test BBH using local dataset
Issue -
State: closed - Opened by shuoYan97 about 2 months ago
- 4 comments
#2172 - Merge our most recent Multimodal commits
Pull Request -
State: closed - Opened by haileyschoelkopf about 2 months ago
#2171 - local-completion: Passing Messages as List to Prompt Instead of Required String Format
Issue -
State: open - Opened by ankush13r about 2 months ago
- 1 comment
#2170 - How to calculate the "token_perplexity"
Issue -
State: open - Opened by nongfang55 about 2 months ago
- 1 comment
#2169 - fix typo.
Pull Request -
State: closed - Opened by kargaranamir about 2 months ago
#2168 - add okapi machine translated notice.
Pull Request -
State: closed - Opened by kargaranamir about 2 months ago
- 1 comment
#2167 - [BUG] Huggingface and Neuron runs fail if model revision is an integer
Issue -
State: closed - Opened by christyler3030 about 2 months ago
- 1 comment
#2166 - Can't use OpenAI Chat Completion API call with simplest tasks
Issue -
State: open - Opened by Some-random about 2 months ago
- 4 comments
#2165 - flan_held_in task is broken starting with commit #1741
Issue -
State: open - Opened by m-resta 2 months ago
#2164 - After executing tmmluplus, there are no group scores displayed, only the scores for each individual task are shown.
Issue -
State: closed - Opened by zhuangyuan123 2 months ago
- 5 comments
#2163 - mgsm task not working
Issue -
State: open - Opened by adiprasad 2 months ago
#2162 - update evaluations
Pull Request -
State: closed - Opened by ouhenio 2 months ago
- 1 comment
#2161 - Premature `num_fewshot` check with `fewshot_as_multiturn`
Issue -
State: open - Opened by baberabb 2 months ago
- 1 comment
#2160 - Better tests for API models
Issue -
State: open - Opened by baberabb 2 months ago
#2159 - GPT2 eval in lambada_openai, acc only 0.325
Issue -
State: open - Opened by KeyKy 2 months ago
- 1 comment
#2158 - Running multiple evaluations on the same task
Issue -
State: open - Opened by MihaiMasala 2 months ago
- 3 comments
#2157 - Add new benchmark: Spanish bench
Pull Request -
State: open - Opened by zxcvuser 2 months ago
- 3 comments
#2156 - Add new benchmark: Portuguese bench
Pull Request -
State: open - Opened by zxcvuser 2 months ago
- 1 comment
#2155 - Add new benchmark: Galician bench
Pull Request -
State: open - Opened by zxcvuser 2 months ago
- 1 comment
#2154 - Add new benchmark: Catalan bench
Pull Request -
State: open - Opened by zxcvuser 2 months ago
- 1 comment
#2153 - Add new benchmark: Basque bench
Pull Request -
State: open - Opened by zxcvuser 2 months ago
- 2 comments
#2152 - Bug about the output information.
Issue -
State: open - Opened by eyuansu62 2 months ago
#2151 - When using the tmmluplus dataset, I'm encountering a TypeError: init() got an unexpected keyword argument 'group_alias'.
Issue -
State: closed - Opened by zhuangyuan123 2 months ago
- 1 comment
#2150 - Errors when trying to evaluate local T-MAC gguf model
Issue -
State: open - Opened by prsmendonca 2 months ago
#2149 - [Bugfix] add temperature=0 to logprobs and seed args to API models
Pull Request -
State: closed - Opened by baberabb 2 months ago
- 2 comments
#2148 - Random results when parallelize=True
Issue -
State: closed - Opened by Ofir408 2 months ago
- 2 comments
#2147 - Bug in the `visualize-wandb.ipynb` example
Issue -
State: open - Opened by tanaymeh 2 months ago