Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / FastEval/FastEval issues and pull requests
#93 - How to run this on a pytorch or nvidia docker image?
Issue -
State: open - Opened by ohmeow 11 months ago
#92 - The NVIDIA driver on your system is too old (found version 11080)
Issue -
State: open - Opened by ohmeow 11 months ago
#91 - Support quantized model like awq with vllm
Issue -
State: open - Opened by xiechengmude about 1 year ago
- 1 comment
#90 - [FEAT] Allow setting custom openai api base
Pull Request -
State: closed - Opened by ishaan-jaff about 1 year ago
#89 - FastEval not loading
Issue -
State: closed - Opened by reachrkr about 1 year ago
- 2 comments
#88 - [Suggestion request] How to evaluate subjective tasks
Issue -
State: closed - Opened by Tostino about 1 year ago
- 2 comments
#87 - Add scores of xDAN-L1-Think model on the test field of MT-Bench / CoT
Pull Request -
State: open - Opened by xiechengmude about 1 year ago
- 5 comments
#86 - Make MMLU closer to original implementation
Issue -
State: open - Opened by tju01 about 1 year ago
- 5 comments
Labels: bug, existing-benchmark
#85 - Small updates to documentation
Pull Request -
State: closed - Opened by tju01 about 1 year ago
#84 - Custom prompt template
Issue -
State: open - Opened by tju01 about 1 year ago
Labels: enhancement
#83 - add AGIEval
Pull Request -
State: closed - Opened by Anna22042001 about 1 year ago
#82 - Add AGIEval through dev branch
Pull Request -
State: closed - Opened by Anna22042001 about 1 year ago
#81 - Add support for Palm, Claude-2, Llama2, CodeLlama (100+LLMs)
Pull Request -
State: closed - Opened by ishaan-jaff about 1 year ago
- 11 comments
#80 - Add Alignment Lab AI discord link & contributor guide
Pull Request -
State: closed - Opened by tju01 about 1 year ago
#79 - Add AGIEval into CoT evaluation.
Pull Request -
State: open - Opened by Anna22042001 about 1 year ago
#78 - Use more asyncio for inference to prevent problems with too many threads
Pull Request -
State: closed - Opened by tju01 about 1 year ago
- 12 comments
#77 - WizardCoder + other stuff
Pull Request -
State: closed - Opened by tju01 about 1 year ago
#76 - Improve documentation
Issue -
State: open - Opened by tju01 about 1 year ago
Labels: enhancement
#75 - Add OpenAssistant/codellama-13b-oasst-sft-v10
Pull Request -
State: closed - Opened by tju01 about 1 year ago
#74 - Add flag to continue resuming previous evaluations
Issue -
State: closed - Opened by tju01 about 1 year ago
- 2 comments
Labels: enhancement
#73 - Create plot from the results
Issue -
State: open - Opened by tju01 about 1 year ago
- 5 comments
Labels: enhancement
#72 - Add the new 70B OpenAssistant model
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#71 - Add comparison to lm-evaluation-harness
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#70 - AgentBench
Pull Request -
State: open - Opened by tju01 over 1 year ago
#69 - Custom test data: Allow specifying number of repetitions
Issue -
State: open - Opened by tju01 over 1 year ago
Labels: enhancement, existing-benchmark
#68 - Improve results written to stdout
Issue -
State: open - Opened by tju01 over 1 year ago
Labels: enhancement
#67 - Allow HF dataset for custom test data
Issue -
State: open - Opened by tju01 over 1 year ago
Labels: enhancement, existing-benchmark
#66 - Benchmark with custom data
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#65 - Allow no system message
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#64 - LM-Eval: Use data parallel / FSDP
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#63 - Rethink ranking computation
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#62 - Add GPT-4 for more accurate rankings
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#61 - Rethink score aggregation logic
Issue -
State: closed - Opened by tju01 over 1 year ago
- 6 comments
#60 - Extend data parallel evaluation to vLLM backend
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#59 - tiiuae/falcon-7b-instruct with vLLM backend
Issue -
State: closed - Opened by tju01 over 1 year ago
#58 - Run correctness check for models on leaderboard
Issue -
State: closed - Opened by tju01 over 1 year ago
- 2 comments
Labels: existing-model
#57 - Re-evaluate LLAMA-2-Chat on CoT
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
Labels: bug, existing-model
#56 - DS-1000
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#55 - HF Transformers backend: Add data parallel evaluation
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#54 - Allow specifying model args
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#53 - Dolphin
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#52 - FreeWilly2
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#51 - Update & Improve OpenAI Evals
Pull Request -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#50 - LLaMA-2-Chat
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#49 - Recompute all results
Pull Request -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#48 - Add text-generation-inference backend
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#47 - Clean up code
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#46 - MT-Bench
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#45 - Remove final token
Issue -
State: closed - Opened by tju01 over 1 year ago
- 2 comments
Labels: bug, existing-model
#44 - Multilingual capabilities
Issue -
State: open - Opened by tju01 over 1 year ago
Labels: enhancement, new-benchmark
#43 - InterCode
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#42 - Recompute OpenAI models
Issue -
State: closed - Opened by tju01 over 1 year ago
Labels: bug, existing-model
#41 - Long context
Issue -
State: open - Opened by tju01 over 1 year ago
Labels: enhancement, new-benchmark
#40 - What does Fastchat do with the system message?
Issue -
State: closed - Opened by tju01 over 1 year ago
- 2 comments
#39 - Add MMLU task for CoT
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#38 - Sandbox
Issue -
State: open - Opened by tju01 over 1 year ago
Labels: enhancement
#37 - Replace lm-evaluation-harness
Issue -
State: closed - Opened by tju01 over 1 year ago
- 4 comments
Labels: enhancement
#36 - Tool usage
Issue -
State: open - Opened by tju01 over 1 year ago
- 33 comments
Labels: enhancement, new-benchmark
#35 - More coding in other programming languages
Issue -
State: open - Opened by tju01 over 1 year ago
Labels: enhancement, new-benchmark
#34 - MT-bench
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
Labels: enhancement, new-benchmark
#33 - Reduce number of tasks in OpenAI evals
Issue -
State: closed - Opened by tju01 over 1 year ago
- 2 comments
#32 - Reduce number of required GPT reviews for Vicuna benchmark
Issue -
State: closed - Opened by tju01 over 1 year ago
Labels: enhancement, existing-benchmark
#31 - Add fastchat backend for models
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#30 - Check whether CoT results are statistically significant enough
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#29 - Change wrong model prompts & delete invalid data
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#28 - Add fastchat backend
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#27 - Check Vicuna Elo rank
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#26 - Fix trailing whitespace in prompt
Issue -
State: closed - Opened by tju01 over 1 year ago
- 6 comments
#25 - Mosaicml
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#24 - What's up with HumanEval/29?
Issue -
State: closed - Opened by tju01 over 1 year ago
- 17 comments
#23 - Ranking
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#22 - Use text-generation-webui API for generation
Pull Request -
State: closed - Opened by tju01 over 1 year ago
- 2 comments
#21 - Use text-generation-inference for faster & distributed inference
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#20 - Refactor
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#19 - Add cot (chain of thought) benchmark
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#18 - COT evaluation
Issue -
State: closed - Opened by tju01 over 1 year ago
- 4 comments
#17 - Evaluate different languages other than English
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#16 - Coding benchmarks
Issue -
State: closed - Opened by tju01 over 1 year ago
#15 - Evaluate falcon models on lm-eval-harness
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#14 - Add falcon-instruct
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#13 - Fix & improve requirements
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#12 - Require model type & update README
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#11 - lm-evaluation-harness
Pull Request -
State: closed - Opened by tju01 over 1 year ago
#10 - Evaluate OpenAssistant with system message
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
Labels: enhancement, new-model
#9 - OpenAI eval enhancement
Issue -
State: closed - Opened by notmd over 1 year ago
- 1 comment
Labels: enhancement, existing-benchmark
#8 - Multi-GPU and parallel API requests
Issue -
State: closed - Opened by tju01 over 1 year ago
- 2 comments
#7 - Improve model sampling
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#6 - Use more than 20 samples per task
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#5 - Check whether scores are calculated correctly
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#4 - Keep up with OpenAI evals
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
Labels: enhancement, existing-benchmark
#3 - Figure out which evals are actually worth using
Issue -
State: closed - Opened by tju01 over 1 year ago
- 2 comments
#2 - Use single model for model-graded evals
Issue -
State: closed - Opened by tju01 over 1 year ago
- 1 comment
#1 - More benchmarks
Issue -
State: closed - Opened by tju01 over 1 year ago
- 4 comments