GoodAI/goodai-ltm-benchmark issues and pull requests

#79 - Claude ltm2

Pull Request - State: closed - Opened by JosephDavidsonKSWH 7 months ago

#78 - Show results by metric

Pull Request - State: closed - Opened by dcasbol 7 months ago - 1 comment

#77 - New ltm score

Pull Request - State: closed - Opened by dcasbol 7 months ago

#76 - LLaMA 3 70B - Small adjustments and results

Pull Request - State: closed - Opened by dcasbol 7 months ago

#75 - Remove (v2) from readme title

Pull Request - State: closed - Opened by dcasbol 7 months ago

#74 - Remove references to the benchmark version

Pull Request - State: closed - Opened by dcasbol 7 months ago

#73 - Benchmark Release V3

Pull Request - State: closed - Opened by JosephDavidsonKSWH 7 months ago

#72 - Benchmark v3

Pull Request - State: closed - Opened by JosephDavidsonKSWH 7 months ago

#71 - Restaurant edge case fix

Pull Request - State: closed - Opened by dcasbol 7 months ago

#70 - LTM-143 - Accurate progress bar

Pull Request - State: closed - Opened by dcasbol 8 months ago

#69 - LTM-151 - fix chapterbreak download

Pull Request - State: closed - Opened by dcasbol 8 months ago

#68 - Increase model default max message size

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago

#67 - Approximate a factor to better estimate the real token count given by the API

Pull Request - State: closed - Opened by dcasbol 8 months ago

#66 - Fix exceeded context errors for Anthropic models

Pull Request - State: closed - Opened by dcasbol 8 months ago

#65 - Display a warning in the report header if any test overruns the memory span.

Pull Request - State: closed - Opened by dcasbol 8 months ago

#64 - LTM score: Weight test tokens by the accuracy on that test.

Pull Request - State: closed - Opened by dcasbol 8 months ago

#63 - Context hotfix

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago

#62 - not able to run goodai-ltm-benchmark

Issue - State: closed - Opened by naveen1286 8 months ago - 7 comments

#61 - LTM-148 - Filter out questions in sally anne test

Pull Request - State: closed - Opened by dcasbol 8 months ago

#60 - Ltm 147 prospective

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago

#59 - LTM-139 - Restaurant Evaluations

Pull Request - State: closed - Opened by dcasbol 8 months ago

#58 - Ltm 144 waits after reply

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago - 1 comment

#57 - Ltm 145 cheaper llms

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago

#56 - Ltm 137 litellm

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago - 1 comment

#55 - Show dataset score in report

Pull Request - State: closed - Opened by dcasbol 8 months ago

#54 - Ltm 142 locations directions fix

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago

#53 - Leftover incorrect code from big number PR

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago

#52 - LTM-140: changed response to sneeze to something less guessable

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago

#51 - LTM 138 - LTM Claude

Pull Request - State: closed - Opened by dcasbol 8 months ago

#50 - LTM-134 - Results aggregation

Pull Request - State: closed - Opened by dcasbol 8 months ago

#49 - Ltm 124 big and small

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago - 4 comments

#48 - Ltm 120 score normal

Pull Request - State: closed - Opened by FKGSOFTWARE 8 months ago - 1 comment

#47 - LTM-116 - Keep count of time and tokens wrt the agent

Pull Request - State: closed - Opened by dcasbol 8 months ago

#46 - LTM-131 - Benchmark intro

Pull Request - State: closed - Opened by dcasbol 8 months ago - 1 comment

#45 - LTM-95 - Simplify OpenAI models

Pull Request - State: closed - Opened by dcasbol 8 months ago

#44 - LTM-97 - Poor clarity on prospective memory tests

Pull Request - State: closed - Opened by dcasbol 8 months ago

#43 - Ltm 118: Spy meeting fixes

Pull Request - State: closed - Opened by JosephDavidsonKSWH 8 months ago - 2 comments

#42 - LTM-104 - Repeated samples

Pull Request - State: closed - Opened by dcasbol 8 months ago

#41 - Add anchor IDs to detailed reports. Make pop-ups optional.

Pull Request - State: closed - Opened by dcasbol 9 months ago

#40 - LTM-105 - Irregular max scores

Pull Request - State: closed - Opened by dcasbol 9 months ago

#39 - Spy fix

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago - 1 comment

#38 - LTM-106 - Locations Directions evaluations

Pull Request - State: closed - Opened by dcasbol 9 months ago

#37 - LTM-107 - Resuming dynamic tests

Pull Request - State: closed - Opened by dcasbol 9 months ago - 1 comment

#36 - tidied duplicated LTMs in favour of the AgentWrapper

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#35 - LTM Agent 1 results on 10k Benchmark 2

Pull Request - State: closed - Opened by dcasbol 9 months ago

#34 - Benchmark 2

Pull Request - State: closed - Opened by dcasbol 9 months ago

#33 - Benchmark v2 - Readmes, test definitions, results and reports

Pull Request - State: closed - Opened by dcasbol 9 months ago

#32 - some fixes for claudes context trimming

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#31 - Trigger response eval fix

Pull Request - State: closed - Opened by dcasbol 9 months ago

#30 - revamped memgpt + interface

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#29 - Log fixes

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#28 - ChapterBreak delivers all messages without yielding

Pull Request - State: closed - Opened by dcasbol 9 months ago

#27 - upload gpt-4 usages to newest version

Pull Request - State: closed - Opened by dcasbol 9 months ago

#26 - Chapterbreak guardrails

Pull Request - State: closed - Opened by dcasbol 9 months ago

#25 - Small changes

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#24 - Claude opus endpoint

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#23 - Restaurant - percentage waits

Pull Request - State: closed - Opened by dcasbol 9 months ago

#22 - LTM-89 - Faulty GPT evaluations

Pull Request - State: closed - Opened by dcasbol 9 months ago

#21 - Spy Meeting

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#20 - More advanced scheduling

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#19 - LTM-82 - Restaurant task

Pull Request - State: closed - Opened by dcasbol 9 months ago

#18 - Dev

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#17 - Update runner readme

Pull Request - State: closed - Opened by JosephDavidsonKSWH 9 months ago

#16 - Reduction of code duplication - LTM agents

Pull Request - State: closed - Opened by goodai-jose-solorzano 10 months ago

#15 - Draft: Stateful agents

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago

#14 - Adapt and include ChapterBreak

Pull Request - State: closed - Opened by dcasbol 10 months ago

#13 - LTMAgentWrapper logging + Agent for length bias testing

Pull Request - State: closed - Opened by goodai-jose-solorzano 10 months ago - 2 comments

#12 - Resuming tests

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago

#11 - Send reset messages last in the script

Pull Request - State: closed - Opened by dcasbol 10 months ago

#10 - removed quickbenchmark defs

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago

#9 - Groupless tests

Pull Request - State: closed - Opened by dcasbol 10 months ago

#8 - prospective memory prompt and gpt changes

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago

#7 - Prospective memory prompt and gpt changes

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago

#6 - Added LTMAgent implementation from GoodAI-LTM

Pull Request - State: closed - Opened by goodai-jose-solorzano 10 months ago - 2 comments

#5 - added templates, update readme

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago

#4 - Continuous conversation testing method

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago - 3 comments

#3 - Minimum of a single piece of trivia for a distraction segment

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago - 3 comments

#2 - Full logs fixes

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago

#1 - Adjusted generation prompt for instruction recall

Pull Request - State: closed - Opened by JosephDavidsonKSWH 10 months ago
