thudm/longbench issues and pull requests

#104 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 131072 tokens. However, you requested 351430 tokens. Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400

Issue - State: open - Opened by guanzy2012 5 days ago

#103 - Token indices sequence length is longer than the specified maximum sequence length for this model (1113927 > 128000). Running this sequence through the model will result in indexing errors

Issue - State: open - Opened by guanzy2012 5 days ago

#102 - Error code: 404 - {'object': 'error', 'message': 'The model `/dev/shm/glm-4-9b-chat/` does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404}

Issue - State: open - Opened by guanzy2012 5 days ago

#101 - Any Implementation of new models like Meta-Llama-3.1-8B , Qwen2.5-7B?

Issue - State: open - Opened by WeiweiZhang1 6 days ago

#100 - Why are empty responses ignored in LongBench v2?

Issue - State: open - Opened by ZeonfaiHo 8 days ago

#99 - How to evaluate Llama-3.1-8B-Instruct model on LongBench v1 dataset with A100 80GB GPU? Encountering out-of-memory issues.

Issue - State: closed - Opened by xlim1996 10 days ago

#98 - Running Llama-3.1-8B-Instruct on an A100 runs out of memory

Issue - State: closed - Opened by xlim1996 10 days ago

#97 - docs: update task.md

Pull Request - State: open - Opened by eltociear 13 days ago

#96 - [LongBench] How was Edit Sim for code tasks calculated?

Issue - State: open - Opened by cornzz 14 days ago - 1 comment

#95 - How can evaluation results under vllm be brought closer to hf?

Issue - State: open - Opened by miaoyuxun 16 days ago - 2 comments

#94 - Llama-3.1-8B-Instruct test results on Longbench v2 differ from the leaderboard

Issue - State: open - Opened by chaochen99 21 days ago - 3 comments

#93 - Evaluation configuration for the Long dataset in the paper

Issue - State: open - Opened by MaiziXiao 21 days ago - 3 comments

#92 - About token handling for Claude and gemini

Issue - State: open - Opened by Violettttee 22 days ago - 1 comment

#91 - LongBench v2 Evaluation regarding Qwen2.5

Issue - State: closed - Opened by guanzhchen 27 days ago - 5 comments
Labels: enhancement

#90 - About cot

Issue - State: open - Opened by Violettttee 28 days ago - 7 comments

#89 - DeepSeek V3 Result

Issue - State: open - Opened by zmwang03 28 days ago - 1 comment
Labels: enhancement

#88 - results.py bug

Issue - State: open - Opened by cizhenshi about 1 month ago - 1 comment

#87 - LongBench v2

Pull Request - State: closed - Opened by bys0318 about 1 month ago

#86 - Error when running after installing longbench

Issue - State: open - Opened by fxnie about 1 month ago

#85 - Hf eval

Pull Request - State: closed - Opened by Wangmerlyn about 2 months ago

#84 - Request for Mixtral-8*7B model

Issue - State: open - Opened by Aaronhuang-778 2 months ago

#83 - How much does it cost to run the complete dataset once with GPT-3.5-turbo-16K?

Issue - State: open - Opened by HelloEveryonehh 3 months ago - 1 comment

#82 - Evaluation mechanism update

Issue - State: open - Opened by cizhenshi 3 months ago - 1 comment

#81 - Where is the evaluation code for the "different truncation" settings in Figure 2 of the paper?

Issue - State: closed - Opened by HelloEveryonehh 3 months ago - 2 comments

#80 - Add HF Evaluate Utils

Pull Request - State: open - Opened by yongchanghao 4 months ago

#79 - Position of key information in long texts

Issue - State: closed - Opened by ruoyuxie 4 months ago - 2 comments

#78 - Long context dataset

Issue - State: closed - Opened by nzw0301 4 months ago

#77 - Why is no special token needed for chatglm3 when counting the tokens?

Issue - State: closed - Opened by condy0919 5 months ago - 2 comments

#76 - Fix Grammar Error in NarrativeQA Prompt

Pull Request - State: open - Opened by sidjha1 5 months ago

#75 - inference with kv cache

Pull Request - State: closed - Opened by mohammadh-cerebras 5 months ago

#74 - Can model-management frameworks based on the OpenAI interface, such as ollama and xinference, be tested?

Issue - State: open - Opened by jiusi9 5 months ago - 1 comment

#73 - No initialization for the process group

Issue - State: open - Opened by Mugariya 5 months ago - 1 comment

#72 - Some questions on the processed dataset in LongBench

Issue - State: closed - Opened by jiqimaoke 5 months ago - 1 comment

#71 - How to evaluate on llama3-8b-instruct?

Issue - State: open - Opened by txchen-USTC 5 months ago - 1 comment
Labels: enhancement

#70 - Suggestions for improving the validity of the dataset tests

Issue - State: open - Opened by wsn555 6 months ago - 7 comments

#69 - Code for evaluation with GPT-3.5?

Issue - State: open - Opened by RuskinManku 6 months ago - 3 comments

#68 - Load dataset from hf failed

Issue - State: open - Opened by murphypei 6 months ago - 4 comments

#67 - The "anwser" for some examples in "qasper.jsonl" is strange

Issue - State: open - Opened by Zcchill 7 months ago - 6 comments

#66 - Llama2-7B-chat-4k test results are different

Issue - State: closed - Opened by PengWenChen 8 months ago - 2 comments

#65 - Loading local datasets with split=‘test’

Issue - State: open - Opened by yichen0104 8 months ago - 1 comment

#64 - Chinese Examples in MultiFieldQA-en

Issue - State: open - Opened by wendywangwwt 9 months ago - 1 comment

#63 - Is avg length in the dataset measured in words/characters or in number of tokens?

Issue - State: closed - Opened by deepindeed2022 9 months ago - 1 comment

#62 - Table reproduce

Issue - State: closed - Opened by hzw20200301 9 months ago

#61 - `Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`

Issue - State: open - Opened by fuqichen1998 10 months ago - 5 comments

#60 - Include data on which passage contains answer

Issue - State: open - Opened by danielmisrael 11 months ago - 1 comment
Labels: enhancement

#59 - Chinese test results for chatglm3-6b-32k are far below the benchmark in the README

Issue - State: closed - Opened by Strivin0311 11 months ago - 5 comments

#58 - RuntimeError when running pred.py for Vicuna-v1.5-7B-16k

Issue - State: closed - Opened by fuqichen1998 11 months ago - 2 comments

#57 - How is the Spearman correlation calculated?

Issue - State: open - Opened by randomtutu 11 months ago - 1 comment

#56 - CUDA error??????

Issue - State: closed - Opened by xvolcano02 11 months ago - 2 comments

#55 - Llama2-7B-chat-4k test results are different

Issue - State: closed - Opened by slatter666 11 months ago - 3 comments

#54 - Any Implementation of Mistral-7B?

Issue - State: open - Opened by leeyeehoo 11 months ago - 1 comment

#53 - AttributeError: 'str' object has no attribute 'to'

Issue - State: closed - Opened by vincent507cpu 11 months ago - 1 comment

#52 - Error: TypeError: Couldn't cast array of type list<item: string> to null

Issue - State: open - Opened by xxcoco763 11 months ago - 1 comment

#51 - Update retrieval/

Pull Request - State: closed - Opened by FaustLyu 12 months ago

#50 - Disable grad to avoid OOM

Pull Request - State: closed - Opened by acherstyx 12 months ago

#49 - Testing 13b models such as Baichuan on 1*A100 (80G) causes OOM

Issue - State: open - Opened by lvjianxin about 1 year ago

#48 - Evaluate on long context (32k,64k etc..) on 30B/70B large models

Issue - State: open - Opened by CaesarWWK about 1 year ago - 5 comments

#47 - How to evaluate GPT-3.5 or GPT-4

Issue - State: closed - Opened by jing-my about 1 year ago - 3 comments

#46 - The three length-extrapolation methods produce exactly the same answer?

Issue - State: closed - Opened by IT-five about 1 year ago

#45 - OOM

Issue - State: closed - Opened by IT-five about 1 year ago - 3 comments

#44 - Inference fails on a single A100

Issue - State: closed - Opened by Huwei-deeplearning about 1 year ago - 3 comments

#43 - A single A100 40G cannot run llama2-7b-chat-4k (OOM), but can run chatglm2-6b-32k

Issue - State: closed - Opened by fishiu about 1 year ago - 4 comments

#42 - how to apply to baichuan?

Issue - State: closed - Opened by IT-five about 1 year ago - 1 comment

#41 - About the soundness of the evaluation

Issue - State: closed - Opened by rayleoyoung about 1 year ago - 2 comments

#40 - Kimi-Chat test

Issue - State: closed - Opened by kunpeng199494 about 1 year ago - 1 comment

#39 - Update support chatglm3

Pull Request - State: closed - Opened by JackKuo666 about 1 year ago - 1 comment

#38 - About the models being tested

Issue - State: closed - Opened by pengcheng-yan about 1 year ago - 2 comments

#37 - Cannot reproduce the repo's dureader results with chatglm3-6b-32k

Issue - State: closed - Opened by siqi13579 about 1 year ago - 4 comments

#36 - The score calculation code in classification_score is wrong

Issue - State: closed - Opened by zhangleiedu about 1 year ago - 1 comment

#35 - Typo in pred.py

Issue - State: closed - Opened by ignorejjj about 1 year ago - 1 comment

#34 - Add support for Ollama, Palm, Claude-2, Cohere, Replicate, Llama2 CodeLlama (100+LLMs) [LiteLLM]

Pull Request - State: closed - Opened by ishaan-jaff about 1 year ago - 2 comments

#33 - Add dataset file(retrieval)

Pull Request - State: closed - Opened by FaustLyu about 1 year ago

#32 - KeyError: 'retrieved'

Issue - State: closed - Opened by liujingcs about 1 year ago - 3 comments

#31 - I don't believe chatglm3 gets these results without this data having been fed in during fine-tuning →_→

Issue - State: closed - Opened by hxs91 about 1 year ago

#30 - Is it necessary to add build_prompt to the tokenizer of chatglm3-6b-32k in pred.py?

Issue - State: closed - Opened by MrYxJ about 1 year ago - 3 comments

#29 - How is the data length distribution computed for LongBench-E?

Issue - State: closed - Opened by es94129 about 1 year ago - 2 comments

#28 - About a typo in the TREC dataset

Issue - State: open - Opened by QuanYuhan over 1 year ago - 1 comment

#27 - No repeat_kv in llama_flash_attn_monkey_patch.py ?

Issue - State: closed - Opened by Orion-Zheng over 1 year ago - 3 comments

#26 - llama2 chat's template

Issue - State: closed - Opened by Arist12 over 1 year ago - 5 comments

#25 - directly cutting from the middle seems unfair

Issue - State: closed - Opened by Arist12 over 1 year ago - 4 comments

#24 - What is the difference between LongBench-E and LongBench?

Issue - State: closed - Opened by youngallien over 1 year ago - 2 comments

#23 - Error with multi-gpus

Issue - State: closed - Opened by Xnhyacinth over 1 year ago - 3 comments

#22 - LongbenchE & Longbench

Issue - State: closed - Opened by yaoguany over 1 year ago - 1 comment

#21 - Max length not guaranteed

Issue - State: closed - Opened by fuvty over 1 year ago - 2 comments

#20 - Cannot load LongBench-E

Issue - State: closed - Opened by airaria over 1 year ago - 1 comment

#19 - About the calculation of the Avg score?

Issue - State: closed - Opened by WSPeng over 1 year ago - 3 comments

#18 - why use greedy decoding ?

Issue - State: closed - Opened by WSPeng over 1 year ago - 1 comment

#17 - OOM during inference on very long texts

Issue - State: closed - Opened by freshbirdDD over 1 year ago - 11 comments

#16 - Can the evaluation be used for Base models?

Issue - State: closed - Opened by forceshorty over 1 year ago - 1 comment

#15 - Questions on dataset answers.

Issue - State: closed - Opened by chengeharrison over 1 year ago - 3 comments

#14 - Add visualization

Pull Request - State: closed - Opened by McJackTang over 1 year ago

#13 - Why does the multi-document QA (Chinese) part use rouge-L instead of F1, as the English part does?

Issue - State: closed - Opened by skykiseki over 1 year ago - 2 comments

#12 - About the fairness of the truncation length

Issue - State: closed - Opened by bojone over 1 year ago - 2 comments
Labels: question

#11 - About prompt

Issue - State: closed - Opened by ftgreat over 1 year ago - 1 comment

#10 - add support for other models

Issue - State: closed - Opened by yaoguany over 1 year ago - 3 comments
Labels: enhancement

#9 - requirements.txt

Issue - State: closed - Opened by yaoguany over 1 year ago - 1 comment

#8 - Were the evaluated models fine-tuned on the corresponding datasets?

Issue - State: closed - Opened by LiuShixing over 1 year ago - 1 comment

#7 - Chinese output encoding problem

Issue - State: closed - Opened by wen2cheng over 1 year ago - 4 comments

#6 - Problem in the pred.py code: for json_obj in tqdm(data[:10]) needs [:10] removed

Issue - State: closed - Opened by nkfnn over 1 year ago - 1 comment
