Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / thudm/longbench issues and pull requests
#104 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 131072 tokens. However, you requested 351430 tokens. Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400
Issue -
State: open - Opened by guanzy2012 5 days ago
#102 - Error code: 404 - {'object': 'error', 'message': 'The model `/dev/shm/glm-4-9b-chat/` does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404}
Issue -
State: open - Opened by guanzy2012 5 days ago
#101 - Any Implementation of new models like Meta-Llama-3.1-8B, Qwen2.5-7B?
Issue -
State: open - Opened by WeiweiZhang1 6 days ago
#100 - Why are empty responses ignored in LongBench v2?
Issue -
State: open - Opened by ZeonfaiHo 8 days ago
#99 - How to evaluate Llama-3.1-8B-Instruct model on LongBench v1 dataset with A100 80GB GPU? Encountering out-of-memory issues.
Issue -
State: closed - Opened by xlim1996 10 days ago
#98 - Running Llama-3.1-8B-Instruct on an A100 runs out of memory
Issue -
State: closed - Opened by xlim1996 10 days ago
#97 - docs: update task.md
Pull Request -
State: open - Opened by eltociear 13 days ago
#96 - [LongBench] How was Edit Sim for code tasks calculated?
Issue -
State: open - Opened by cornzz 14 days ago
- 1 comment
#95 - How can evaluation results under vllm be made closer to those under hf?
Issue -
State: open - Opened by miaoyuxun 16 days ago
- 2 comments
#94 - Discrepancy between Llama-3.1-8B-Instruct results on Longbench v2 and the leaderboard
Issue -
State: open - Opened by chaochen99 21 days ago
- 3 comments
#93 - Evaluation configuration for the Long dataset in the paper
Issue -
State: open - Opened by MaiziXiao 21 days ago
- 3 comments
#92 - About token handling for Claude and gemini
Issue -
State: open - Opened by Violettttee 22 days ago
- 1 comment
#91 - LongBench v2 Evaluation regarding Qwen2.5
Issue -
State: closed - Opened by guanzhchen 27 days ago
- 5 comments
Labels: enhancement
#90 - About cot
Issue -
State: open - Opened by Violettttee 28 days ago
- 7 comments
#89 - DeepSeek V3 Result
Issue -
State: open - Opened by zmwang03 28 days ago
- 1 comment
Labels: enhancement
#88 - results.py bug
Issue -
State: open - Opened by cizhenshi about 1 month ago
- 1 comment
#87 - LongBench v2
Pull Request -
State: closed - Opened by bys0318 about 1 month ago
#86 - Error when running after installing longbench
Issue -
State: open - Opened by fxnie about 1 month ago
#85 - Hf eval
Pull Request -
State: closed - Opened by Wangmerlyn about 2 months ago
#84 - Request for Mixtral-8*7B model
Issue -
State: open - Opened by Aaronhuang-778 2 months ago
#83 - How much does it cost to run the full dataset once with GPT-3.5-turbo-16K?
Issue -
State: open - Opened by HelloEveryonehh 3 months ago
- 1 comment
#82 - Evaluation mechanism update
Issue -
State: open - Opened by cizhenshi 3 months ago
- 1 comment
#81 - Where is the evaluation code for the different truncation settings in Figure 2 of the paper?
Issue -
State: closed - Opened by HelloEveryonehh 3 months ago
- 2 comments
#80 - Add HF Evaluate Utils
Pull Request -
State: open - Opened by yongchanghao 4 months ago
#79 - Position of key information in long texts
Issue -
State: closed - Opened by ruoyuxie 4 months ago
- 2 comments
#78 - Long context dataset
Issue -
State: closed - Opened by nzw0301 4 months ago
#77 - Why there is no need special token for chatglm3 when counting the tokens?
Issue -
State: closed - Opened by condy0919 5 months ago
- 2 comments
#76 - Fix Grammar Error in NarrativeQA Prompt
Pull Request -
State: open - Opened by sidjha1 5 months ago
#75 - inference with kv cache
Pull Request -
State: closed - Opened by mohammadh-cerebras 5 months ago
#74 - Can model-management frameworks built on the OpenAI API interface, such as ollama and xinference, be tested?
Issue -
State: open - Opened by jiusi9 5 months ago
- 1 comment
#73 - No initialization for the process group
Issue -
State: open - Opened by Mugariya 5 months ago
- 1 comment
#72 - Some questions on the processed dataset in LongBench
Issue -
State: closed - Opened by jiqimaoke 5 months ago
- 1 comment
#71 - How to evaluate on llama3-8b-instruct?
Issue -
State: open - Opened by txchen-USTC 5 months ago
- 1 comment
Labels: enhancement
#70 - Suggestions for improving the validity of dataset testing
Issue -
State: open - Opened by wsn555 6 months ago
- 7 comments
#69 - Code for evaluation with GPT-3.5?
Issue -
State: open - Opened by RuskinManku 6 months ago
- 3 comments
#68 - Load dataset from hf failed
Issue -
State: open - Opened by murphypei 6 months ago
- 4 comments
#67 - The "answer" for some examples in "qasper.jsonl" is strange
Issue -
State: open - Opened by Zcchill 7 months ago
- 6 comments
#66 - Llama2-7B-chat-4k test results differ
Issue -
State: closed - Opened by PengWenChen 8 months ago
- 2 comments
#65 - Loading local datasets with split='test'
Issue -
State: open - Opened by yichen0104 8 months ago
- 1 comment
#64 - Chinese Examples in MultiFieldQA-en
Issue -
State: open - Opened by wendywangwwt 9 months ago
- 1 comment
#63 - Is the avg length in the dataset measured in words/characters or in tokens?
Issue -
State: closed - Opened by deepindeed2022 9 months ago
- 1 comment
#62 - Table reproduce
Issue -
State: closed - Opened by hzw20200301 9 months ago
#61 - `Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`
Issue -
State: open - Opened by fuqichen1998 10 months ago
- 5 comments
#60 - Include data on which passage contains answer
Issue -
State: open - Opened by danielmisrael 11 months ago
- 1 comment
Labels: enhancement
#59 - chatglm3-6b-32k's Chinese test results are far below the benchmark in the README
Issue -
State: closed - Opened by Strivin0311 11 months ago
- 5 comments
#58 - RuntimeError when running pred.py for Vicuna-v1.5-7B-16k
Issue -
State: closed - Opened by fuqichen1998 11 months ago
- 2 comments
#57 - How is the Spearman correlation calculated?
Issue -
State: open - Opened by randomtutu 11 months ago
- 1 comment
#56 - CUDA error??????
Issue -
State: closed - Opened by xvolcano02 11 months ago
- 2 comments
#55 - Llama2-7B-chat-4k test results differ
Issue -
State: closed - Opened by slatter666 11 months ago
- 3 comments
#54 - Any Implementation of Mistral-7B?
Issue -
State: open - Opened by leeyeehoo 11 months ago
- 1 comment
#53 - AttributeError: 'str' object has no attribute 'to'
Issue -
State: closed - Opened by vincent507cpu 11 months ago
- 1 comment
#52 - Error: TypeError: Couldn't cast array of type list<item: string> to null
Issue -
State: open - Opened by xxcoco763 11 months ago
- 1 comment
#51 - Update retrieval/
Pull Request -
State: closed - Opened by FaustLyu 12 months ago
#50 - Disable grad to avoid OOM
Pull Request -
State: closed - Opened by acherstyx 12 months ago
#49 - Testing 13b models such as Baichuan runs out of memory (OOM) on 1*A100 (80G)
Issue -
State: open - Opened by lvjianxin about 1 year ago
#48 - Evaluate on long context (32k,64k etc..) on 30B/70B large models
Issue -
State: open - Opened by CaesarWWK about 1 year ago
- 5 comments
#47 - How to evaluate GPT-3.5 or GPT-4
Issue -
State: closed - Opened by jing-my about 1 year ago
- 3 comments
#46 - The three length-extrapolation methods produce exactly the same answer?
Issue -
State: closed - Opened by IT-five about 1 year ago
#44 - Cannot run inference on a single A100
Issue -
State: closed - Opened by Huwei-deeplearning about 1 year ago
- 3 comments
#43 - A single A100 40G cannot run llama2-7b-chat-4k (OOM), but can run chatglm2-6b-32k
Issue -
State: closed - Opened by fishiu about 1 year ago
- 4 comments
#42 - how to apply to baichuan?
Issue -
State: closed - Opened by IT-five about 1 year ago
- 1 comment
#41 - On the soundness of the evaluation
Issue -
State: closed - Opened by rayleoyoung about 1 year ago
- 2 comments
#40 - Kimi-Chat testing
Issue -
State: closed - Opened by kunpeng199494 about 1 year ago
- 1 comment
#39 - Update support chatglm3
Pull Request -
State: closed - Opened by JackKuo666 about 1 year ago
- 1 comment
#38 - About the tested models
Issue -
State: closed - Opened by pengcheng-yan about 1 year ago
- 2 comments
#37 - Cannot reproduce the repo's dureader results with chatglm3-6b-32k
Issue -
State: closed - Opened by siqi13579 about 1 year ago
- 4 comments
#36 - Error in the score-calculation code of classification_score
Issue -
State: closed - Opened by zhangleiedu about 1 year ago
- 1 comment
#35 - Typo in pred.py
Issue -
State: closed - Opened by ignorejjj about 1 year ago
- 1 comment
#34 - Add support for Ollama, Palm, Claude-2, Cohere, Replicate, Llama2 CodeLlama (100+LLMs) [LiteLLM]
Pull Request -
State: closed - Opened by ishaan-jaff about 1 year ago
- 2 comments
#33 - Add dataset file(retrieval)
Pull Request -
State: closed - Opened by FaustLyu about 1 year ago
#32 - KeyError: 'retrieved'
Issue -
State: closed - Opened by liujingcs about 1 year ago
- 3 comments
#31 - I don't believe chatglm3 reached these results without benchmark data being fed in during fine-tuning →_→
Issue -
State: closed - Opened by hxs91 about 1 year ago
#30 - Is it necessary to add build_prompt to the tokenizer of chatglm3-6b-32k in pred.py?
Issue -
State: closed - Opened by MrYxJ about 1 year ago
- 3 comments
#29 - How is the data length distribution computed for LongBench-E?
Issue -
State: closed - Opened by es94129 about 1 year ago
- 2 comments
#28 - About a typo in the TREC dataset
Issue -
State: open - Opened by QuanYuhan over 1 year ago
- 1 comment
#27 - No repeat_kv in llama_flash_attn_monkey_patch.py ?
Issue -
State: closed - Opened by Orion-Zheng over 1 year ago
- 3 comments
#26 - llama2 chat's template
Issue -
State: closed - Opened by Arist12 over 1 year ago
- 5 comments
#25 - directly cutting from the middle seems unfair
Issue -
State: closed - Opened by Arist12 over 1 year ago
- 4 comments
#24 - What is the difference between LongBench-E and LongBench?
Issue -
State: closed - Opened by youngallien over 1 year ago
- 2 comments
#23 - Error with multi-gpus
Issue -
State: closed - Opened by Xnhyacinth over 1 year ago
- 3 comments
#22 - LongbenchE & Longbench
Issue -
State: closed - Opened by yaoguany over 1 year ago
- 1 comment
#21 - Max length not guaranteed
Issue -
State: closed - Opened by fuvty over 1 year ago
- 2 comments
#20 - Cannot load LongBench-E
Issue -
State: closed - Opened by airaria over 1 year ago
- 1 comment
#19 - About the calculation of the Avg score?
Issue -
State: closed - Opened by WSPeng over 1 year ago
- 3 comments
#18 - why use greedy decoding ?
Issue -
State: closed - Opened by WSPeng over 1 year ago
- 1 comment
#17 - OOM during inference on very long texts
Issue -
State: closed - Opened by freshbirdDD over 1 year ago
- 11 comments
#16 - Can the evaluation be used for Base models?
Issue -
State: closed - Opened by forceshorty over 1 year ago
- 1 comment
#15 - Questions on dataset answers.
Issue -
State: closed - Opened by chengeharrison over 1 year ago
- 3 comments
#14 - Add visualization
Pull Request -
State: closed - Opened by McJackTang over 1 year ago
#13 - Why does the multi-document QA (Chinese) part use rouge-L instead of F1, as the English part does?
Issue -
State: closed - Opened by skykiseki over 1 year ago
- 2 comments
#12 - On the fairness of the truncation length
Issue -
State: closed - Opened by bojone over 1 year ago
- 2 comments
Labels: question
#11 - About prompt
Issue -
State: closed - Opened by ftgreat over 1 year ago
- 1 comment
#10 - add support for other models
Issue -
State: closed - Opened by yaoguany over 1 year ago
- 3 comments
Labels: enhancement
#9 - requirements.txt
Issue -
State: closed - Opened by yaoguany over 1 year ago
- 1 comment
#8 - Were the evaluated models fine-tuned on the corresponding datasets?
Issue -
State: closed - Opened by LiuShixing over 1 year ago
- 1 comment
#7 - Encoding issue with Chinese output
Issue -
State: closed - Opened by wen2cheng over 1 year ago
- 4 comments
#6 - Bug in pred.py: the [:10] in for json_obj in tqdm(data[:10]) needs to be removed
Issue -
State: closed - Opened by nkfnn over 1 year ago
- 1 comment