Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / modelscope/data-juicer issues and pull requests
#580 - Error when running the command python tools/process_data.py --config configs/demo/process.yaml after setting up the environment
Issue -
State: open - Opened by ctgushiwei 1 day ago
#579 - A small improvement for a minor issue
Issue -
State: open - Opened by 976311200 2 days ago
#578 - process_data.py pre-start is too slow (the data processing script starts too slowly)
Issue -
State: open - Opened by hhhhsc701 2 days ago
Labels: question
#577 - Can datajuicer be understood as providing multimodal data processing capabilities for Ray data?
Issue -
State: open - Opened by nihaoqingtuan 4 days ago
Labels: question
#576 - Installation progress could be optimized. (Cmake error during installation)
Issue -
State: open - Opened by zhenqincn 6 days ago
Labels: enhancement, environment
#575 - [Bug]: HumanVbench test error: ERROR opening: HumanVBench/Emotion_Intensity_Compare/Emotion_Intensity_Compare_1.mp4, No such file or directory
Issue -
State: closed - Opened by Reneea1 8 days ago
- 2 comments
Labels: bug
#574 - When started in ray mode, does it spill to disk when memory is insufficient?
Issue -
State: open - Opened by javapythonphp 9 days ago
Labels: question
#573 - Are the subtitles of panels e and f in Figure 1 of the DAAR paper written incorrectly?
Issue -
State: closed - Opened by xiafeng-nb 9 days ago
- 3 comments
Labels: question
#572 - Fix typos
Pull Request -
State: closed - Opened by co63oc 10 days ago
#571 - Fix typos
Pull Request -
State: closed - Opened by co63oc 12 days ago
Labels: documentation
#570 - Optimization for sdxl_prompt2prompt_mapper dependency importing
Pull Request -
State: closed - Opened by HYLcool 13 days ago
Labels: enhancement, environment
#569 - Update sdxl_prompt2prompt_mapper.py
Pull Request -
State: open - Opened by xiaokun-hadoop 13 days ago
- 2 comments
#568 - Optimize dedup to avoid oom
Pull Request -
State: open - Opened by coolderli 13 days ago
Labels: enhancement, good first issue, dj:dist, dj:efficiency, dj:tools
#567 - Update sdxl_prompt2prompt_mapper.py
Pull Request -
State: closed - Opened by xiaokun-hadoop 14 days ago
#566 - update the 2.0 paper link & the DaaR news
Pull Request -
State: closed - Opened by yxdyc 14 days ago
Labels: documentation, dj:cookbook, dj:post-tuning
#565 - Language support
Issue -
State: closed - Opened by ken-arf 16 days ago
- 1 comment
Labels: question
#564 - [Bug]: Test failed with no language_id_score_filter
Issue -
State: closed - Opened by monsieurzhang 22 days ago
- 1 comment
Labels: bug
#563 - [Typo]correct a small typo
Pull Request -
State: closed - Opened by liuyuhanalex 25 days ago
#562 - fix translation error
Pull Request -
State: closed - Opened by yxdyc 29 days ago
Labels: documentation
#561 - Refactor and improve doc for RecipeGallery, DeveloperGuide, DistributedProcess and DJ-related Competitions
Pull Request -
State: closed - Opened by yxdyc 29 days ago
Labels: documentation, enhancement, dj:cookbook
#560 - Some operators cause the process to hang during processing
Issue -
State: open - Opened by SkyAndFly 29 days ago
- 2 comments
Labels: question
#559 - Resolve most skipped unittests
Pull Request -
State: closed - Opened by HYLcool 29 days ago
Labels: bug, enhancement, dj:ci/cd, environment
#558 - Is there a specific download link for the data classifier?
Issue -
State: open - Opened by obj12 30 days ago
- 2 comments
Labels: question
#557 - fix export error when export_stats columns is null
Pull Request -
State: closed - Opened by Cathy0908 about 1 month ago
Labels: bug, dj:core
#556 - How to do sentence_dedup
Issue -
State: open - Opened by ftgreat about 1 month ago
- 1 comment
Labels: enhancement
#555 - Update __init__.py for v1.1.0
Pull Request -
State: closed - Opened by BeachWang about 1 month ago
#554 - Update translator of OP doc building.
Pull Request -
State: closed - Opened by HYLcool about 1 month ago
Labels: documentation, dj:ci/cd
#553 - Add humanvbench operators
Pull Request -
State: open - Opened by SYSUzhouting about 1 month ago
Labels: good first issue, dj:multimodal, dj:op
#552 - optimize op doc for global textual search; correct beta into stable
Pull Request -
State: closed - Opened by yxdyc about 1 month ago
Labels: documentation, dj:op
#551 - humanvbench operators
Pull Request -
State: closed - Opened by SYSUzhouting about 1 month ago
- 1 comment
#550 - Add Img-Diff ops.
Pull Request -
State: closed - Opened by Qirui-jiao about 1 month ago
Labels: enhancement, dj:multimodal, dj:op
#549 - Resplit input dataset in ray mode
Pull Request -
State: closed - Opened by chenyushuo about 1 month ago
- 1 comment
#548 - When will version 2.0 be released
Issue -
State: open - Opened by javapythonphp about 1 month ago
- 1 comment
Labels: question
#547 - [Bug]: Fail to run ray_bts_minhash_deduplicator
Issue -
State: open - Opened by javapythonphp about 1 month ago
- 2 comments
Labels: bug
#546 - Hash configuration information for the dedup performance test of DataJuicer 2.0
Issue -
State: open - Opened by cist about 1 month ago
- 3 comments
Labels: question
#545 - Fix bug and add gif demo for role playing
Pull Request -
State: closed - Opened by BeachWang about 1 month ago
Labels: bug, documentation, dj:cookbook
#544 - Bug fixed: generating too short texts and no valid QA is extracted.
Pull Request -
State: closed - Opened by HYLcool about 1 month ago
Labels: bug, dj:op
#543 - update a quick cdn link for arch figure
Pull Request -
State: closed - Opened by yxdyc about 1 month ago
Labels: documentation
#542 - update homepage and docs for DJ2.0 and DJ-Cookbook
Pull Request -
State: closed - Opened by yxdyc about 1 month ago
Labels: documentation
#541 - limit the generated qa num for each text in generate_qa_from_text_mapper
Pull Request -
State: closed - Opened by BeachWang about 1 month ago
Labels: enhancement, dj:op
#540 - Add unittest for ray text dedup
Pull Request -
State: closed - Opened by chenyushuo about 1 month ago
#539 - [Bug]: ds.JSONDatasource
Issue -
State: open - Opened by ariexBear about 1 month ago
- 4 comments
Labels: bug
#538 - fix missing field meta tag on ray mode
Pull Request -
State: closed - Opened by Cathy0908 about 1 month ago
Labels: bug
#537 - [WIP] refactor of dataset builder and executor
Pull Request -
State: open - Opened by cyruszhang about 1 month ago
Labels: enhancement, dj:dataset, dj:core
#536 - fix save_ckpt bug
Pull Request -
State: closed - Opened by HYLcool about 1 month ago
Labels: bug, dj:core
#535 - Support other LLMs & APIs for the OP `generate_qa_from_text_mapper`
Issue -
State: open - Opened by yxdyc about 1 month ago
Labels: enhancement, dj:op
#534 - log summarization
Pull Request -
State: closed - Opened by HYLcool about 1 month ago
- 2 comments
Labels: enhancement
#533 - [BUG]: inappropriate arguments for `map_batches` in ray mode
Issue -
State: open - Opened by HYLcool about 1 month ago
Labels: bug, dj:dist
#532 - [Hot Fix] Update Ray version
Pull Request -
State: closed - Opened by pan-x-c about 1 month ago
Labels: environment
#531 - [Bug]: With checkpoint enabled, an error is raised when a configured pipeline reaches its last operator and np > left samples.
Issue -
State: closed - Opened by HunterLG about 1 month ago
- 2 comments
Labels: bug, dj:core
#530 - Remove sandbox requirements installation from Dockerfile
Pull Request -
State: closed - Opened by HYLcool about 1 month ago
Labels: dj:ci/cd, environment
#529 - fix force download bug
Pull Request -
State: closed - Opened by BeachWang about 1 month ago
Labels: bug, dj:core
#528 - Refine/llm api op unittest
Pull Request -
State: closed - Opened by BeachWang about 2 months ago
Labels: enhancement, dj:core
#527 - [Feature] Auto generation for OP docs
Pull Request -
State: closed - Opened by HYLcool about 2 months ago
Labels: documentation, enhancement, dj:ci/cd
#526 - Add Actors for Ray Dedup.
Pull Request -
State: closed - Opened by chenyushuo about 2 months ago
Labels: dj:op, dj:dist
#525 - Suggestion: set up a WeChat group or DingTalk group; the default DingTalk group QR code has expired
Issue -
State: closed - Opened by baiyi-os about 2 months ago
- 1 comment
Labels: enhancement
#524 - Can the transformers version in the dependencies be changed? I suspect the error below is a dependency issue
Issue -
State: closed - Opened by baiyi-os about 2 months ago
- 4 comments
Labels: question, stale-issue, environment
#523 - docs for distributed processing
Pull Request -
State: closed - Opened by HYLcool about 2 months ago
- 2 comments
Labels: documentation, dj:dist
#522 - Error in running distributed task on ray cluster
Issue -
State: closed - Opened by awangzy about 2 months ago
- 3 comments
Labels: question
#521 - Fix operators doc link for aggregators
Pull Request -
State: closed - Opened by jackylee-ch about 2 months ago
Labels: documentation
#520 - ModuleNotFoundError of cmake and fail to build wheel for samplerate
Issue -
State: closed - Opened by BeachWang about 2 months ago
#519 - Undefined symbol when running video_captioning_from_summarizer_mapper
Issue -
State: closed - Opened by BeachWang about 2 months ago
#518 - Dev/manage meta
Pull Request -
State: closed - Opened by BeachWang about 2 months ago
Labels: enhancement, dj:dataset, dj:core
#517 - fix bug in generate_qa_from_example_mapper
Pull Request -
State: closed - Opened by BeachWang about 2 months ago
Labels: bug, dj:op
#516 - [Feat] OP-wise Insight Mining
Pull Request -
State: closed - Opened by HYLcool 2 months ago
Labels: enhancement, dj:core
#515 - DJ Ray mode supports streaming loading of `jsonl` files
Pull Request -
State: closed - Opened by pan-x-c 2 months ago
Labels: dj:dataset, dj:efficiency
#514 - Format conversion tools for post tuning datasets
Pull Request -
State: closed - Opened by HYLcool 2 months ago
Labels: documentation, enhancement, dj:dataset, dj:tools
#513 - 10 more post-tuning OPs, regarding dialog data analysis from multiple aspects
Pull Request -
State: closed - Opened by BeachWang 2 months ago
Labels: documentation, enhancement, dj:op, dj:post-tuning
#512 - [Feature] add auto mode for analyzer
Pull Request -
State: closed - Opened by HYLcool 2 months ago
Labels: enhancement, dj:core
#511 - support ray actor
Pull Request -
State: closed - Opened by Cathy0908 2 months ago
Labels: dj:dist, dj:efficiency
#510 - Simplifying Open Source Contributions Through Operator Tiering from Dev aspect
Issue -
State: closed - Opened by yxdyc 2 months ago
- 1 comment
Labels: enhancement, good first issue, dj:op
#509 - How to use Data-Juicer to process Chinese documents
Issue -
State: closed - Opened by aruig666 2 months ago
- 4 comments
Labels: question, stale-issue
#508 - install by recipe
Pull Request -
State: closed - Opened by BeachWang 2 months ago
- 1 comment
Labels: enhancement
#507 - add op video_extract_frames_mapper
Pull Request -
State: closed - Opened by Cathy0908 2 months ago
#506 - Patch for Perf Bench
Pull Request -
State: closed - Opened by HYLcool 3 months ago
Labels: enhancement
#505 - Register all other formatters
Pull Request -
State: closed - Opened by jackylee-ch 3 months ago
- 2 comments
Labels: invalid
#504 - fix batch bug
Pull Request -
State: closed - Opened by BeachWang 3 months ago
Labels: bug
#503 - Quick fix for some minor problems
Pull Request -
State: closed - Opened by HYLcool 3 months ago
Labels: bug, dj:multimodal
#502 - Add minhash deduplicator based on RAY.
Pull Request -
State: closed - Opened by chenyushuo 3 months ago
Labels: dj:op, dj:dist, dj:efficiency
#500 - add grouper and aggregator op for system_prompt
Pull Request -
State: open - Opened by BeachWang 3 months ago
Labels: agent
#500 - add grouper and aggregator op for system_prompt
Pull Request -
State: closed - Opened by BeachWang 3 months ago
Labels: dj:op, agent, dj:cookbook
#499 - Can the cleaning statistics be viewed after creating the config file and performing the cleaning?
Issue -
State: open - Opened by Tendo33 3 months ago
Labels: question
#498 - With multiple PDFs placed in one folder, why can the flagged_words_filter process handle only number-process files?
Issue -
State: open - Opened by mkzzz 3 months ago
Labels: bug
#498 - With multiple PDFs placed in one folder, why can the flagged_words_filter process handle only number-process files?
Issue -
State: closed - Opened by mkzzz 3 months ago
Labels: bug
#497 - generate_qa_from_examples_mapper Error How to solve it
Issue -
State: open - Opened by zdbss1990 3 months ago
- 1 comment
Labels: bug
#497 - generate_qa_from_examples_mapper Error How to solve it
Issue -
State: closed - Opened by zdbss1990 3 months ago
- 2 comments
Labels: bug, stale-issue
#496 - Guidance on Monitoring Task Execution with Ray Executor in Data Juicer
Issue -
State: open - Opened by Fatima-0SA 3 months ago
Labels: question, dj:dist
#495 - AttributeError: 'FusedFilter' object has no attribute '_name'
Issue -
State: open - Opened by xunmenglt 3 months ago
- 1 comment
Labels: bug, dj:op
#495 - AttributeError: 'FusedFilter' object has no attribute '_name'
Issue -
State: closed - Opened by xunmenglt 3 months ago
- 2 comments
Labels: bug, stale-issue, dj:op
#494 - Auto docker image building on release
Pull Request -
State: closed - Opened by HYLcool 3 months ago
Labels: enhancement, priority:high
#493 - add python_file_mapper
Pull Request -
State: closed - Opened by drcege 3 months ago
#492 - add python_lambda_mapper
Pull Request -
State: closed - Opened by drcege 3 months ago
- 1 comment
#491 - Add DPO data OP
Pull Request -
State: open - Opened by drcege 3 months ago
#490 - Merge local and API LLM calling
Issue -
State: closed - Opened by BeachWang 3 months ago
Labels: enhancement
#489 - Add minhash deduplicator based on RAY and Redis
Pull Request -
State: open - Opened by pan-x-c 3 months ago
Labels: dj:op, dj:dist, dj:efficiency