Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / shjwudp/c4-dataset-script issues and pull requests

#10 - feat: add Chinese duplicated text removal strategy

Pull Request - State: closed - Opened by shjwudp over 1 year ago

#9 - fix: remove invalid script path in setup.py

Pull Request - State: closed - Opened by shjwudp almost 2 years ago

#8 - feat: add Repetition Removal in Chinese data processing pipeline

Pull Request - State: closed - Opened by shjwudp about 2 years ago

#7 - feat: add "Chinese c4" script, add docs

Pull Request - State: closed - Opened by shjwudp about 2 years ago

#6 - fix: Fix tensorflow-datasets dependent & update README

Pull Request - State: closed - Opened by shjwudp about 2 years ago

#5 - perf: Replace reduceByGroup to improve performance

Pull Request - State: closed - Opened by shjwudp over 2 years ago

#3 - feat: support python package & introduce "how to submit cluster" to readme

Pull Request - State: closed - Opened by shjwudp over 2 years ago

#2 - feat: support badwords filter

Pull Request - State: closed - Opened by shjwudp over 2 years ago

#1 - Create LICENSE

Pull Request - State: closed - Opened by shjwudp over 2 years ago