Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / shjwudp/c4-dataset-script issues and pull requests
#10 - feat: add Chinese duplicated text removal strategy
Pull Request
State: closed - Opened by shjwudp over 1 year ago
#9 - fix: remove invalid script path in setup.py
Pull Request
State: closed - Opened by shjwudp almost 2 years ago
#8 - feat: add Repetition Removal in Chinese data processing pipeline
Pull Request
State: closed - Opened by shjwudp about 2 years ago
#7 - feat: add "Chinese c4" script, add docs
Pull Request
State: closed - Opened by shjwudp about 2 years ago
#6 - fix: Fix tensorflow-datasets dependent & update README
Pull Request
State: closed - Opened by shjwudp about 2 years ago
#5 - perf: Replace reduceByGroup to improve performance
Pull Request
State: closed - Opened by shjwudp over 2 years ago
#4 - feat: implements "Repetition Removal" and "Document Deduplication" described as Gopher
Pull Request
State: closed - Opened by shjwudp over 2 years ago
#3 - feat: support python package & introduce "how to submit cluster" to readme
Pull Request
State: closed - Opened by shjwudp over 2 years ago
#2 - feat: support badwords filter
Pull Request
State: closed - Opened by shjwudp over 2 years ago
#1 - Create LICENSE
Pull Request
State: closed - Opened by shjwudp over 2 years ago