Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / eleutherai/dps issues and pull requests
#82 - k
Issue -
State: open - Opened by NguyenNhoTrung 5 months ago
#81 - dedup_job java.lang.UnsatisfiedLinkError
Issue -
State: open - Opened by syedhasnainrazashah 8 months ago
- 3 comments
#80 - Fix #79
Pull Request -
State: closed - Opened by ohwi over 1 year ago
#79 - Bug in the function `remove_repeated_text`
Issue -
State: closed - Opened by ohwi over 1 year ago
#78 - documentation fixes for the dataframe workflow
Pull Request -
State: closed - Opened by paulovn over 1 year ago
#77 - Clean up JA pipeline
Pull Request -
State: closed - Opened by polm-stability over 1 year ago
- 2 comments
#76 - DataFrame processing pipeline
Pull Request -
State: closed - Opened by paulovn over 1 year ago
- 4 comments
#75 - fix carriage return removed
Pull Request -
State: open - Opened by jason9693 over 1 year ago
#74 - [ja] `.filter` is used instead of `.map` for non-filter methods
Issue -
State: open - Opened by mrorii over 1 year ago
- 1 comment
#73 - fix: return list[str] from word_tokenize instead of str
Pull Request -
State: closed - Opened by mrorii over 1 year ago
- 1 comment
#72 - dev japanese
Pull Request -
State: closed - Opened by fujiki-1emon over 1 year ago
- 1 comment
#71 - Japanese development branch
Pull Request -
State: closed - Opened by fujiki-1emon over 1 year ago
- 1 comment
#70 - Update README.md
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
#69 - [WIP] preprocessing vietnamese language
Pull Request -
State: open - Opened by wookee3 over 1 year ago
- 1 comment
#68 - compatible for v2
Pull Request -
State: closed - Opened by jason9693 over 1 year ago
- 1 comment
#67 - add pre-processing for Chinese
Pull Request -
State: closed - Opened by Kaeun-Lee over 1 year ago
- 1 comment
#66 - Work In Progress (thai)
Pull Request -
State: closed - Opened by skytmddus27 over 1 year ago
- 2 comments
#65 - Chinese dedup memory error
Issue -
State: open - Opened by hyeinhyun over 1 year ago
- 1 comment
#64 - [WIP] [#62] improve MinHashLSH-based deduplication for Japanese
Pull Request -
State: closed - Opened by fujiki-1emon over 1 year ago
- 1 comment
#63 - [WIP] [#62] add refactored method for Japanese MinHashLSH-based near-deduplication
Pull Request -
State: closed - Opened by fujiki-1emon over 1 year ago
- 1 comment
#62 - [ja] refactor MinHashLSH-based near deduplication method
Issue -
State: open - Opened by fujiki-1emon over 1 year ago
#61 - improve the japanese pre-processing (namely `japanese_job`)
Pull Request -
State: closed - Opened by fujiki-1emon over 1 year ago
#60 - [WIP] improve the japanese pre-processing (namely `japanese_job`)
Pull Request -
State: closed - Opened by fujiki-1emon over 1 year ago
#59 - [#52] freq char filter: flip comparison and ratio->cnt
Pull Request -
State: closed - Opened by skjang54 over 1 year ago
#58 - add indonesia and malaysia preprocessed
Pull Request -
State: closed - Opened by acul3 over 1 year ago
#57 - Refactor RDD process to Dataframe process
Issue -
State: open - Opened by Taekyoon over 1 year ago
Labels: enhancement
#56 - Need to add ignore null or empty text during korean text process
Issue -
State: open - Opened by Taekyoon over 1 year ago
Labels: bug
#55 - [#54] Improve korean preprocessing algorithm
Pull Request -
State: closed - Opened by hyunwoongko over 1 year ago
#54 - Improve Korean preprocessing algorithm
Issue -
State: closed - Opened by hyunwoongko over 1 year ago
#53 - [#52] add japanese_frequent_char_existence_filter
Pull Request -
State: closed - Opened by skjang54 over 1 year ago
- 1 comment
#52 - Japanese pre-processing - remove text with low rate of Japanese stopwords
Issue -
State: closed - Opened by fujiki-1emon over 1 year ago
- 4 comments
#51 - [ja] spam word filter
Issue -
State: open - Opened by fujiki-1emon over 1 year ago
#50 - [ja] reduce emoticon
Issue -
State: closed - Opened by fujiki-1emon over 1 year ago
- 1 comment
#49 - [ja] replace Japanese PII
Issue -
State: open - Opened by fujiki-1emon over 1 year ago
#48 - First commit for Romance filtering
Pull Request -
State: closed - Opened by josemlopez over 1 year ago
- 3 comments
#47 - small fix to requirements.txt
Pull Request -
State: closed - Opened by fujiki-1emon almost 2 years ago
- 1 comment
#46 - add pre-processing for Japanese
Pull Request -
State: closed - Opened by fujiki-1emon almost 2 years ago
- 3 comments
#45 - Add many features to korean
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#44 - Add BR to html processing
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#43 - Add html and url processing
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#42 - modify dedup_job
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#41 - modify readme
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#40 - rename sample-jsonl to sample-job
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#39 - modify readme
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#38 - Fetch from master (modify readme)
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#37 - Dev to master
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
#36 - Feature/#33 Add email, url, spam detection
Pull Request -
State: closed - Opened by hyunwoongko almost 2 years ago
- 4 comments
#35 - [WIP] Add minhash dedup job
Pull Request -
State: closed - Opened by Taekyoon almost 2 years ago
- 3 comments
#34 - Implement minhash dedup module
Issue -
State: closed - Opened by Taekyoon almost 2 years ago
Labels: enhancement
#33 - Task consideration
Issue -
State: closed - Opened by hyunwoongko about 2 years ago
- 3 comments
#32 - Replace html2text from Beautifulsoup
Issue -
State: closed - Opened by Taekyoon about 2 years ago
- 1 comment
#31 - WIP: add Deduplication for Japanese text datasets
Pull Request -
State: closed - Opened by fujiki-1emon about 2 years ago
- 1 comment
#30 - WIP: Deduplication for Japanese text datasets
Pull Request -
State: closed - Opened by fujiki-1emon about 2 years ago
#29 - Add massive text filter logics
Pull Request -
State: closed - Opened by Taekyoon about 2 years ago
#28 - Add pre-processing for Japanese texts
Issue -
State: closed - Opened by fujiki-1emon about 2 years ago
#27 - Remove `soynlp` library
Issue -
State: closed - Opened by Taekyoon over 2 years ago
Labels: enhancement
#26 - Change logics for sample jsonl
Pull Request -
State: closed - Opened by Taekyoon over 2 years ago
#25 - Add scripts to run hadoop cluster
Pull Request -
State: closed - Opened by Taekyoon over 2 years ago
#24 - [WIP] Feature/#23
Pull Request -
State: closed - Opened by Ronalmoo over 2 years ago
- 1 comment
#23 - Update additional preprocess function
Issue -
State: closed - Opened by Ronalmoo over 2 years ago
- 1 comment
#22 - Add normalize `?,:"!` in common preprocess job
Issue -
State: closed - Opened by Taekyoon over 2 years ago
#21 - Feature/#17
Pull Request -
State: closed - Opened by Kaeun-Lee over 2 years ago
- 3 comments
#20 - Add scripts to run hadoop cluster
Issue -
State: open - Opened by Taekyoon over 2 years ago
Labels: enhancement
#19 - Add function for processing empty string
Pull Request -
State: closed - Opened by Ronalmoo over 2 years ago
#18 - Add function for processing empty string
Issue -
State: closed - Opened by Ronalmoo over 2 years ago
#17 - Add huggingface tokenizers for data length statistics
Issue -
State: closed - Opened by Kaeun-Lee over 2 years ago
#16 - Add job to separate train and validate data
Issue -
State: closed - Opened by Taekyoon over 2 years ago
Labels: add job
#15 - Feature/#13
Pull Request -
State: closed - Opened by donggrii over 2 years ago
- 7 comments
#14 - Feature/#13
Pull Request -
State: closed - Opened by donggrii over 2 years ago
#13 - Add statistics by data category
Issue -
State: closed - Opened by donggrii over 2 years ago
Labels: add tool
#12 - [python] feat: Add build news paper data job
Pull Request -
State: closed - Opened by Taekyoon over 2 years ago
Labels: add job
#11 - Feature/#4
Pull Request -
State: closed - Opened by jayseok-park over 2 years ago
- 1 comment
#10 - Feature/#1
Pull Request -
State: closed - Opened by Ronalmoo over 2 years ago
- 1 comment
#9 - Add build news paper dataset as long text data form
Issue -
State: open - Opened by Taekyoon over 2 years ago
Labels: add job
#8 - Add Toxic text labeler
Issue -
State: closed - Opened by Taekyoon over 2 years ago
Labels: add job
#7 - Add Text length Stats for datasets
Issue -
State: closed - Opened by Taekyoon over 2 years ago
Labels: add job
#6 - [etc] feat: Add development guides
Pull Request -
State: closed - Opened by Taekyoon over 2 years ago
Labels: documentation
#5 - [etc] feat: Add requirements-dev.txt
Pull Request -
State: closed - Opened by Taekyoon over 2 years ago
Labels: enhancement
#4 - MassiveText Quality Filtering
Issue -
State: closed - Opened by jayseok-park over 2 years ago
- 3 comments
Labels: add job
#3 - Add guides to run dps jobs
Issue -
State: closed - Opened by Taekyoon over 2 years ago
Labels: enhancement
#2 - Add requirements-dev.txt
Issue -
State: closed - Opened by Taekyoon over 2 years ago
Labels: enhancement
#1 - Add general text refinement job
Issue -
State: closed - Opened by Taekyoon over 2 years ago
- 2 comments
Labels: add job