Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / bigscience-workshop/catalogue_data issues and pull requests

#67 - fix issue with config streamlit app

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#66 - add a streamlit app to show PII logs

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#65 - Multiprocessing with datasets in jsonl format

Pull Request - State: open - Opened by HugoLaurencon over 2 years ago

#64 - Execute pii on the whole oscar dataset

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#63 - [WIP] add multiprocessing for pii

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#62 - Add streamlit viewer app

Pull Request - State: closed - Opened by SaulLu over 2 years ago - 1 comment

#61 - fixed typo in clean.py

Pull Request - State: closed - Opened by TevenLeScao over 2 years ago - 2 comments

#60 - Making sure that things are sorted

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#59 - Concatenate ester dataset

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#58 - Generalise deduplicate pattern.

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#57 - new way to simplify dedup url

Pull Request - State: closed - Opened by SaulLu over 2 years ago - 2 comments

#56 - Make new experiment concerning filtering

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 1 comment

#55 - Replace filter with map

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#54 - Fix vi sent tokenizer

Pull Request - State: closed - Opened by lvwerra over 2 years ago - 1 comment

#53 - Fix stanza num proc dirty

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#52 - Fix stanza num proc

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 1 comment

#51 - remove whitespace before checking for emptyness

Pull Request - State: closed - Opened by lvwerra over 2 years ago

#50 - Generalise deduplication function

Pull Request - State: open - Opened by thomasw21 over 2 years ago

#49 - add sentence splitter functions

Pull Request - State: closed - Opened by lvwerra over 2 years ago - 2 comments

#48 - Update preprocessing key to use the new value from the google sheet

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#47 - Add documentation.

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#46 - Code doesn't need to run deduplication script

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 3 comments

#45 - Remove unecessary deduplication

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#43 - change way to compute the size of the text

Pull Request - State: closed - Opened by SaulLu over 2 years ago - 3 comments

#42 - Make scripts robust to meta format

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#41 - Add deduplication script

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#40 - Make substring stripper regex faster

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#39 - Fix to accurate logging

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#38 - Non-Wikipedia Wikis Dedup script

Pull Request - State: closed - Opened by cakiki over 2 years ago - 1 comment

#37 - Accurate size modification logging

Pull Request - State: closed - Opened by TevenLeScao over 2 years ago

#36 - Add deduplication on url level

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 2 comments

#35 - Short document filter in byte

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#34 - Compile regex

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#33 - remove whitespace, numbers and punctuation before hashing

Pull Request - State: closed - Opened by lvwerra over 2 years ago

#32 - Remove short lines

Pull Request - State: open - Opened by thomasw21 over 2 years ago

#31 - add more line filters

Pull Request - State: closed - Opened by lvwerra over 2 years ago - 1 comment

#30 - Add substring remover mapper

Pull Request - State: closed - Opened by cakiki over 2 years ago - 1 comment

#29 - Let's save json when we need to

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#28 - Opentiti fix

Pull Request - State: closed - Opened by lvwerra over 2 years ago - 2 comments

#27 - add "[if" and "<script" to list of excluded lines

Pull Request - State: closed - Opened by lvwerra over 2 years ago

#26 - Deduplication document

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#25 - Use MD5 to obtain persistent hash

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#24 - Test for wikis filters

Pull Request - State: closed - Opened by SaulLu over 2 years ago - 1 comment

#23 - Remove excessive duplicates

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 2 comments

#22 - Curly fix

Pull Request - State: closed - Opened by lvwerra over 2 years ago

#21 - Slurm script

Pull Request - State: closed - Opened by thomasw21 over 2 years ago

#20 - Allow deduplication scripts to be added to the preprocessing script

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 1 comment

#19 - Add feature to see the modified examples by a map operation

Pull Request - State: closed - Opened by SaulLu over 2 years ago - 2 comments

#18 - Allow no maps or filters

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 2 comments
