Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / google/corpuscrawler issues and pull requests
#92 - Use available sentences corpora for Wikipedia (290+ languages)
Issue -
State: open - Opened by hugolpz 12 months ago
#91 - Fix robots.txt fallback to be a byte string
Pull Request -
State: closed - Opened by sffc about 1 year ago
#90 - Fix parsing for rfa.org
Pull Request -
State: closed - Opened by sffc almost 3 years ago
- 1 comment
#89 - Add __main__.py file so that corpuscrawler can be invoked as a module
Pull Request -
State: closed - Opened by sffc almost 3 years ago
#88 - [ga] update crawler
Pull Request -
State: closed - Opened by jimregan over 3 years ago
- 2 comments
#87 - Undefined names
Issue -
State: open - Opened by cclauss almost 4 years ago
#86 - No module named 'corpuscrawler' error
Issue -
State: open - Opened by Aayush-hub almost 4 years ago
- 2 comments
#85 - Update README.md
Pull Request -
State: closed - Opened by 83-W almost 4 years ago
- 1 comment
#84 - Use corpora from Universal Dependencies
Issue -
State: open - Opened by brawer almost 4 years ago
#83 - Documentation > Clarify language codes system in uses
Issue -
State: closed - Opened by hugolpz almost 4 years ago
- 4 comments
#82 - Shorten project structure
Issue -
State: open - Opened by hugolpz almost 4 years ago
- 3 comments
#81 - Define crawlers' output format
Issue -
State: open - Opened by hugolpz almost 4 years ago
#80 - Improve readme documentation on how to provide a new crawler
Issue -
State: open - Opened by hugolpz almost 4 years ago
- 5 comments
#79 - Use available corpora for opensubtitles (63 languages)
Issue -
State: open - Opened by hugolpz almost 4 years ago
- 3 comments
#78 - Add Wikipedia crawler ? (300+ languages)
Issue -
State: open - Opened by hugolpz almost 4 years ago
- 5 comments
#77 - Adding Pali and Karen
Pull Request -
State: closed - Opened by sffc over 4 years ago
#76 - Add Pali, Mon, and Karen
Issue -
State: closed - Opened by sffc over 4 years ago
- 1 comment
#75 - Update crawl_su.py
Pull Request -
State: closed - Opened by mahalisyarifuddin over 4 years ago
- 1 comment
#74 - Adding New URLs
Issue -
State: closed - Opened by Mounika2405 over 4 years ago
- 2 comments
#73 - Does not run in python3.7 or python 2.7
Issue -
State: open - Opened by ftyers about 5 years ago
- 1 comment
#72 - [ga] new crawlers
Pull Request -
State: closed - Opened by jimregan about 5 years ago
#71 - [ga] new crawlers
Pull Request -
State: closed - Opened by jimregan about 5 years ago
#70 - Set context settable
Pull Request -
State: closed - Opened by jimregan about 5 years ago
- 1 comment
#69 - Create crawl_sea.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
- 1 comment
#68 - Update crawl_id.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#67 - Create crawl_xmm.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#66 - Create crawl_bug.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#65 - Create crawl_tet.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#64 - Create crawl_nn.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#63 - Create crawl_nb.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#62 - Create crawl_eip.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#61 - Create crawl_saj.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#60 - Create crawl_xte.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#59 - Create crawl_bhz.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#58 - Create crawl_frd.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#57 - Create crawl_lbw.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#56 - Update crawl_id.py
Pull Request -
State: closed - Opened by mahalisyarifuddin about 5 years ago
#55 - [ga] fix regex
Pull Request -
State: closed - Opened by jimregan about 5 years ago
- 1 comment
#54 - [th] Add crawl bibleis
Pull Request -
State: closed - Opened by wannaphong over 5 years ago
- 1 comment
#53 - [th] Thai crawler
Pull Request -
State: closed - Opened by wannaphong over 5 years ago
- 1 comment
#52 - Fixed Python 3 compatibility
Pull Request -
State: closed - Opened by wannaphong over 5 years ago
- 3 comments
#51 - Skip urls with non-200 http status
Pull Request -
State: closed - Opened by blackblitz over 5 years ago
- 3 comments
#50 - 404 error with Myanmar Zawgyi
Issue -
State: closed - Opened by blackblitz over 5 years ago
- 2 comments
#49 - Portuguese: doubt about the corpus result
Issue -
State: open - Opened by ghost over 5 years ago
- 1 comment
Labels: help wanted
#48 - Add Norwegian language
Issue -
State: open - Opened by Orekhov over 5 years ago
- 1 comment
Labels: help wanted
#47 - Adding title to CONTRIBUTING.md
Pull Request -
State: closed - Opened by kshithijiyer over 5 years ago
#46 - Fixed 3 crawlers
Pull Request -
State: closed - Opened by cash over 5 years ago
- 2 comments
#45 - fixes bibleis crawler
Pull Request -
State: closed - Opened by cash over 5 years ago
- 2 comments
#44 - crawler gets hung after downloading a few hits
Issue -
State: closed - Opened by thebucketmouse over 5 years ago
- 2 comments
#43 - what sites are crawled?
Issue -
State: closed - Opened by thebucketmouse almost 6 years ago
- 2 comments
Labels: question
#42 - Error when crawling Kaqchikel
Issue -
State: closed - Opened by ftyers almost 6 years ago
- 3 comments
#41 - Crawl Pali corpora
Issue -
State: open - Opened by brawer about 6 years ago
Labels: help wanted
#40 - Update Zawgyi locale to Qaag
Issue -
State: open - Opened by sffc over 6 years ago
#39 - [iba] Crawl a larger corpus for the Iban language
Pull Request -
State: closed - Opened by brawer over 6 years ago
#38 - US embassy crawler for Polish
Pull Request -
State: closed - Opened by jimregan over 6 years ago
#37 - how to
Issue -
State: closed - Opened by MayuraVerma almost 7 years ago
- 1 comment
#36 - [ga] 3 new crawlers
Pull Request -
State: closed - Opened by jimregan about 7 years ago
- 1 comment
#35 - [ga] CHG crawler
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#34 - Irish Times
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#33 - move crawl_bibleis to util; add for Ukrainian
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#32 - [ace] bible crawl
Pull Request -
State: closed - Opened by jimregan about 7 years ago
- 3 comments
#31 - basic crawler for Aceh
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#30 - Rename crawl_taq to crawl_kab
Issue -
State: closed - Opened by brawer about 7 years ago
#29 - [be-tarask] Add corpus for Belarusian (Taraškievica)
Issue -
State: closed - Opened by brawer about 7 years ago
#28 - [cy] add basic Welsh crawler
Pull Request -
State: closed - Opened by cwd24 about 7 years ago
- 1 comment
#27 - [mi] Filter out lines with English “the” from the Maori corpus
Pull Request -
State: closed - Opened by brawer about 7 years ago
#26 - [mi] Filter out English text
Issue -
State: closed - Opened by brawer about 7 years ago
- 1 comment
#25 - Allow Zawgyi crawling separate from my
Issue -
State: closed - Opened by sffc about 7 years ago
#24 - Thanlwintimes.com No Longer Available
Issue -
State: closed - Opened by sffc about 7 years ago
#23 - [mi] (public domain) Bible scraper
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#22 - [ga] another sentence start to omit
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#21 - [ga] conditions were right, needed to cast to int
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#20 - need more ns/no ns handling here
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#19 - Python 3 compatibility
Issue -
State: open - Opened by sffc about 7 years ago
- 1 comment
#18 - [ga] url conditions were backwards
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#17 - handle mixed broken/unbroken namespaces
Pull Request -
State: closed - Opened by jimregan about 7 years ago
#16 - [gd] scraper for dasg corpus (#12)
Pull Request -
State: closed - Opened by jimregan about 7 years ago
- 1 comment
#15 - [mi] Maori scraper
Pull Request -
State: closed - Opened by jimregan about 7 years ago
- 1 comment
#14 - [util] Add filepath to FetchResult
Pull Request -
State: closed - Opened by behnam over 7 years ago
#13 - [ga] Irish: fixed RTE news scraper
Pull Request -
State: closed - Opened by jimregan over 7 years ago
#12 - [gd] Extend Scottish Gaelic corpus
Issue -
State: closed - Opened by brawer over 7 years ago
- 3 comments
#11 - [WIP] [ga] basic crawler for Irish
Pull Request -
State: closed - Opened by jimregan over 7 years ago
#10 - basic crawler for Scots Gaelic (gd)
Pull Request -
State: closed - Opened by jimregan over 7 years ago
#9 - [si] Add crawler for Sinhala
Pull Request -
State: closed - Opened by keshan over 7 years ago
#8 - harfbuzz-testing-wikipedia
Issue -
State: open - Opened by behdad over 7 years ago
- 1 comment
#7 - [util] Replace unichr() for narrow Python builds
Pull Request -
State: closed - Opened by behnam over 7 years ago
#6 - [ar] Add bbc_news and sputnik_news
Pull Request -
State: closed - Opened by behnam over 7 years ago
#5 - [ar] Add Modern Standard Arabic: UDHR and DW
Pull Request -
State: closed - Opened by behnam over 7 years ago
#4 - [util/fetch] Add more prints for showing progress
Pull Request -
State: closed - Opened by behnam over 7 years ago
#3 - Add (Modern Standard) Arabic language
Issue -
State: open - Opened by behnam over 7 years ago
- 9 comments
#2 - [util/fetch_sitemap] Add subsitemap_filter option
Pull Request -
State: closed - Opened by behnam over 7 years ago
- 3 comments
Labels: enhancement
#1 - [shn] Add crawler for the Shan language
Pull Request -
State: closed - Opened by brawer over 7 years ago