Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / google/corpuscrawler issues and pull requests

#91 - Fix robots.txt fallback to be a byte string

Pull Request - State: closed - Opened by sffc 7 months ago

#90 - Fix parsing for rfa.org

Pull Request - State: closed - Opened by sffc about 2 years ago - 1 comment

#89 - Add __main__.py file so that corpuscrawler can be invoked as a module

Pull Request - State: closed - Opened by sffc about 2 years ago

#88 - [ga] update crawler

Pull Request - State: closed - Opened by jimregan almost 3 years ago - 2 comments

#87 - Undefined names

Issue - State: open - Opened by cclauss over 3 years ago

#86 - No module named 'corpuscrawler' error

Issue - State: open - Opened by Aayush-hub over 3 years ago - 2 comments

#85 - Update README.md

Pull Request - State: closed - Opened by 83-W over 3 years ago - 1 comment

#84 - Use corpora from Universal Dependencies

Issue - State: open - Opened by brawer over 3 years ago

#83 - Documentation > Clarify language codes system in uses

Issue - State: closed - Opened by hugolpz over 3 years ago - 4 comments

#82 - Shorten project structure

Issue - State: open - Opened by hugolpz over 3 years ago - 3 comments

#81 - Define crawlers' output format

Issue - State: open - Opened by hugolpz over 3 years ago

#80 - Improve readme documentation on how to provide a new crawler

Issue - State: open - Opened by hugolpz over 3 years ago - 5 comments

#79 - Use available corpora for opensubtitles (63 languages)

Issue - State: open - Opened by hugolpz over 3 years ago - 3 comments

#78 - Add Wikipedia crawler ? (300+ languages)

Issue - State: open - Opened by hugolpz over 3 years ago - 5 comments

#77 - Adding Pali and Karen

Pull Request - State: closed - Opened by sffc almost 4 years ago

#76 - Add Pali, Mon, and Karen

Issue - State: closed - Opened by sffc almost 4 years ago - 1 comment

#75 - Update crawl_su.py

Pull Request - State: closed - Opened by mahalisyarifuddin about 4 years ago - 1 comment

#74 - Adding New URLs

Issue - State: closed - Opened by Mounika2405 about 4 years ago - 2 comments

#73 - Does not run in python3.7 or python 2.7

Issue - State: open - Opened by ftyers over 4 years ago - 1 comment

#72 - [ga] new crawlers

Pull Request - State: closed - Opened by jimregan over 4 years ago

#71 - [ga] new crawlers

Pull Request - State: closed - Opened by jimregan over 4 years ago

#70 - Set context settable

Pull Request - State: closed - Opened by jimregan over 4 years ago - 1 comment

#69 - Create crawl_sea.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago - 1 comment

#68 - Update crawl_id.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#67 - Create crawl_xmm.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#66 - Create crawl_bug.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#65 - Create crawl_tet.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#64 - Create crawl_nn.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#63 - Create crawl_nb.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#62 - Create crawl_eip.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#61 - Create crawl_saj.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#60 - Create crawl_xte.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#59 - Create crawl_bhz.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#58 - Create crawl_frd.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#57 - Create crawl_lbw.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#56 - Update crawl_id.py

Pull Request - State: closed - Opened by mahalisyarifuddin over 4 years ago

#55 - [ga] fix regex

Pull Request - State: closed - Opened by jimregan over 4 years ago - 1 comment

#54 - [th] Add crawl bibleis

Pull Request - State: closed - Opened by wannaphong over 4 years ago - 1 comment

#53 - [th] Thai crawler

Pull Request - State: closed - Opened by wannaphong over 4 years ago - 1 comment

#52 - Fixed Python 3 compatibility

Pull Request - State: closed - Opened by wannaphong over 4 years ago - 3 comments

#51 - Skip urls with non-200 http status

Pull Request - State: closed - Opened by blackblitz over 4 years ago - 3 comments

#50 - 404 error with Myanmar Zawgyi

Issue - State: closed - Opened by blackblitz over 4 years ago - 2 comments

#49 - Portuguese: doubt about the corpus result

Issue - State: open - Opened by ghost almost 5 years ago - 1 comment
Labels: help wanted

#48 - Add Norwegian language

Issue - State: open - Opened by Orekhov almost 5 years ago - 1 comment
Labels: help wanted

#47 - Adding title to CONTRIBUTING.md

Pull Request - State: closed - Opened by kshithijiyer almost 5 years ago

#46 - Fixed 3 crawlers

Pull Request - State: closed - Opened by cash almost 5 years ago - 2 comments

#45 - fixes bibleis crawler

Pull Request - State: closed - Opened by cash almost 5 years ago - 2 comments

#44 - crawler gets hung after downloading a few hits

Issue - State: closed - Opened by thebucketmouse about 5 years ago - 2 comments

#43 - what sites are crawled?

Issue - State: closed - Opened by thebucketmouse about 5 years ago - 2 comments
Labels: question

#42 - Error when crawling Kaqchikel

Issue - State: closed - Opened by ftyers over 5 years ago - 3 comments

#41 - Crawl Pali corpora

Issue - State: open - Opened by brawer over 5 years ago
Labels: help wanted

#40 - Update Zawgyi locale to Qaag

Issue - State: open - Opened by sffc almost 6 years ago

#39 - [iba] Crawl a larger corpus for the Iban language

Pull Request - State: closed - Opened by brawer about 6 years ago

#38 - US embassy crawler for Polish

Pull Request - State: closed - Opened by jimregan about 6 years ago

#37 - how to

Issue - State: closed - Opened by MayuraVerma about 6 years ago - 1 comment

#36 - [ga] 3 new crawlers

Pull Request - State: closed - Opened by jimregan over 6 years ago - 1 comment

#35 - [ga] CHG crawler

Pull Request - State: closed - Opened by jimregan over 6 years ago

#34 - Irish Times

Pull Request - State: closed - Opened by jimregan over 6 years ago

#33 - move crawl_bibleis to util; add for Ukrainian

Pull Request - State: closed - Opened by jimregan over 6 years ago

#32 - [ace] bible crawl

Pull Request - State: closed - Opened by jimregan over 6 years ago - 3 comments

#31 - basic crawler for Aceh

Pull Request - State: closed - Opened by jimregan over 6 years ago

#30 - Rename crawl_taq to crawl_kab

Issue - State: closed - Opened by brawer over 6 years ago

#29 - [be-tarask] Add corpus for Belarusian (Taraškievica)

Issue - State: closed - Opened by brawer over 6 years ago

#28 - [cy] add basic Welsh crawler

Pull Request - State: closed - Opened by cwd24 over 6 years ago - 1 comment

#27 - [mi] Filter out lines with English “the” from the Maori corpus

Pull Request - State: closed - Opened by brawer over 6 years ago

#26 - [mi] Filter out English text

Issue - State: closed - Opened by brawer over 6 years ago - 1 comment

#25 - Allow Zawgyi crawling separate from my

Issue - State: closed - Opened by sffc over 6 years ago

#24 - Thanlwintimes.com No Longer Available

Issue - State: closed - Opened by sffc over 6 years ago

#23 - [mi] (public domain) Bible scraper

Pull Request - State: closed - Opened by jimregan over 6 years ago

#22 - [ga] another sentence start to omit

Pull Request - State: closed - Opened by jimregan over 6 years ago

#21 - [ga] conditions were right, needed to cast to int

Pull Request - State: closed - Opened by jimregan over 6 years ago

#20 - need more ns/no ns handling here

Pull Request - State: closed - Opened by jimregan over 6 years ago

#19 - Python 3 compatibility

Issue - State: open - Opened by sffc over 6 years ago - 1 comment

#18 - [ga] url conditions were backwards

Pull Request - State: closed - Opened by jimregan over 6 years ago

#17 - handle mixed broken/unbroken namespaces

Pull Request - State: closed - Opened by jimregan over 6 years ago

#16 - [gd] scraper for dasg corpus (#12)

Pull Request - State: closed - Opened by jimregan over 6 years ago - 1 comment

#15 - [mi] Maori scraper

Pull Request - State: closed - Opened by jimregan over 6 years ago - 1 comment

#14 - [util] Add filepath to FetchResult

Pull Request - State: closed - Opened by behnam over 6 years ago

#13 - [ga] Irish: fixed RTE news scraper

Pull Request - State: closed - Opened by jimregan over 6 years ago

#12 - [gd] Extend Scottish Gaelic corpus

Issue - State: closed - Opened by brawer over 6 years ago - 3 comments

#11 - [WIP] [ga] basic crawler for Irish

Pull Request - State: closed - Opened by jimregan over 6 years ago

#10 - basic crawler for Scots Gaelic (gd)

Pull Request - State: closed - Opened by jimregan over 6 years ago

#9 - [si] Add crawler for Sinhala

Pull Request - State: closed - Opened by keshan over 6 years ago

#8 - harfbuzz-testing-wikipedia

Issue - State: open - Opened by behdad over 6 years ago - 1 comment

#7 - [util] Replace unichr() for narrow Python builds

Pull Request - State: closed - Opened by behnam over 6 years ago

#6 - [ar] Add bbc_news and sputnik_news

Pull Request - State: closed - Opened by behnam over 6 years ago

#5 - [ar] Add Modern Standard Arabic: UDHR and DW

Pull Request - State: closed - Opened by behnam over 6 years ago

#4 - [util/fetch] Add more prints for showing progress

Pull Request - State: closed - Opened by behnam over 6 years ago

#3 - Add (Modern Standard) Arabic language

Issue - State: open - Opened by behnam over 6 years ago - 9 comments

#2 - [util/fetch_sitemap] Add subsitemap_filter option

Pull Request - State: closed - Opened by behnam over 6 years ago - 3 comments
Labels: enhancement

#1 - [shn] Add crawler for the Shan language

Pull Request - State: closed - Opened by brawer almost 7 years ago