Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / fhamborg/news-please issues and pull requests

#180 - Add language filter for commoncrawler

Pull Request - State: closed - Opened by AlviseSembenico about 4 years ago - 1 comment

#179 - Heuristic url

Pull Request - State: closed - Opened by AlviseSembenico about 4 years ago - 1 comment

#178 - article.date_modify returns 'None' despite the article having a modified date

Issue - State: closed - Opened by Anacoder1 about 4 years ago - 3 comments

#177 - DateFilter not working when using CLI

Issue - State: closed - Opened by benjamin-kraatz about 4 years ago - 3 comments

#176 - DateFilters are not respected from config.cfg file

Issue - State: closed - Opened by basingh over 4 years ago

#175 - Finished crawling with no results

Issue - State: open - Opened by tobiasstrauss over 4 years ago - 13 comments

#174 - RecursiveCrawler : ValueError('Missing scheme in request url: %s' % self._url)

Issue - State: closed - Opened by basingh over 4 years ago - 4 comments

#173 - fixes #172 and #169: NewsPlease.from_urls() - use multiprocessing

Pull Request - State: closed - Opened by arcolife over 4 years ago - 17 comments

#172 - NewsPlease.from_urls() could use multiprocessing

Issue - State: closed - Opened by arcolife over 4 years ago

#171 - fixes #170: custom headers for requests

Pull Request - State: closed - Opened by arcolife over 4 years ago

#170 - customized HEADERS are sometimes problematic

Issue - State: closed - Opened by arcolife over 4 years ago

#169 - handle ArticleExtractor error on empty html returns

Issue - State: closed - Opened by arcolife over 4 years ago - 1 comment

#168 - how to download warc files between specific dates

Issue - State: closed - Opened by Prateek-Tyagi over 4 years ago - 1 comment

#167 - Verbose logging of exceptions if continue_after_error

Pull Request - State: closed - Opened by sebastian-nagel over 4 years ago - 1 comment

#166 - error in executing commoncrawl.py

Issue - State: closed - Opened by Prateek-Tyagi over 4 years ago - 16 comments

#165 - Amjltc295/use str html instead of bytes html to speed up

Pull Request - State: closed - Opened by amjltc295 over 4 years ago - 3 comments

#164 - Using bytes HTML is significantly slower than str HTML for parsing content

Issue - State: closed - Opened by amjltc295 over 4 years ago - 3 comments

#162 - Crawl Specific RSS Feeds on NYTimes

Issue - State: closed - Opened by ericagredo over 4 years ago - 6 comments

#161 - problem with 'mailto' links

Issue - State: closed - Opened by nicolabertoldi over 4 years ago - 10 comments

#160 - #159 Add missing hurry.filesize to requirements.txt

Pull Request - State: closed - Opened by petlack over 4 years ago - 1 comment

#159 - ModuleNotFoundError: No module named 'hurry'

Issue - State: closed - Opened by petlack over 4 years ago

#158 - my_delete_warc_after_extraction

Pull Request - State: closed - Opened by tbrknt over 4 years ago - 1 comment

#157 - Less OS Dependency

Pull Request - State: closed - Opened by tbrknt over 4 years ago

#156 - Add exitcode check for Subprocesses

Pull Request - State: closed - Opened by tbrknt over 4 years ago - 1 comment

#155 - Windows Compatability and Subprocess Check

Pull Request - State: closed - Opened by tbrknt over 4 years ago - 1 comment

#154 - rename text to maintext consistently

Issue - State: open - Opened by fhamborg over 4 years ago
Labels: help wanted

#153 - language filtering

Issue - State: closed - Opened by lalimili6 over 4 years ago - 1 comment

#151 - Javascript is disabled on your browser

Issue - State: closed - Opened by lalimili6 over 4 years ago - 1 comment

#149 - crawl comments

Issue - State: closed - Opened by lalimili6 over 4 years ago - 1 comment

#148 - Fixed broken date extraction due to beautiful soup's tag.text.

Pull Request - State: closed - Opened by thihara over 4 years ago - 3 comments

#147 - Add library interface to scrape multiple articles from domain url

Pull Request - State: closed - Opened by mrknight21 over 4 years ago - 4 comments

#146 - add date filter for commoncrawl warc files

Pull Request - State: closed - Opened by moyid over 4 years ago - 4 comments

#145 - Changes to make Scrapy Item class customizable via configuration

Pull Request - State: closed - Opened by thihara over 4 years ago - 11 comments

#144 - Stopped the LOG_ENABLED variable from being unset

Pull Request - State: closed - Opened by thihara over 4 years ago - 2 comments

#142 - CommonCrawl: Start and End Date Not Working

Issue - State: closed - Opened by ozgurakyazi over 4 years ago - 10 comments

#141 - psycopg2 issue on macOS

Issue - State: closed - Opened by hellc over 4 years ago - 5 comments

#133 - article.text returns None on english article

Issue - State: closed - Opened by ysig about 5 years ago - 8 comments

#130 - ElasticsearchStorage can't save scraped files

Issue - State: closed - Opened by JeromeGill about 5 years ago - 6 comments
Labels: help wanted

#129 - RssCrawler doesn't support valid Rss XML

Issue - State: closed - Opened by JeromeGill about 5 years ago - 2 comments
Labels: help wanted

#101 - filter articles for keywords

Issue - State: open - Opened by fhamborg over 5 years ago - 7 comments
Labels: help wanted

#94 - Can I crawl a root site?

Issue - State: closed - Opened by truenodeverano over 5 years ago - 6 comments

#88 - Issue #54

Issue - State: closed - Opened by aamin3 over 5 years ago - 3 comments

#61 - To str convertion of the datetime fields

Issue - State: closed - Opened by anastasia-zhukova over 6 years ago - 5 comments

#12 - get order on main page

Issue - State: closed - Opened by fhamborg almost 8 years ago
Labels: help wanted

#8 - Merge articles spread on multiple pages

Issue - State: closed - Opened by fhamborg almost 8 years ago - 2 comments
Labels: help wanted