GitHub / scrapinghub/autoextract-spiders issues and pull requests
#29 - Update pyyaml requirement from <=3.13,>=3.10 to >=3.10,<=6.0.1 in the pip group across 1 directory
Pull Request -
State: open - Opened by dependabot[bot] over 1 year ago
Labels: dependencies
#28 - Include -s FRONTERA_DISABLED=True in README examples to ensure local …
Pull Request -
State: closed - Opened by ivanprado over 4 years ago
#27 - Set/reset Frontera for all page types
Pull Request -
State: closed - Opened by vshlapakov about 5 years ago
Labels: bug
#26 - Breadth-first order for crawling
Pull Request -
State: closed - Opened by ivanprado about 5 years ago
#25 - Deduplication to avoid infinite loops. Better priority queue for multiple domains.
Pull Request -
State: closed - Opened by ivanprado about 5 years ago
- 15 comments
#24 - Fix discovery because body was never empty
Pull Request -
State: closed - Opened by ivanprado about 5 years ago
- 1 comment
#23 - Mention seeds-file-url in the docstring
Pull Request -
State: closed - Opened by vshlapakov about 5 years ago
Labels: documentation
#22 - Make the seeds-file-url param optional
Pull Request -
State: closed - Opened by vshlapakov about 5 years ago
Labels: bug
#21 - Provide support for a seeds file url
Pull Request -
State: closed - Opened by vshlapakov about 5 years ago
- 7 comments
Labels: enhancement
#20 - Fixes and improvements
Pull Request -
State: closed - Opened by croqaz over 5 years ago
#19 - Provide Docker image based on SC stack
Pull Request -
State: closed - Opened by vshlapakov over 5 years ago
Labels: enhancement
#18 - Checking for invalid feeds may be too strict
Issue -
State: closed - Opened by kmike over 5 years ago
- 1 comment
#17 - Why is html5 stripping called for response.url?
Issue -
State: closed - Opened by kmike over 5 years ago
#16 - Does rss.xml link discovery work?
Issue -
State: closed - Opened by kmike over 5 years ago
- 1 comment
#15 - Support for job posting
Pull Request -
State: closed - Opened by croqaz over 5 years ago
- 1 comment
#14 - Updated dependencies
Pull Request -
State: closed - Opened by croqaz over 5 years ago
#13 - Added spider User-Agent header
Pull Request -
State: closed - Opened by croqaz over 5 years ago
- 1 comment
#12 - Use newer scrapy:1.8-py3 stack
Pull Request -
State: closed - Opened by vshlapakov over 5 years ago
- 1 comment
Labels: enhancement
#11 - Describe using Crawlera with AE
Pull Request -
State: closed - Opened by vshlapakov over 5 years ago
- 1 comment
#10 - Add optional fake-useragent support
Pull Request -
State: closed - Opened by vshlapakov over 5 years ago
- 1 comment
#9 - Add optional Crawlera support
Pull Request -
State: closed - Opened by vshlapakov over 5 years ago
- 2 comments
#8 - Implemented date filter rules, specified as spider arg
Pull Request -
State: open - Opened by croqaz over 5 years ago
#7 - Better de-duplication of URLs
Issue -
State: open - Opened by croqaz over 5 years ago
#6 - Filter extracted articles by date
Issue -
State: open - Opened by croqaz over 5 years ago
Labels: enhancement
#5 - It adds Fake UserAgent support
Pull Request -
State: open - Opened by rafaelcapucho almost 6 years ago
- 3 comments
#4 - It adds optional Crawlera support
Pull Request -
State: closed - Opened by rafaelcapucho almost 6 years ago
- 1 comment
#3 - Add optional AWS S3 export feature
Pull Request -
State: closed - Opened by vshlapakov almost 6 years ago
- 3 comments
#2 - Update Scrapy stack version
Pull Request -
State: closed - Opened by vshlapakov almost 6 years ago
- 2 comments
#1 - Add Frontera integration via HCF
Pull Request -
State: closed - Opened by vshlapakov almost 6 years ago
- 4 comments