Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / wellcometrust/wsf-web-scraper issues and pull requests

#153 - ⬆️ Bump certifi from 2018.11.29 to 2023.7.22

Pull Request - State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies

#152 - ⬆️ Bump scrapy from 1.5.1 to 2.6.3

Pull Request - State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies

#151 - ⬆️ Bump cryptography from 2.4.2 to 39.0.1

Pull Request - State: open - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#150 - ⬆️ Bump certifi from 2018.11.29 to 2022.12.7

Pull Request - State: closed - Opened by dependabot[bot] almost 2 years ago - 1 comment
Labels: dependencies

#149 - ⬆️ Bump scrapy from 1.5.1 to 2.6.2

Pull Request - State: closed - Opened by dependabot[bot] about 2 years ago - 1 comment
Labels: dependencies

#148 - ⬆️ Bump lxml from 4.2.5 to 4.9.1

Pull Request - State: open - Opened by dependabot[bot] about 2 years ago
Labels: dependencies

#147 - ⬆️ Bump scrapy from 1.5.1 to 2.6.1

Pull Request - State: closed - Opened by dependabot[bot] over 2 years ago - 1 comment
Labels: dependencies

#146 - ⬆️ Bump twisted from 18.9.0 to 22.4.0

Pull Request - State: open - Opened by dependabot[bot] over 2 years ago
Labels: dependencies

#145 - ⬆️ Bump lxml from 4.2.5 to 4.6.5

Pull Request - State: closed - Opened by dependabot[bot] almost 3 years ago - 1 comment
Labels: dependencies

#144 - ⬆️ Bump scrapy from 1.5.1 to 1.8.1

Pull Request - State: closed - Opened by dependabot[bot] almost 3 years ago - 1 comment
Labels: dependencies

#143 - ⬆️ Bump py from 1.7.0 to 1.10.0

Pull Request - State: open - Opened by dependabot[bot] over 3 years ago
Labels: dependencies

#142 - ⬆️ Bump lxml from 4.2.5 to 4.6.3

Pull Request - State: closed - Opened by dependabot[bot] over 3 years ago - 1 comment
Labels: dependencies

#141 - ⬆️ Bump lxml from 4.2.5 to 4.6.2

Pull Request - State: closed - Opened by dependabot[bot] over 3 years ago - 1 comment
Labels: dependencies

#140 - ⬆️ Bump cryptography from 2.4.2 to 3.2

Pull Request - State: closed - Opened by dependabot[bot] almost 4 years ago - 1 comment
Labels: dependencies

#139 - ⬆️ Bump twisted from 18.9.0 to 19.7.0

Pull Request - State: closed - Opened by dependabot[bot] almost 5 years ago - 1 comment
Labels: dependencies

#138 - Add other reference section names

Issue - State: open - Opened by lizgzil over 5 years ago - 1 comment
Labels: enhancement

#137 - Fix Sentry spamming us with Http Errors

Pull Request - State: closed - Opened by SamDepardieu over 5 years ago

#136 - Remove the pdf_text column from the scraper database

Issue - State: closed - Opened by SamDepardieu over 5 years ago - 3 comments
Labels: enhancement

#134 - Update the Makefile version

Pull Request - State: closed - Opened by SamDepardieu over 5 years ago - 1 comment

#133 - Remove the pdf_text field from the scraper

Pull Request - State: closed - Opened by SamDepardieu over 5 years ago

#132 - Catch the TypeErrors encountered by the pdf_parser module

Pull Request - State: closed - Opened by SamDepardieu over 5 years ago

#130 - Add scrape again to the updated columns

Pull Request - State: closed - Opened by SamDepardieu over 5 years ago - 3 comments

#129 - Pulling publications from the database into the warehouse leads to an oom

Issue - State: closed - Opened by SamDepardieu over 5 years ago - 7 comments
Labels: bug

#128 - OSError 36 fix

Pull Request - State: closed - Opened by SamDepardieu over 5 years ago

#127 - Twisted errors investigation

Issue - State: closed - Opened by SamDepardieu over 5 years ago - 1 comment
Labels: bug

#126 - UnboundLocal error

Issue - State: closed - Opened by SamDepardieu over 5 years ago
Labels: bug

#125 - NoneType on some PDF

Issue - State: closed - Opened by SamDepardieu over 5 years ago
Labels: bug

#124 - OSError 36 on some gov uk files

Issue - State: closed - Opened by SamDepardieu over 5 years ago
Labels: bug

#123 - Fix sentry

Pull Request - State: closed - Opened by SamDepardieu over 5 years ago

#122 - Keyword sentence content in varying formats

Issue - State: closed - Opened by lizgzil over 5 years ago - 8 comments
Labels: bug

#121 - Sentry

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago

#120 - Fix the scraper's JSON results file

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago - 1 comment

#119 - Scraper doesn't set the scrape again flag to false

Issue - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug

#118 - ⬆️ Upgrade requests library

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago

#117 - Scraper got killed when scraping WHO

Issue - State: closed - Opened by SamDepardieu almost 6 years ago - 1 comment
Labels: bug

#116 - The scraper has some issues with the json files

Issue - State: closed - Opened by SamDepardieu almost 6 years ago - 1 comment
Labels: bug

#115 - Accessing scraped data for gov_uk takes more than 900 seconds.

Issue - State: closed - Opened by nsorros almost 6 years ago - 2 comments
Labels: question

#114 - Convert the item to a dict is we want to scrape again

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago

#112 - Fix some pdf related issues

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago - 1 comment

#111 - Use tempfiles instead of files in a tempdir

Issue - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: enhancement

#110 - Remove files on error and already scraped items

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago - 6 comments

#109 - Parliament spider sometimes fail to recognize PDFs

Issue - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug

#108 - Some documents leads to an issue

Issue - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug

#107 - Some documents leads to an Item error

Issue - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug

#106 - Parliament scraping taking too long/not deleting files

Issue - State: closed - Opened by SamDepardieu almost 6 years ago - 1 comment
Labels: bug

#105 - Add a push option to the makefile

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: enhancement

#104 - Add parliament scraping to the scraper

Pull Request - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: enhancement

#103 - WHO scraping sometimes fails

Issue - State: closed - Opened by SamDepardieu almost 6 years ago - 1 comment
Labels: bug

#102 - Long scrapings are failing in AWS

Issue - State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug

#101 - Facilitate development from within docker

Pull Request - State: closed - Opened by hblanks almost 6 years ago - 4 comments

#100 - tools/AWSFeedStorage.py: remove references to DynamoDB

Pull Request - State: closed - Opened by hblanks almost 6 years ago

#99 - [WiP] Fix the WHO scraping task

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago - 2 comments
Labels: bug

#98 - Scrape the Parliament website

Issue - State: closed - Opened by SamDepardieu about 6 years ago
Labels: enhancement

#97 - Update readme and installation worklow

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago

#96 - Add Contributing guidelines and PR template to the repo

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago - 3 comments

#95 - Change the UNICEF scraping attributes

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago

#94 - Remove DynamoDB testing as we're not using dynamo anymore

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago

#93 - Remove moto from the pipenv dependencies

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago

#92 - Keyword match returns no associated text

Issue - State: closed - Opened by nsorros about 6 years ago - 2 comments
Labels: bug

#91 - Fixing a few issues concerning the scraper and RDS

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago - 2 comments

#90 - Dependencies compatibility

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago - 2 comments

#89 - Allow spliting per year

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago - 2 comments

#87 - Rds compatibility

Pull Request - State: closed - Opened by SamDepardieu about 6 years ago - 8 comments

#86 - Add timestamp to output

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago

#85 - Sorted in s3 sorts in the opposite than expected order potentially

Issue - State: closed - Opened by nsorros over 6 years ago
Labels: bug

#83 - Provider shouldn't be spider name for parallelisation pruposes

Issue - State: closed - Opened by SamDepardieu over 6 years ago
Labels: enhancement

#82 - Item['pdf'] is not obvious that contains the filename of the pdf

Issue - State: closed - Opened by nsorros over 6 years ago - 3 comments

#81 - Refactor the file management part of the spiders

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago

#80 - Add a MSF spider to scrape their website

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago - 3 comments

#79 - Add the output format in the readme.md

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago - 3 comments

#78 - Fix error message due to who item lacking an attribute

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago

#77 - Add gov.uk to the scraping list

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago - 1 comment

#76 - Add new fields to the result file

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago - 1 comment

#75 - Parametered launch of scrapy

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago

#74 - Create a spider to scrape the unicef website

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago - 1 comment

#73 - Modify the web scrapper in order to make it works with the new AWS architecture

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago - 2 comments

#72 - Fix OSError [2] on file deletion

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#71 - Fix "Error article table doesn't exist"

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#70 - Update README.md

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago

#69 - Investigate scraping issues on AWS

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#68 - Dynamodb compatibility

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago

#67 - Make AWS credentials an environment variable

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#66 - Write a better readme

Issue - State: closed - Opened by SamDepardieu over 6 years ago - 6 comments

#65 - Remove DSX support

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#62 - Implement (optionnal) file type checks for generic crawls

Issue - State: closed - Opened by SamDepardieu over 6 years ago - 1 comment
Labels: enhancement

#61 - Switch keyword analysis to a different project ?

Issue - State: closed - Opened by SamDepardieu over 6 years ago
Labels: help wanted, question

#60 - Create specific url filter for generic crawls

Issue - State: closed - Opened by SamDepardieu over 6 years ago
Labels: enhancement

#59 - Create a generic crawl pipeline

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#58 - Change the scraper to use an ORM

Issue - State: closed - Opened by SamDepardieu over 6 years ago
Labels: enhancement

#57 - Create a generic crawling spider

Issue - State: closed - Opened by SamDepardieu over 6 years ago - 3 comments

#56 - Who Iris design change

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#55 - Make database connector abstract

Issue - State: closed - Opened by SamDepardieu over 6 years ago - 1 comment

#54 - Make Scrapy compatible with DynamoDB

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#53 - Create an entrypoint script

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#51 - Add authentication to webservices

Issue - State: closed - Opened by SamDepardieu over 6 years ago - 1 comment

#45 - Small changes to PDF processing and code cleaning

Pull Request - State: closed - Opened by SamDepardieu over 6 years ago

#40 - Add new webservices to scrapyd

Issue - State: closed - Opened by SamDepardieu over 6 years ago

#27 - Incremental scraping 🎊

Issue - State: closed - Opened by SamDepardieu over 6 years ago
Labels: Epic