Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / wellcometrust/wsf-web-scraper issues and pull requests
#153 - ⬆️ Bump certifi from 2018.11.29 to 2023.7.22
Pull Request -
State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies
#152 - ⬆️ Bump scrapy from 1.5.1 to 2.6.3
Pull Request -
State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies
#151 - ⬆️ Bump cryptography from 2.4.2 to 39.0.1
Pull Request -
State: open - Opened by dependabot[bot] over 1 year ago
Labels: dependencies
#150 - ⬆️ Bump certifi from 2018.11.29 to 2022.12.7
Pull Request -
State: closed - Opened by dependabot[bot] almost 2 years ago
- 1 comment
Labels: dependencies
#149 - ⬆️ Bump scrapy from 1.5.1 to 2.6.2
Pull Request -
State: closed - Opened by dependabot[bot] about 2 years ago
- 1 comment
Labels: dependencies
#148 - ⬆️ Bump lxml from 4.2.5 to 4.9.1
Pull Request -
State: open - Opened by dependabot[bot] about 2 years ago
Labels: dependencies
#147 - ⬆️ Bump scrapy from 1.5.1 to 2.6.1
Pull Request -
State: closed - Opened by dependabot[bot] over 2 years ago
- 1 comment
Labels: dependencies
#146 - ⬆️ Bump twisted from 18.9.0 to 22.4.0
Pull Request -
State: open - Opened by dependabot[bot] over 2 years ago
Labels: dependencies
#145 - ⬆️ Bump lxml from 4.2.5 to 4.6.5
Pull Request -
State: closed - Opened by dependabot[bot] almost 3 years ago
- 1 comment
Labels: dependencies
#144 - ⬆️ Bump scrapy from 1.5.1 to 1.8.1
Pull Request -
State: closed - Opened by dependabot[bot] almost 3 years ago
- 1 comment
Labels: dependencies
#143 - ⬆️ Bump py from 1.7.0 to 1.10.0
Pull Request -
State: open - Opened by dependabot[bot] over 3 years ago
Labels: dependencies
#142 - ⬆️ Bump lxml from 4.2.5 to 4.6.3
Pull Request -
State: closed - Opened by dependabot[bot] over 3 years ago
- 1 comment
Labels: dependencies
#141 - ⬆️ Bump lxml from 4.2.5 to 4.6.2
Pull Request -
State: closed - Opened by dependabot[bot] over 3 years ago
- 1 comment
Labels: dependencies
#140 - ⬆️ Bump cryptography from 2.4.2 to 3.2
Pull Request -
State: closed - Opened by dependabot[bot] almost 4 years ago
- 1 comment
Labels: dependencies
#139 - ⬆️ Bump twisted from 18.9.0 to 19.7.0
Pull Request -
State: closed - Opened by dependabot[bot] almost 5 years ago
- 1 comment
Labels: dependencies
#138 - Add other reference section names
Issue -
State: open - Opened by lizgzil over 5 years ago
- 1 comment
Labels: enhancement
#137 - Fix Sentry spamming us with Http Errors
Pull Request -
State: closed - Opened by SamDepardieu over 5 years ago
#136 - Remove the pdf_text column from the scraper database
Issue -
State: closed - Opened by SamDepardieu over 5 years ago
- 3 comments
Labels: enhancement
#134 - Update the Makefile version
Pull Request -
State: closed - Opened by SamDepardieu over 5 years ago
- 1 comment
#133 - Remove the pdf_text field from the scraper
Pull Request -
State: closed - Opened by SamDepardieu over 5 years ago
#132 - Catch the TypeErrors encountered by the pdf_parser module
Pull Request -
State: closed - Opened by SamDepardieu over 5 years ago
#130 - Add scrape again to the updated columns
Pull Request -
State: closed - Opened by SamDepardieu over 5 years ago
- 3 comments
#129 - Pulling publications from the database into the warehouse leads to an oom
Issue -
State: closed - Opened by SamDepardieu over 5 years ago
- 7 comments
Labels: bug
#128 - OSError 36 fix
Pull Request -
State: closed - Opened by SamDepardieu over 5 years ago
#127 - Twisted errors investigation
Issue -
State: closed - Opened by SamDepardieu over 5 years ago
- 1 comment
Labels: bug
#126 - UnboundLocal error
Issue -
State: closed - Opened by SamDepardieu over 5 years ago
Labels: bug
#125 - NoneType on some PDF
Issue -
State: closed - Opened by SamDepardieu over 5 years ago
Labels: bug
#124 - OSError 36 on some gov uk files
Issue -
State: closed - Opened by SamDepardieu over 5 years ago
Labels: bug
#123 - Fix sentry
Pull Request -
State: closed - Opened by SamDepardieu over 5 years ago
#122 - Keyword sentence content in varying formats
Issue -
State: closed - Opened by lizgzil over 5 years ago
- 8 comments
Labels: bug
#121 - Sentry
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
#120 - Fix the scraper's JSON results file
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
- 1 comment
#119 - Scraper doesn't set the scrape again flag to false
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug
#118 - :arrow_up: Upgrade requests library
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
#117 - Scraper got killed when scraping WHO
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
- 1 comment
Labels: bug
#116 - The scraper has some issues with the json files
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
- 1 comment
Labels: bug
#115 - Accessing scraped data for gov_uk takes more than 900 seconds.
Issue -
State: closed - Opened by nsorros almost 6 years ago
- 2 comments
Labels: question
#114 - Convert the item to a dict if we want to scrape again
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
#112 - Fix some pdf related issues
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
- 1 comment
#111 - Use tempfiles instead of files in a tempdir
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: enhancement
#110 - Remove files on error and already scraped items
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
- 6 comments
#109 - Parliament spider sometimes fail to recognize PDFs
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug
#108 - Some documents leads to an issue
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug
#107 - Some documents leads to an Item error
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug
#106 - Parliament scraping taking too long/not deleting files
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
- 1 comment
Labels: bug
#105 - Add a push option to the makefile
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: enhancement
#104 - Add parliament scraping to the scraper
Pull Request -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: enhancement
#103 - WHO scraping sometimes fails
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
- 1 comment
Labels: bug
#102 - Long scrapings are failing in AWS
Issue -
State: closed - Opened by SamDepardieu almost 6 years ago
Labels: bug
#101 - Facilitate development from within docker
Pull Request -
State: closed - Opened by hblanks almost 6 years ago
- 4 comments
#100 - tools/AWSFeedStorage.py: remove references to DynamoDB
Pull Request -
State: closed - Opened by hblanks almost 6 years ago
#99 - [WiP] Fix the WHO scraping task
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
- 2 comments
Labels: bug
#98 - Scrape the Parliament website
Issue -
State: closed - Opened by SamDepardieu about 6 years ago
Labels: enhancement
#97 - Update readme and installation workflow
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
#96 - Add Contributing guidelines and PR template to the repo
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
- 3 comments
#95 - Change the UNICEF scraping attributes
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
#94 - Remove DynamoDB testing as we're not using dynamo anymore
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
#93 - Remove moto from the pipenv dependencies
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
#92 - Keyword match returns no associated text
Issue -
State: closed - Opened by nsorros about 6 years ago
- 2 comments
Labels: bug
#91 - Fixing a few issues concerning the scraper and RDS
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
- 2 comments
#90 - Dependencies compatibility
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
- 2 comments
#89 - Allow spliting per year
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
- 2 comments
#88 - Allow gov uk scraping to be splitted to let it finish on AWS
Issue -
State: closed - Opened by SamDepardieu about 6 years ago
#87 - Rds compatibility
Pull Request -
State: closed - Opened by SamDepardieu about 6 years ago
- 8 comments
#86 - Add timestamp to output
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
#85 - Sorted in s3 sorts in the opposite than expected order potentially
Issue -
State: closed - Opened by nsorros over 6 years ago
Labels: bug
#83 - Provider shouldn't be spider name for parallelisation purposes
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
Labels: enhancement
#82 - Item['pdf'] is not obvious that contains the filename of the pdf
Issue -
State: closed - Opened by nsorros over 6 years ago
- 3 comments
#81 - Refactor the file management part of the spiders
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
#80 - Add a MSF spider to scrape their website
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
- 3 comments
#79 - Add the output format in the readme.md
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
- 3 comments
#78 - Fix error message due to who item lacking an attribute
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
#77 - Add gov.uk to the scraping list
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
- 1 comment
#76 - Add new fields to the result file
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
- 1 comment
#75 - Parametered launch of scrapy
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
#74 - Create a spider to scrape the unicef website
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
- 1 comment
#73 - Modify the web scraper in order to make it work with the new AWS architecture
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
- 2 comments
#72 - Fix OSError [2] on file deletion
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#71 - Fix "Error article table doesn't exist"
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#70 - Update README.md
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
#69 - Investigate scraping issues on AWS
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#68 - Dynamodb compatibility
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
#67 - Make AWS credentials an environment variable
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#66 - Write a better readme
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
- 6 comments
#65 - Remove DSX support
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#62 - Implement (optional) file type checks for generic crawls
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
- 1 comment
Labels: enhancement
#61 - Switch keyword analysis to a different project ?
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
Labels: help wanted, question
#60 - Create specific url filter for generic crawls
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
Labels: enhancement
#59 - Create a generic crawl pipeline
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#58 - Change the scraper to use an ORM
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
Labels: enhancement
#57 - Create a generic crawling spider
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
- 3 comments
#56 - Who Iris design change
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#55 - Make database connector abstract
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
- 1 comment
#54 - Make Scrapy compatible with DynamoDB
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#53 - Create an entrypoint script
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#51 - Add authentication to webservices
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
- 1 comment
#45 - Small changes to PDF processing and code cleaning
Pull Request -
State: closed - Opened by SamDepardieu over 6 years ago
#40 - Add new webservices to scrapyd
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#38 - Change DB schema to include sections, keywords and parsed_pdf
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
#27 - Incremental scraping 🎊
Issue -
State: closed - Opened by SamDepardieu over 6 years ago
Labels: Epic