Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / teamhg-memex/undercrawler issues and pull requests
#83 - How to store urls and html content to json format?
Issue -
State: open - Opened by AlexPapas over 3 years ago
- 1 comment
#82 - How to set splash.plugins_enabled for Undercrawler.
Issue -
State: closed - Opened by nehakansal almost 6 years ago
- 3 comments
#81 - Blank pages extracted in a crawl.
Issue -
State: open - Opened by nehakansal about 6 years ago
- 5 comments
#80 - Where are debugLogs logged when splash.args debug is true?
Issue -
State: closed - Opened by nehakansal about 6 years ago
- 2 comments
#79 - Lua error.
Issue -
State: closed - Opened by nehakansal about 6 years ago
- 1 comment
#78 - How can i get both cookie and html through def parse(self,response)
Issue -
State: closed - Opened by bswbatman over 6 years ago
- 2 comments
#77 - Question about Downloader Middlewares
Issue -
State: open - Opened by nehakansal over 6 years ago
- 4 comments
#76 - Memory problems: SplashRequest references keep going up
Issue -
State: open - Opened by nehakansal over 6 years ago
- 9 comments
#75 - Undercrawler concurrency and Splash slots
Issue -
State: open - Opened by nehakansal almost 7 years ago
- 2 comments
#74 - Config/issues with running multiple crawls?
Issue -
State: closed - Opened by nehakansal almost 7 years ago
- 2 comments
#73 - What is the location of CDRv2 exports?
Issue -
State: open - Opened by arjunv over 7 years ago
- 1 comment
#72 - crazy form submitter is not using form url
Issue -
State: open - Opened by kmike over 7 years ago
#71 - updated from orig
Pull Request -
State: closed - Opened by thebennos almost 8 years ago
- 1 comment
#70 - Accept multiple URLs from the command line
Pull Request -
State: closed - Opened by lopuhin almost 8 years ago
- 1 comment
#69 - Use pathlib instead of codecs
Pull Request -
State: closed - Opened by lopuhin almost 8 years ago
- 1 comment
#68 - Add FOLLOW_LINKS option
Pull Request -
State: closed - Opened by lopuhin almost 8 years ago
- 1 comment
#67 - CDR v3
Pull Request -
State: closed - Opened by lopuhin almost 8 years ago
- 1 comment
#66 - Don't canonicalize file URLs: scrapy 1.4 compatibility
Pull Request -
State: closed - Opened by lopuhin almost 8 years ago
- 1 comment
#65 - test_documents fails on scrapy master
Issue -
State: closed - Opened by lopuhin almost 8 years ago
- 1 comment
#64 - Update scrapy
Pull Request -
State: closed - Opened by lopuhin almost 8 years ago
- 2 comments
#63 - More screenshot options, save screenshot path to item metadata
Pull Request -
State: closed - Opened by lopuhin about 8 years ago
- 1 comment
#62 - EvalError: Refused to evaluate a string as JavaScript
Issue -
State: open - Opened by lopuhin about 8 years ago
- 5 comments
#61 - feature request Soft404
Issue -
State: open - Opened by thebennos about 8 years ago
#60 - Creating a working docker image
Issue -
State: closed - Opened by thebennos about 8 years ago
- 5 comments
#59 - S3 Filestorage
Issue -
State: closed - Opened by thebennos over 8 years ago
- 4 comments
#58 - Bad interaction of subdomains and autologin keychain
Issue -
State: open - Opened by lopuhin over 8 years ago
#57 - Redirect from domain to www.domain is not handled correctly without splash
Issue -
State: closed - Opened by lopuhin over 8 years ago
- 1 comment
#56 - Dockerfile for running undercrawler with arachnado
Pull Request -
State: closed - Opened by lopuhin over 8 years ago
- 1 comment
#55 - Simplify autologin installation on travis
Pull Request -
State: closed - Opened by lopuhin over 8 years ago
#54 - Do not fail too early - return error pages as well
Pull Request -
State: closed - Opened by lopuhin over 8 years ago
#53 - Use new settings variable names from autologin-middleware
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#52 - Optional splash support
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#51 - Lua page script timeouts when trying to render binary pages
Issue -
State: open - Opened by lopuhin almost 9 years ago
- 5 comments
#50 - An option to run without splash
Issue -
State: closed - Opened by lopuhin almost 9 years ago
- 2 comments
#49 - Use dupe predictor and utils from MaybeDont
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#48 - External autologin middleware
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#47 - Continue exploring possible duplicates
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#46 - Non-blocking autologin
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 3 comments
#45 - Long delay for the first request
Issue -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#44 - Do not create items for document urls we already fetched
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#43 - Fix domain regexp
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 2 comments
#42 - DupePredictor should assign more weight for recent samples
Issue -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#41 - don't always ignore duplicate pages
Issue -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#40 - download out-of-domain iframes
Issue -
State: open - Opened by kmike almost 9 years ago
- 3 comments
#39 - increase aggressiveness for file downloads
Issue -
State: open - Opened by kmike almost 9 years ago
- 1 comment
#38 - fragment is removed from pagination links
Issue -
State: open - Opened by kmike almost 9 years ago
#37 - issues with allowed domain regexp
Issue -
State: closed - Opened by kmike almost 9 years ago
#36 - confusing `WARNING: Dropped` lines in log
Issue -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#35 - spider can't be stopped with Ctrl-C when autologin is pending
Issue -
State: closed - Opened by kmike almost 9 years ago
- 3 comments
#34 - Cache lua_source and js_source
Pull Request -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#33 - Do not save duplicate documents
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#32 - log request priorities
Pull Request -
State: closed - Opened by kmike almost 9 years ago
#31 - use % formatter
Pull Request -
State: closed - Opened by EdwardBetts almost 9 years ago
- 2 comments
#30 - High memory and disk usage of splash requests
Issue -
State: closed - Opened by lopuhin almost 9 years ago
- 6 comments
#29 - Download files via splash
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#28 - Autologin should pass cookies for file download requests
Issue -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#27 - Do not apply AvoidDupContentMiddleware to downloaded documents
Issue -
State: closed - Opened by lopuhin almost 9 years ago
#26 - Support downloading files from Tor sites
Issue -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#25 - AutoThrottle is applied to downloaded files as well
Issue -
State: closed - Opened by lopuhin almost 9 years ago
#24 - NetworkX errors not retried
Issue -
State: open - Opened by lopuhin almost 9 years ago
- 5 comments
Labels: bug
#23 - SplashFormRequest
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#22 - Exception when autologin is not available
Issue -
State: closed - Opened by kmike almost 9 years ago
- 3 comments
#21 - Adjust crawling speed with AutoThrottle middleware
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#20 - Scrapy-splash cookies, autologin middleware tests
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#19 - Bug with cookies & redirect
Issue -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#18 - Add basic spider tests
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 2 comments
#17 - Remove hh_splash middleware, use scrapy-splash
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 1 comment
#16 - Pass splash_url to autologin http API
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#15 - Saving documents in CDR format
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#14 - Learn to predict duplicates by URLs during crawling
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 5 comments
#13 - detect logout URLs automatically
Issue -
State: open - Opened by kmike almost 9 years ago
- 4 comments
#12 - Allow passing auth cookies from settings
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#11 - Domain constraint by default, follow first redirect
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#10 - increase depth if a different paginator is followed
Issue -
State: open - Opened by kmike almost 9 years ago
#9 - Crazy search query submitter
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
#8 - follow more URLs: onclick, iframes
Pull Request -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#7 - an option to enable AdBlock filters
Pull Request -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#6 - A script to check for duplicate content
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 2 comments
#5 - splash render script: disable images, enable resource timeout
Pull Request -
State: closed - Opened by kmike almost 9 years ago
#4 - don't go out of domain for pagination URLs
Pull Request -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#3 - Export in CDRv2 format
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago
- 4 comments
#2 - prefer pagination links
Pull Request -
State: closed - Opened by kmike almost 9 years ago
- 1 comment
#1 - Autologin middleware
Pull Request -
State: closed - Opened by lopuhin almost 9 years ago