Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / teamhg-memex/undercrawler issues and pull requests

#83 - How to store urls and html content to json format?

Issue - State: open - Opened by AlexPapas over 3 years ago - 1 comment

#82 - How to set splash.plugins_enabled for Undercrawler.

Issue - State: closed - Opened by nehakansal almost 6 years ago - 3 comments

#81 - Blank pages extracted in a crawl.

Issue - State: open - Opened by nehakansal about 6 years ago - 5 comments

#80 - Where are debugLogs logged when splash.args debug is true?

Issue - State: closed - Opened by nehakansal about 6 years ago - 2 comments

#79 - Lua error.

Issue - State: closed - Opened by nehakansal about 6 years ago - 1 comment

#78 - How can I get both cookie and html through def parse(self,response)

Issue - State: closed - Opened by bswbatman over 6 years ago - 2 comments

#77 - Question about Downloader Middlewares

Issue - State: open - Opened by nehakansal over 6 years ago - 4 comments

#76 - Memory problems: SplashRequest references keep going up

Issue - State: open - Opened by nehakansal over 6 years ago - 9 comments

#75 - Undercrawler concurrency and Splash slots

Issue - State: open - Opened by nehakansal almost 7 years ago - 2 comments

#74 - Config/issues with running multiple crawls?

Issue - State: closed - Opened by nehakansal almost 7 years ago - 2 comments

#73 - What is the location of CDRv2 exports?

Issue - State: open - Opened by arjunv over 7 years ago - 1 comment

#72 - crazy form submitter is not using form url

Issue - State: open - Opened by kmike over 7 years ago

#71 - updated from orig

Pull Request - State: closed - Opened by thebennos almost 8 years ago - 1 comment

#70 - Accept multiple URLs from the command line

Pull Request - State: closed - Opened by lopuhin almost 8 years ago - 1 comment

#69 - Use pathlib instead of codecs

Pull Request - State: closed - Opened by lopuhin almost 8 years ago - 1 comment

#68 - Add FOLLOW_LINKS option

Pull Request - State: closed - Opened by lopuhin almost 8 years ago - 1 comment

#67 - CDR v3

Pull Request - State: closed - Opened by lopuhin almost 8 years ago - 1 comment

#66 - Don't canonicalize file URLs: scrapy 1.4 compatibility

Pull Request - State: closed - Opened by lopuhin almost 8 years ago - 1 comment

#65 - test_documents fails on scrapy master

Issue - State: closed - Opened by lopuhin almost 8 years ago - 1 comment

#64 - Update scrapy

Pull Request - State: closed - Opened by lopuhin almost 8 years ago - 2 comments

#63 - More screenshot options, save screenshot path to item metadata

Pull Request - State: closed - Opened by lopuhin about 8 years ago - 1 comment

#62 - EvalError: Refused to evaluate a string as JavaScript

Issue - State: open - Opened by lopuhin about 8 years ago - 5 comments

#61 - feature request Soft404

Issue - State: open - Opened by thebennos about 8 years ago

#60 - Creating a working docker image

Issue - State: closed - Opened by thebennos about 8 years ago - 5 comments

#59 - S3 Filestorage

Issue - State: closed - Opened by thebennos over 8 years ago - 4 comments

#58 - Bad interaction of subdomains and autologin keychain

Issue - State: open - Opened by lopuhin over 8 years ago

#57 - Redirect from domain to www.domain is not handled correctly without splash

Issue - State: closed - Opened by lopuhin over 8 years ago - 1 comment

#56 - Dockerfile for running undercrawler with arachnado

Pull Request - State: closed - Opened by lopuhin over 8 years ago - 1 comment

#55 - Simplify autologin installation on travis

Pull Request - State: closed - Opened by lopuhin over 8 years ago

#54 - Do not fail too early - return error pages as well

Pull Request - State: closed - Opened by lopuhin over 8 years ago

#53 - Use new settings variable names from autologin-middleware

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#52 - Optional splash support

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#51 - Lua page script timeouts when trying to render binary pages

Issue - State: open - Opened by lopuhin almost 9 years ago - 5 comments

#50 - An option to run without splash

Issue - State: closed - Opened by lopuhin almost 9 years ago - 2 comments

#49 - Use dupe predictor and utils from MaybeDont

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#48 - External autologin middleware

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#47 - Continue exploring possible duplicates

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#46 - Non-blocking autologin

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 3 comments

#45 - Long delay for the first request

Issue - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#44 - Do not create items for document urls we already fetched

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#43 - Fix domain regexp

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 2 comments

#42 - DupePredictor should assign more weight for recent samples

Issue - State: closed - Opened by kmike almost 9 years ago - 1 comment

#41 - don't always ignore duplicate pages

Issue - State: closed - Opened by kmike almost 9 years ago - 1 comment

#40 - download out-of-domain iframes

Issue - State: open - Opened by kmike almost 9 years ago - 3 comments

#39 - increase aggressiveness for file downloads

Issue - State: open - Opened by kmike almost 9 years ago - 1 comment

#38 - fragment is removed from pagination links

Issue - State: open - Opened by kmike almost 9 years ago

#37 - issues with allowed domain regexp

Issue - State: closed - Opened by kmike almost 9 years ago

#36 - confusing `WARNING: Dropped` lines in log

Issue - State: closed - Opened by kmike almost 9 years ago - 1 comment

#35 - spider can't be stopped with Ctrl-C when autologin is pending

Issue - State: closed - Opened by kmike almost 9 years ago - 3 comments

#34 - Cache lua_source and js_source

Pull Request - State: closed - Opened by kmike almost 9 years ago - 1 comment

#33 - Do not save duplicate documents

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#32 - log request priorities

Pull Request - State: closed - Opened by kmike almost 9 years ago

#31 - use % formatter

Pull Request - State: closed - Opened by EdwardBetts almost 9 years ago - 2 comments

#30 - High memory and disk usage of splash requests

Issue - State: closed - Opened by lopuhin almost 9 years ago - 6 comments

#29 - Download files via splash

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#28 - Autologin should pass cookies for file download requests

Issue - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#27 - Do not apply AvoidDupContentMiddleware to downloaded documents

Issue - State: closed - Opened by lopuhin almost 9 years ago

#26 - Support downloading files from Tor sites

Issue - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#25 - AutoThrottle is applied to downloaded files as well

Issue - State: closed - Opened by lopuhin almost 9 years ago

#24 - NetworkX errors not retried

Issue - State: open - Opened by lopuhin almost 9 years ago - 5 comments
Labels: bug

#23 - SplashFormRequest

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#22 - Exception when autologin is not available

Issue - State: closed - Opened by kmike almost 9 years ago - 3 comments

#21 - Adjust crawling speed with AutoThrottle middleware

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#20 - Scrapy-splash cookies, autologin middleware tests

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#19 - Bug with cookies & redirect

Issue - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#18 - Add basic spider tests

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 2 comments

#17 - Remove hh_splash middleware, use scrapy-splash

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 1 comment

#16 - Pass splash_url to autologin http API

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#15 - Saving documents in CDR format

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#14 - Learn to predict duplicates by URLs during crawling

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 5 comments

#13 - detect logout URLs automatically

Issue - State: open - Opened by kmike almost 9 years ago - 4 comments

#12 - Allow passing auth cookies from settings

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#11 - Domain constraint by default, follow first redirect

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#10 - increase depth if a different paginator is followed

Issue - State: open - Opened by kmike almost 9 years ago

#9 - Crazy search query submitter

Pull Request - State: closed - Opened by lopuhin almost 9 years ago

#8 - follow more URLs: onclick, iframes

Pull Request - State: closed - Opened by kmike almost 9 years ago - 1 comment

#7 - an option to enable AdBlock filters

Pull Request - State: closed - Opened by kmike almost 9 years ago - 1 comment

#6 - A script to check for duplicate content

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 2 comments

#5 - splash render script: disable images, enable resource timeout

Pull Request - State: closed - Opened by kmike almost 9 years ago

#4 - don't go out of domain for pagination URLs

Pull Request - State: closed - Opened by kmike almost 9 years ago - 1 comment

#3 - Export in CDRv2 format

Pull Request - State: closed - Opened by lopuhin almost 9 years ago - 4 comments

#2 - prefer pagination links

Pull Request - State: closed - Opened by kmike almost 9 years ago - 1 comment

#1 - Autologin middleware

Pull Request - State: closed - Opened by lopuhin almost 9 years ago