Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / DigitalPebble/storm-crawler issues and pull requests

#1077 - HttpRobotRulesParser.java is not correctly formatted

Issue - State: closed - Opened by michaeldinzinger about 1 year ago - 1 comment

#1076 - Fix flaky test in AdaptiveSchedulerTest.testSchedule

Issue - State: open - Opened by rzo1 about 1 year ago
Labels: enhancement

#1075 - Add test coverage reports with JaCoCo

Issue - State: closed - Opened by jnioche about 1 year ago - 9 comments
Labels: enhancement, core

#1074 - Increase the number of redirects to 5 for Robots.txt fetching

Pull Request - State: closed - Opened by michaeldinzinger about 1 year ago - 1 comment
Labels: core

#1073 - Create DeletionBolt.java for Solr. #1050

Pull Request - State: closed - Opened by syefimov about 1 year ago - 3 comments
Labels: enhancement, SOLR

#1072 - Batch requests in DeleterBolt

Issue - State: closed - Opened by jnioche about 1 year ago
Labels: enhancement, OpenSearch

#1071 - mechanism to retrieve more generic value of configuration

Pull Request - State: closed - Opened by jnioche about 1 year ago - 1 comment
Labels: enhancement, OpenSearch

#1070 - Add mechanism to retrieve more generic value of configuration if a specific one is not found

Issue - State: closed - Opened by jnioche about 1 year ago
Labels: enhancement, core, OpenSearch

#1069 - Automatic creation of index definitions should use the bolt type

Pull Request - State: closed - Opened by jnioche about 1 year ago
Labels: enhancement, OpenSearch

#1068 - Automatic creation of index definitions should use the bolt type

Issue - State: closed - Opened by jnioche about 1 year ago
Labels: enhancement, OpenSearch

#1067 - Dependency upgrades. fixes #1066

Pull Request - State: closed - Opened by jnioche about 1 year ago
Labels: dependency

#1066 - Dependency upgrades

Issue - State: closed - Opened by jnioche about 1 year ago - 1 comment

#1065 - (Re)separate injection from crawl topologies in *Search archetypes

Issue - State: closed - Opened by jnioche about 1 year ago - 1 comment
Labels: elasticsearch, archetype, OpenSearch

#1064 - OpenSearch 2.7.0 + renamed OpenSearchConnection

Pull Request - State: closed - Opened by jnioche about 1 year ago
Labels: dependency

#1063 - Upgrade to OpenSearch 2.7.0

Issue - State: closed - Opened by jnioche about 1 year ago
Labels: dependency, OpenSearch

#1061 - SolrSpout IndexOutOfBoundsException in parsing group query result.

Issue - State: closed - Opened by syefimov about 1 year ago - 1 comment

#1060 - Upgrade version of TestContainers

Issue - State: closed - Opened by jnioche about 1 year ago
Labels: enhancement, external

#1059 - BasicURLNormalizer .unmangleQueryString() returns invalid results if "&" symbol in a parents path

Issue - State: closed - Opened by syefimov about 1 year ago - 3 comments
Labels: bug, core

#1058 - Increasing the number of redirects for Robots.txt fetching

Issue - State: closed - Opened by michaeldinzinger about 1 year ago
Labels: core

#1057 - Cache redirected robots.txt for target host only if path is /robots.txt and query is empty

Pull Request - State: closed - Opened by sebastian-nagel about 1 year ago - 1 comment
Labels: bug, core

#1056 - Change HttpProtocol to defer to configured values for retryOnConnectionFailure and followRedirects

Pull Request - State: closed - Opened by ndtreviv about 1 year ago - 1 comment
Labels: enhancement, core

#1055 - Issue #1042: Adapt parsing of robots.txt files

Pull Request - State: closed - Opened by michaeldinzinger about 1 year ago - 7 comments
Labels: enhancement, core

#1054 - Issue #1043: Fixing problems after restart of Frontier service

Pull Request - State: closed - Opened by michaeldinzinger about 1 year ago - 5 comments
Labels: bug, urlfrontier

#1053 - #1049 Replace "Collapse and Expand Results" Solr query with "Result Grouping" query.

Pull Request - State: closed - Opened by syefimov over 1 year ago - 8 comments
Labels: bug, SOLR

#1052 - Refactoring

Pull Request - State: closed - Opened by gabbar23 over 1 year ago - 2 comments

#1051 - nextFetchDate field in SOLR schema should be optional

Issue - State: closed - Opened by syefimov over 1 year ago - 1 comment
Labels: bug, SOLR

#1050 - storm-crawler-solr bug. Missing DeletionBolt bolt code.

Issue - State: closed - Opened by syefimov over 1 year ago - 3 comments
Labels: enhancement, SOLR

#1049 - Solr cloud results "Collapse/Expand" bug

Issue - State: closed - Opened by syefimov over 1 year ago - 1 comment
Labels: bug, SOLR

#1048 - Bump snakeyaml from 1.33 to 2.0 in /core

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 3 comments
Labels: dependency

#1047 - Fix #1032: Catch the exception inside the loop to avoid breaking if one remote instance is misbehaving

Pull Request - State: closed - Opened by rzo1 over 1 year ago
Labels: enhancement

#1046 - Fixes #1045. Remove range syntax from snakeyaml

Pull Request - State: closed - Opened by rzo1 over 1 year ago - 1 comment
Labels: bug, core, dependency

#1045 - Bug: while submitting topology via flux to Apache Storm

Issue - State: closed - Opened by msghasan over 1 year ago - 4 comments
Labels: bug, dependency

#1044 - WARCHdfsBolt forwarding WARC file path to StatusUpdaterBolt

Issue - State: open - Opened by michaeldinzinger over 1 year ago - 4 comments

#1043 - urlfrontier.StatusUpdaterBolt fails after reconnecting the URL Frontier to the SC

Issue - State: closed - Opened by michaeldinzinger over 1 year ago - 1 comment
Labels: bug, external, urlfrontier

#1042 - Adapting rules for parsing robots.txt file

Issue - State: closed - Opened by michaeldinzinger over 1 year ago - 6 comments

#1041 - Allow override on HttpProtocol's method addHeadersToRequest

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 1 comment

#1040 - okhttp.httpprotocol : Replace User Agent if specified into Metadata

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 4 comments

#1039 - Limit the amount of text to be returned by the text extraction, #1038

Pull Request - State: closed - Opened by jnioche over 1 year ago - 2 comments

#1038 - Limit the amount of text to be returned by the text extraction

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, parser, core

#1037 - Limit number of outlinks while filtering them

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, parser

#1036 - Status ES document id

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 2 comments
Labels: enhancement, elasticsearch, OpenSearch

#1035 - Improvements to URL Filtering

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, core

#1034 - Create method to add SearchHit info to metadata

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 2 comments
Labels: enhancement, core

#1033 - Dependency upgrades

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: dependency

#1032 - RemoteDriverProtocol Throwing exception in catch and this will lead to misbehaviour of the crawler

Issue - State: closed - Opened by msghasan over 1 year ago - 5 comments
Labels: enhancement

#1030 - Fix #1027: Ensure SC can be build with Java 17

Pull Request - State: closed - Opened by rzo1 over 1 year ago - 1 comment
Labels: enhancement

#1029 - Enforce Java 11 in archetypes

Pull Request - State: closed - Opened by msghasan over 1 year ago - 12 comments

#1028 - Indexer ES document id

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 7 comments
Labels: enhancement, elasticsearch, OpenSearch

#1027 - Ensure SC can be build with Java 17 and Maven 3.8.x

Issue - State: closed - Opened by rzo1 over 1 year ago
Labels: enhancement

#1026 - JsoupFilter as Interface

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 1 comment

#1025 - Need Code refactoring for better code reusability in StatusUpdaterBolt

Issue - State: closed - Opened by msghasan over 1 year ago - 11 comments
Labels: invalid

#1024 - Add support for Playwright

Issue - State: open - Opened by rzo1 over 1 year ago - 2 comments
Labels: wish, help wanted

#1023 - Add support for Selenium Grid

Issue - State: open - Opened by rzo1 over 1 year ago - 2 comments
Labels: wish, help wanted

#1022 - If stormcrawler above 2.5 uses Jdk 11 why the archetypes pom are not updated to 11

Issue - State: closed - Opened by msghasan over 1 year ago - 15 comments
Labels: archetype

#1021 - Exclude xml-apis from xerces dependency

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, core

#1020 - Handle single quotes in value of http-equiv="refresh"

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, core

#1019 - Ignore empty fields indexer

Pull Request - State: closed - Opened by jnioche over 1 year ago - 2 comments
Labels: enhancement, indexer

#1018 - Spouts should try to define the mapping for the status index if none exists

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: bug, OpenSearch

#1017 - Add an archetype for crawling with the OpenSearch module

Issue - State: closed - Opened by jnioche over 1 year ago - 1 comment
Labels: enhancement, archetype, OpenSearch

#1016 - Dependency upgrades

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: dependency

#1015 - Fix typos in YAML configuration files

Pull Request - State: closed - Opened by sebastian-nagel over 1 year ago
Labels: enhancement, documentation

#1013 - Usage of FastURLFilter via ES JSONURLFilterWrapper

Issue - State: closed - Opened by jnioche over 1 year ago - 3 comments
Labels: enhancement, elasticsearch

#1011 - Opensearch module

Pull Request - State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, external

#1010 - [WARC] Backward compatible storage of HTTP/2 headers

Pull Request - State: closed - Opened by sebastian-nagel over 1 year ago - 5 comments
Labels: warc

#1009 - Using URLFrontier in archetype

Pull Request - State: closed - Opened by jnioche over 1 year ago - 1 comment

#1008 - Use URLFrontier in archetype

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: archetype

#1007 - [SECURITY] Fix Temporary File Information Disclosure Vulnerability

Pull Request - State: closed - Opened by JLLeitschuh over 1 year ago - 1 comment

#1006 - Added AbstractConfigurable + URLFilter becomes an abstract class. Nam…

Pull Request - State: closed - Opened by jnioche over 1 year ago - 2 comments
Labels: core

#1003 - Can disable MaxDepthFilter

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 4 comments
Labels: bug

#1002 - Create an AbstractFilter

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 5 comments

#1001 - Fix MalformedURLException in JsoupParserBolt

Pull Request - State: closed - Opened by Mikwiss over 1 year ago - 2 comments
Labels: bug

#1000 - Bump jackson-databind from 2.13.3 to 2.13.4.1

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependency

#999 - JSoupParserBolt improve performance of link extraction

Issue - State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, parser

#998 - Dependency upgrades

Issue - State: closed - Opened by jnioche almost 2 years ago - 1 comment
Labels: dependency

#997 - Bump snakeyaml from 1.30 to 1.31 in /core

Pull Request - State: closed - Opened by dependabot[bot] almost 2 years ago
Labels: dependency

#996 - Blocking fetcher thread

Issue - State: open - Opened by Mikwiss almost 2 years ago - 4 comments
Labels: bug, fetcher

#995 - storm ui

Issue - State: closed - Opened by abls1 almost 2 years ago - 2 comments

#994 - Delete redirected pages

Issue - State: open - Opened by jnioche almost 2 years ago - 1 comment
Labels: core

#993 - HttpProtocol use the md protocol.set-headers to add custom header by url

Pull Request - State: closed - Opened by Mikwiss almost 2 years ago - 1 comment
Labels: enhancement, core

#992 - ES IndexerBolt - Fix behaviour of afterBulk

Issue - State: open - Opened by FelixEngl almost 2 years ago - 6 comments

#991 - ConcurrentModificationException thrown by metrics in Fetcher executor

Issue - State: open - Opened by jnioche almost 2 years ago
Labels: bug

#989 - Fix starvation and busy waiting of ES IndexerBolt

Pull Request - State: closed - Opened by FelixEngl almost 2 years ago - 4 comments
Labels: enhancement, elasticsearch

#988 - Fix starvation and busy waiting of ES StatusUpdaterBolt (Fixes #986)

Pull Request - State: closed - Opened by FelixEngl almost 2 years ago - 2 comments
Labels: enhancement, elasticsearch

#987 - Update xsoup from 0.3.2 to 0.3.4

Issue - State: closed - Opened by rzo1 almost 2 years ago
Labels: dependency

#986 - Fix starvation and busy waiting of ES StatusUpdaterBolt

Issue - State: closed - Opened by jnioche almost 2 years ago - 2 comments
Labels: enhancement, elasticsearch

#985 - Fix error when spaces in path to test-resources of StatusBoltTest in ElasticSearch-Module

Pull Request - State: closed - Opened by FelixEngl about 2 years ago
Labels: elasticsearch

#984 - Add unit test basics for URLFrontier.

Pull Request - State: closed - Opened by FelixEngl about 2 years ago - 4 comments
Labels: enhancement, urlfrontier

#983 - Fix starvation and busy waiting of StatusUpdaterBolt.java, add Constants.

Pull Request - State: closed - Opened by FelixEngl about 2 years ago - 11 comments
Labels: enhancement, urlfrontier

#982 - Add ChannelManager for local channel management and constants to Spout.java

Pull Request - State: closed - Opened by FelixEngl about 2 years ago - 5 comments
Labels: urlfrontier

#980 - Overhaul urlfrontier-bolts, add tests, find problem with URLFrontier

Pull Request - State: closed - Opened by FelixEngl about 2 years ago - 7 comments