Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / DigitalPebble/storm-crawler issues and pull requests
#1077 - HttpRobotRulesParser.java is not correctly formatted
Issue -
State: closed - Opened by michaeldinzinger over 1 year ago
- 1 comment
#1076 - Fix flaky test in AdaptiveSchedulerTest.testSchedule
Issue -
State: open - Opened by rzo1 over 1 year ago
Labels: enhancement
#1075 - Add test coverage reports with JaCoCo
Issue -
State: closed - Opened by jnioche over 1 year ago
- 9 comments
Labels: enhancement, core
#1074 - Increase the number of redirects to 5 for Robots.txt fetching
Pull Request -
State: closed - Opened by michaeldinzinger over 1 year ago
- 1 comment
Labels: core
#1073 - Create DeletionBolt.java for Solr. #1050
Pull Request -
State: closed - Opened by syefimov over 1 year ago
- 3 comments
Labels: enhancement, SOLR
#1072 - Batch requests in DeleterBolt
Issue -
State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, OpenSearch
#1071 - mechanism to retrieve more generic value of configuration
Pull Request -
State: closed - Opened by jnioche over 1 year ago
- 1 comment
Labels: enhancement, OpenSearch
#1070 - Add mechanism to retrieve more generic value of configuration if a specific one is not found
Issue -
State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, core, OpenSearch
#1069 - Automatic creation of index definitions should use the bolt type
Pull Request -
State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, OpenSearch
#1068 - Automatic creation of index definitions should use the bolt type
Issue -
State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, OpenSearch
#1067 - Dependency upgrades. fixes #1066
Pull Request -
State: closed - Opened by jnioche over 1 year ago
Labels: dependency
#1066 - Dependency upgrades
Issue -
State: closed - Opened by jnioche over 1 year ago
- 1 comment
#1065 - (Re)separate injection from crawl topologies in *Search archetypes
Issue -
State: closed - Opened by jnioche over 1 year ago
- 1 comment
Labels: elasticsearch, archetype, OpenSearch
#1064 - OpenSearch 2.7.0 + renamed OpenSearchConnection
Pull Request -
State: closed - Opened by jnioche over 1 year ago
Labels: dependency
#1063 - Upgrade to OpenSearch 2.7.0
Issue -
State: closed - Opened by jnioche over 1 year ago
Labels: dependency, OpenSearch
#1062 - BasicURLNormalizer .unmangleQueryString() returns invalid results if "&" symbol in a parents path #1059
Pull Request -
State: closed - Opened by syefimov over 1 year ago
- 2 comments
#1061 - SolrSpout IndexOutOfBoundsException in parsing group query result.
Issue -
State: closed - Opened by syefimov over 1 year ago
- 1 comment
#1060 - Upgrade version of TestContainers
Issue -
State: closed - Opened by jnioche over 1 year ago
Labels: enhancement, external
#1059 - BasicURLNormalizer .unmangleQueryString() returns invalid results if "&" symbol in a parents path
Issue -
State: closed - Opened by syefimov over 1 year ago
- 3 comments
Labels: bug, core
#1058 - Increasing the number of redirects for Robots.txt fetching
Issue -
State: closed - Opened by michaeldinzinger over 1 year ago
Labels: core
#1057 - Cache redirected robots.txt for target host only if path is /robots.txt and query is empty
Pull Request -
State: closed - Opened by sebastian-nagel over 1 year ago
- 1 comment
Labels: bug, core
#1056 - Change HttpProtocol to defer to configured values for retryOnConnectionFailure and followRedirects
Pull Request -
State: closed - Opened by ndtreviv over 1 year ago
- 1 comment
Labels: enhancement, core
#1055 - Issue #1042: Adapt parsing of robots.txt files
Pull Request -
State: closed - Opened by michaeldinzinger over 1 year ago
- 7 comments
Labels: enhancement, core
#1054 - Issue #1043: Fixing problems after restart of Frontier service
Pull Request -
State: closed - Opened by michaeldinzinger over 1 year ago
- 5 comments
Labels: bug, urlfrontier
#1053 - #1049 Replace "Collapse and Expand Results" Solr query with "Result Grouping" query.
Pull Request -
State: closed - Opened by syefimov over 1 year ago
- 8 comments
Labels: bug, SOLR
#1052 - Refactoring
Pull Request -
State: closed - Opened by gabbar23 over 1 year ago
- 2 comments
#1051 - nextFetchDate field in SOLR schema should be optional
Issue -
State: closed - Opened by syefimov over 1 year ago
- 1 comment
Labels: bug, SOLR
#1050 - storm-crawler-solr bug. Missing DeletionBolt bolt code.
Issue -
State: closed - Opened by syefimov over 1 year ago
- 3 comments
Labels: enhancement, SOLR
#1049 - Solr cloud results "Collapse/Expand" bug
Issue -
State: closed - Opened by syefimov over 1 year ago
- 1 comment
Labels: bug, SOLR
#1048 - Bump snakeyaml from 1.33 to 2.0 in /core
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 3 comments
Labels: dependency
#1047 - Fix #1032: Catch the exception inside the loop to avoid breaking if one remote instance is misbehaving
Pull Request -
State: closed - Opened by rzo1 almost 2 years ago
Labels: enhancement
#1046 - Fixes #1045. Remove range syntax from snakeyaml
Pull Request -
State: closed - Opened by rzo1 almost 2 years ago
- 1 comment
Labels: bug, core, dependency
#1045 - Bug: while submitting topology via flux to Aapache Storm
Issue -
State: closed - Opened by msghasan almost 2 years ago
- 4 comments
Labels: bug, dependency
#1044 - WARCHdfsBolt forwarding WARC file path to StatusUpdaterBolt
Issue -
State: open - Opened by michaeldinzinger almost 2 years ago
- 4 comments
#1043 - urlfrontier.StatusUpdaterBolt fails after reconnecting the URL Frontier to the SC
Issue -
State: closed - Opened by michaeldinzinger almost 2 years ago
- 1 comment
Labels: bug, external, urlfrontier
#1042 - Adapting rules for parsing robots.txt file
Issue -
State: closed - Opened by michaeldinzinger almost 2 years ago
- 6 comments
#1041 - Allow override on HttpProtocol's method addHeadersToRequest
Pull Request -
State: closed - Opened by Mikwiss almost 2 years ago
- 1 comment
#1040 - okhttp.httpprotocol : Replace User Agent if specified into Metadata
Pull Request -
State: closed - Opened by Mikwiss almost 2 years ago
- 4 comments
#1039 - Limit the amount of text to be returned by the text extraction, #1038
Pull Request -
State: closed - Opened by jnioche almost 2 years ago
- 2 comments
#1038 - Limit the amount of text to be returned by the text extraction
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: enhancement, parser, core
#1037 - Limit number of outlinks while filtering them
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: enhancement, parser
#1036 - Status ES document id
Pull Request -
State: closed - Opened by Mikwiss almost 2 years ago
- 2 comments
Labels: enhancement, elasticsearch, OpenSearch
#1035 - Improvements to URL Filtering
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: enhancement, core
#1034 - Create method to add SearchHit info to metadata
Pull Request -
State: closed - Opened by Mikwiss almost 2 years ago
- 2 comments
Labels: enhancement, core
#1033 - Dependency upgrades
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: dependency
#1032 - RemoteDriverProtocol Throwing exception in catch and this will lead to misbehaviour of the crawler
Issue -
State: closed - Opened by msghasan almost 2 years ago
- 5 comments
Labels: enhancement
#1030 - Fix #1027: Ensure SC can be build with Java 17
Pull Request -
State: closed - Opened by rzo1 almost 2 years ago
- 1 comment
Labels: enhancement
#1029 - Enforce Java 11 in archetypes
Pull Request -
State: closed - Opened by msghasan almost 2 years ago
- 12 comments
#1028 - Indexer ES document id
Pull Request -
State: closed - Opened by Mikwiss almost 2 years ago
- 7 comments
Labels: enhancement, elasticsearch, OpenSearch
#1027 - Ensure SC can be build with Java 17 and Maven 3.8.x
Issue -
State: closed - Opened by rzo1 almost 2 years ago
Labels: enhancement
#1026 - JsoupFilter as Interface
Pull Request -
State: closed - Opened by Mikwiss almost 2 years ago
- 1 comment
#1025 - Need Code refactoring for better code reusability in StatusUpdaterBolt
Issue -
State: closed - Opened by msghasan almost 2 years ago
- 11 comments
Labels: invalid
#1024 - Add support for Playwright
Issue -
State: open - Opened by rzo1 almost 2 years ago
- 2 comments
Labels: wish, help wanted
#1023 - Add support for Selenium Grid
Issue -
State: open - Opened by rzo1 almost 2 years ago
- 2 comments
Labels: wish, help wanted
#1022 - If stormcrawler above 2.5 uses Jdk 11 why the archetypes pom are not updated to 11
Issue -
State: closed - Opened by msghasan almost 2 years ago
- 15 comments
Labels: archetype
#1021 - Exclude xml-apis from xerces dependency
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: enhancement, core
#1020 - Handle single quotes in value of http-equiv="refresh"
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: enhancement, core
#1019 - Ignore empty fields indexer
Pull Request -
State: closed - Opened by jnioche almost 2 years ago
- 2 comments
Labels: enhancement, indexer
#1018 - Spouts should try to define the mapping for the status index if none exists
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: bug, OpenSearch
#1017 - Add an archetype for crawling with the OpenSearch module
Issue -
State: closed - Opened by jnioche almost 2 years ago
- 1 comment
Labels: enhancement, archetype, OpenSearch
#1016 - Dependency upgrades
Issue -
State: closed - Opened by jnioche almost 2 years ago
Labels: dependency
#1015 - Fix typos in YAML configuration files
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
Labels: enhancement, documentation
#1014 - [WARC] WARC response records missing WARC headers "WARC-IP-Address" and "WARC-Truncated" if protocolMDprefix is not empty
Issue -
State: closed - Opened by sebastian-nagel almost 2 years ago
Labels: bug, warc
#1013 - Usage of FastURLFilter via ES JSONURLFilterWrapper
Issue -
State: closed - Opened by jnioche almost 2 years ago
- 3 comments
Labels: enhancement, elasticsearch
#1011 - Opensearch module
Pull Request -
State: closed - Opened by jnioche almost 2 years ago
Labels: enhancement, external
#1010 - [WARC] Backward compatible storage of HTTP/2 headers
Pull Request -
State: closed - Opened by sebastian-nagel about 2 years ago
- 5 comments
Labels: warc
#1009 - Using URLFrontier in archetype
Pull Request -
State: closed - Opened by jnioche about 2 years ago
- 1 comment
#1008 - Use URLFrontier in archetype
Issue -
State: closed - Opened by jnioche about 2 years ago
Labels: archetype
#1007 - [SECURITY] Fix Temporary File Information Disclosure Vulnerability
Pull Request -
State: closed - Opened by JLLeitschuh about 2 years ago
- 1 comment
#1006 - Added AbstractConfigurable + URLFilter becomes an abstract class. Nam…
Pull Request -
State: closed - Opened by jnioche about 2 years ago
- 2 comments
Labels: core
#1003 - Can disable MaxDepthFilter
Pull Request -
State: closed - Opened by Mikwiss about 2 years ago
- 4 comments
Labels: bug
#1002 - Create an AbstractFilter
Pull Request -
State: closed - Opened by Mikwiss about 2 years ago
- 5 comments
#1001 - Fix MalformedURLException in JsoupParserBolt
Pull Request -
State: closed - Opened by Mikwiss about 2 years ago
- 2 comments
Labels: bug
#1000 - Bump jackson-databind from 2.13.3 to 2.13.4.1
Pull Request -
State: closed - Opened by dependabot[bot] about 2 years ago
- 2 comments
Labels: dependency
#999 - JSoupParserBolt improve performance of link extraction
Issue -
State: closed - Opened by jnioche about 2 years ago
Labels: enhancement, parser
#998 - Dependency upgrades
Issue -
State: closed - Opened by jnioche about 2 years ago
- 1 comment
Labels: dependency
#997 - Bump snakeyaml from 1.30 to 1.31 in /core
Pull Request -
State: closed - Opened by dependabot[bot] about 2 years ago
Labels: dependency
#996 - Blocking fetcher thread
Issue -
State: open - Opened by Mikwiss about 2 years ago
- 4 comments
Labels: bug, fetcher
#995 - storm ui
Issue -
State: closed - Opened by abls1 over 2 years ago
- 2 comments
#994 - Delete redirected pages
Issue -
State: open - Opened by jnioche over 2 years ago
- 1 comment
Labels: core
#993 - HttpProtocol use the md protocol.set-headers to add custom header by url
Pull Request -
State: closed - Opened by Mikwiss over 2 years ago
- 1 comment
Labels: enhancement, core
#992 - ES IndexerBold - Fix behaviour of afterBulk
Issue -
State: open - Opened by FelixEngl over 2 years ago
- 6 comments
#991 - ConcurrentModificationException thrown by metrics in Fetcher executor
Issue -
State: open - Opened by jnioche over 2 years ago
Labels: bug
#989 - Fix starvation and busy waiting of ES IndexerBolt
Pull Request -
State: closed - Opened by FelixEngl over 2 years ago
- 4 comments
Labels: enhancement, elasticsearch
#988 - Fix starvation and busy waiting of ES StatusUpdaterBolt (Fixes #986)
Pull Request -
State: closed - Opened by FelixEngl over 2 years ago
- 2 comments
Labels: enhancement, elasticsearch
#987 - Update xsoup from 0.3.2 to 0.3.4
Issue -
State: closed - Opened by rzo1 over 2 years ago
Labels: dependency
#986 - Fix starvation and busy waiting of ES StatusUpdaterBolt
Issue -
State: closed - Opened by jnioche over 2 years ago
- 2 comments
Labels: enhancement, elasticsearch
#985 - Fix error when spaces in path to test-resources of StatusBoltTest in ElasticSearch-Module
Pull Request -
State: closed - Opened by FelixEngl over 2 years ago
Labels: elasticsearch
#984 - Add unit test basics for URLFrontier.
Pull Request -
State: closed - Opened by FelixEngl over 2 years ago
- 4 comments
Labels: enhancement, urlfrontier
#983 - Fix starvation and busy waiting of StatusUpdaterBolt.java, add Constants.
Pull Request -
State: closed - Opened by FelixEngl over 2 years ago
- 11 comments
Labels: enhancement, urlfrontier
#982 - Add ChannelManager for local channel management and constants to Spout.java
Pull Request -
State: closed - Opened by FelixEngl over 2 years ago
- 5 comments
Labels: urlfrontier
#981 - [URLFrontier] URLFrontier extension not returning ID preventing Status-ACK making crawling impossible
Issue -
State: closed - Opened by FelixEngl over 2 years ago
- 6 comments
#980 - Overhaul urlfrontier-bolts, add tests, find problem with URLFrontier
Pull Request -
State: closed - Opened by FelixEngl over 2 years ago
- 7 comments