GitHub / apache/nutch issues and pull requests
#857 - NUTCH-3118 Logging pattern missing one argument placeholder
Pull Request -
State: closed - Opened by sebastian-nagel 9 days ago
#855 - NUTCH-3116 Minor dependency upgrades, update license and notice files
Pull Request -
State: closed - Opened by sebastian-nagel 13 days ago
#854 - NUTCH-2976 SitemapProcessor: verify sitemap values added from sitemap to CrawlDB
Pull Request -
State: closed - Opened by sebastian-nagel 14 days ago
#850 - NUTCH-3110 Upgrade to Tika 3.1.0
Pull Request -
State: open - Opened by sebastian-nagel 4 months ago
#849 - NUTCH-3108 Fix SLF4J Class Loader Conflict in language-identifier
Pull Request -
State: closed - Opened by maciejpuzianowski 5 months ago
- 6 comments
#848 - [NUTCH-3103] Fixed custom max intervals for AdaptiveFetchSchedule
Pull Request -
State: open - Opened by martin-djukanovic 6 months ago
#847 - [NUTCH-3106] fix Issue with SSLHandshakeException
Pull Request -
State: open - Opened by tatecn 6 months ago
#846 - Main
Pull Request -
State: closed - Opened by qdzzyb2014 7 months ago
#845 - NUTCH-3087 BasicURLNormalizer to keep userinfo for protocols which might require it
Pull Request -
State: open - Opened by sebastian-nagel 8 months ago
- 1 comment
#844 - NUTCH-3097 fixed dependencies for indexer-elastic
Pull Request -
State: closed - Opened by maciejpuzianowski 8 months ago
- 1 comment
#843 - NUTCH-3094 Github tests to run if build configuration changes
Pull Request -
State: closed - Opened by sebastian-nagel 8 months ago
#842 - NUTCH-3095 Update .gitignore to ignore Hadoop native libraries
Pull Request -
State: closed - Opened by sebastian-nagel 8 months ago
#841 - NUTCH-3094 Github tests to run if build configuration changes
Pull Request -
State: closed - Opened by sebastian-nagel 8 months ago
- 1 comment
#840 - NUTCH-3093 Ant target test-plugins to depend on compile-core-test
Pull Request -
State: closed - Opened by sebastian-nagel 8 months ago
- 1 comment
#839 - NUTCH-3092 Replace all imports of commons-lang by commons-lang3
Pull Request -
State: closed - Opened by sebastian-nagel 8 months ago
- 2 comments
#838 - NUTCH-3085 Augment CI by adding code coverage and code quality reporting
Pull Request -
State: open - Opened by lewismc 9 months ago
- 2 comments
#837 - NUTCH-3079 Dumping a segment fails unless it has been fetched and parsed
Pull Request -
State: closed - Opened by sebastian-nagel 9 months ago
#836 - NUTCH-2771 Tests in nightly builds: skip long runners
Pull Request -
State: closed - Opened by sebastian-nagel 9 months ago
#835 - NUTCH-3086 Consolidate plugin extension names and IDs
Pull Request -
State: closed - Opened by sebastian-nagel 9 months ago
#834 - NUTCH-3083 Add RobotRulesParser to bin/nutch
Pull Request -
State: closed - Opened by sebastian-nagel 9 months ago
#833 - NUTCH-3084 Improve CI by filtering and separating plugin and core test execution
Pull Request -
State: closed - Opened by lewismc 9 months ago
- 1 comment
#832 - NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads"
Pull Request -
State: open - Opened by sebastian-nagel 9 months ago
- 1 comment
#831 - NUTCH-3075 tld plugin makes injector crash
Pull Request -
State: closed - Opened by sebastian-nagel 9 months ago
#830 - This improves the way Nutch is erroring out, at least for local mode.
Pull Request -
State: open - Opened by HiranChaudhuri 9 months ago
#829 - NUTCH-3078 Unlock database when Injector finishes - regardless of result
Pull Request -
State: closed - Opened by HiranChaudhuri 9 months ago
#828 - NUTCH-3073 Address Java compiler warning
Pull Request -
State: closed - Opened by sebastian-nagel 10 months ago
- 1 comment
#827 - NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved
Pull Request -
State: closed - Opened by sebastian-nagel 10 months ago
- 2 comments
#826 - [NUTCH-2856] Implement a protocol-smb plugin based on hierynomus/smbj
Pull Request -
State: open - Opened by HiranChaudhuri 10 months ago
- 3 comments
#825 - WIP NUTCH-3064 Upgrade com.maxmind.geoip2:geoip2 dependency in geoip-index to v4.2.0
Pull Request -
State: open - Opened by lewismc 11 months ago
#824 - NUTCH-3066 Protocol plugin unit tests fail randomly
Pull Request -
State: closed - Opened by sebastian-nagel 11 months ago
- 1 comment
#823 - NUTCH-3065 Format changelog as markdown
Pull Request -
State: closed - Opened by sebastian-nagel 11 months ago
- 2 comments
#822 - NUTCH-3062 protocol-okhttp: optionally record HTTP and SSL/TLS versions
Pull Request -
State: closed - Opened by sebastian-nagel about 1 year ago
#821 - NUTCH-3061 URL filters to log name of the rules file
Pull Request -
State: closed - Opened by sebastian-nagel about 1 year ago
- 1 comment
#820 - NUTCH-3058 Fetcher: counter for hung threads
Pull Request -
State: closed - Opened by sebastian-nagel about 1 year ago
- 2 comments
#819 - NUTCH-3057 - Fix for index-arbitrary plugin improper retention and us…
Pull Request -
State: closed - Opened by CatChullain about 1 year ago
- 5 comments
#818 - NUTCH-3055 README: fix Github "hub" commands
Pull Request -
State: closed - Opened by sebastian-nagel about 1 year ago
#817 - NUTCH-3054 Address deprecation of Node16 for all GitHub Actions
Pull Request -
State: closed - Opened by lewismc about 1 year ago
#816 - NUTCH-1806 Delegate processing of URL domains to crawler-commons
Pull Request -
State: closed - Opened by sebastian-nagel about 1 year ago
#815 - NUTCH-3044 Generator: NPE when extracting the host part of a URL fails
Pull Request -
State: closed - Opened by sebastian-nagel about 1 year ago
- 3 comments
#814 - NUTCH-3043 Generator: count URLs rejected by URL filters
Pull Request -
State: closed - Opened by sebastian-nagel about 1 year ago
- 3 comments
#813 - NUTCH-3041 Address confusing logging in o.a.n.net.URLExemptionFilters
Pull Request -
State: closed - Opened by lewismc over 1 year ago
- 1 comment
#812 - NUTCH-3039 Failure to handle ftp:// URLs
Pull Request -
State: closed - Opened by sebastian-nagel over 1 year ago
#811 - NUTCH-3038 Address issues discovered during 1.20 release management dryrun
Pull Request -
State: closed - Opened by lewismc over 1 year ago
#810 - NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time
Pull Request -
State: closed - Opened by CatChullain over 1 year ago
- 5 comments
#809 - NUTCH-3037 Upgrade org.apache.kafka:kafka_2.12: to v3.7.0
Pull Request -
State: closed - Opened by lewismc over 1 year ago
#808 - NUTCH-3035 Update license and notice file for release of 1.20
Pull Request -
State: closed - Opened by sebastian-nagel over 1 year ago
- 2 comments
#807 - NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i…
Pull Request -
State: closed - Opened by lewismc over 1 year ago
- 3 comments
#806 - NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
Pull Request -
State: closed - Opened by sebastian-nagel over 1 year ago
- 1 comment
#805 - Update Dockerfile / JAVA_HOME - 2nd try
Pull Request -
State: closed - Opened by derhecht over 1 year ago
- 1 comment
#804 - Revert "Update Dockerfile / JAVA_HOME"
Pull Request -
State: closed - Opened by lewismc over 1 year ago
#803 - NUTCH-3033 Upgrade Ivy to v2.5.2
Pull Request -
State: closed - Opened by lewismc over 1 year ago
- 4 comments
#802 - fix for NUTCH-3027 contributed by skehrli
Pull Request -
State: closed - Opened by skehrli over 1 year ago
- 1 comment
#801 - Update Dockerfile / JAVA_HOME
Pull Request -
State: closed - Opened by derhecht over 1 year ago
- 2 comments
#800 - [NUTCH-2834] Update crawl documentation / Fix #557
Pull Request -
State: closed - Opened by derhecht over 1 year ago
- 1 comment
#799 - NUTCH-3026 -- add statusOnly as an indexing option
Pull Request -
State: closed - Opened by tballison over 1 year ago
- 2 comments
#798 - fix for NUTCH-2812 contributed by GabeHaegele
Pull Request -
State: closed - Opened by GabeHaegele over 1 year ago
- 1 comment
#797 - NUTCH-3019 -- update Tika to 2.9.1
Pull Request -
State: closed - Opened by tballison over 1 year ago
- 2 comments
#796 - [NUTCH-3025] urlfilter-fast to filter based on the length of the URL
Pull Request -
State: closed - Opened by jnioche over 1 year ago
- 3 comments
#795 - NUTCH-3024 Remove flaky 'dependency check' target
Pull Request -
State: closed - Opened by lewismc over 1 year ago
#794 - NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag
Pull Request -
State: closed - Opened by tballison over 1 year ago
- 1 comment
#793 - [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3
Pull Request -
State: closed - Opened by jnioche over 1 year ago
- 1 comment
#792 - Allow fast-urlfilter to load from HDFS/S3 and support gzipped input [NUTCH-3017]
Pull Request -
State: closed - Opened by jnioche over 1 year ago
- 1 comment
#791 - NUTCH-2887 Migrate to JUnit 5 Jupiter
Pull Request -
State: open - Opened by lewismc over 1 year ago
- 1 comment
#790 - NUTCH-3015 Add more CI steps to GitHub master-build.yml
Pull Request -
State: closed - Opened by lewismc almost 2 years ago
- 2 comments
#789 - NUTCH-3014 Standardize Job names
Pull Request -
State: closed - Opened by lewismc almost 2 years ago
#788 - NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic
Pull Request -
State: closed - Opened by lewismc almost 2 years ago
#787 - NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
#786 - NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx)
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
- 1 comment
#785 - NUTCH-2853 bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
#784 - NUTCH-2897 Do not suppress deprecated API warnings
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
#783 - NUTCH-3010 Injector: count unique number of injected URLs
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
- 1 comment
#782 - NUTCH-3009 Upgrade to Hadoop 3.3.6
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
- 1 comment
#781 - NUTCH-3007 Fix impossible casts
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
#780 - NUTCH-2852 SpotBugs: Method invokes System.exit(...)
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
#779 - NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
- 3 comments
#778 - NUTCH-3004
Pull Request -
State: closed - Opened by tballison almost 2 years ago
#777 - NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
- 1 comment
#776 - NUTCH-2959 -- upgrade Tika to 2.9.0
Pull Request -
State: closed - Opened by tballison almost 2 years ago
- 35 comments
#775 - NUTCH-2998 -- Remove Any23 from Nutch
Pull Request -
State: closed - Opened by tballison almost 2 years ago
- 1 comment
#774 - NUTCH-3001 - fix logic for grabbing bytes if there's no content type …
Pull Request -
State: closed - Opened by tballison almost 2 years ago
#773 - NUTCH-3000 - the selenium protocol should return the full html, not just the inner body
Pull Request -
State: closed - Opened by tballison almost 2 years ago
#772 - NUTCH-2978 -- upgrade to log4j2 throughout
Pull Request -
State: closed - Opened by tballison almost 2 years ago
- 6 comments
#771 - NUTCH-2999 fix for initial PR
Pull Request -
State: closed - Opened by tballison almost 2 years ago
- 1 comment
#770 - NUTCH-2999 Upgrade Lucene to latest 8.x version throughout
Pull Request -
State: closed - Opened by tballison almost 2 years ago
#769 - NUTCH-2978 -- move to log4j2 logging throughout
Pull Request -
State: closed - Opened by tballison almost 2 years ago
- 1 comment
#768 - NUTCH-2989 -- enable auth in ElasticIndexWriter for https
Pull Request -
State: closed - Opened by tballison almost 2 years ago
#767 - NUTCH-2997 Add Override annotations
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
#766 - NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
- 1 comment
#765 - NUTCH-2995 Upgrade to crawler-commons 1.4
Pull Request -
State: closed - Opened by sebastian-nagel almost 2 years ago
#764 - NUTCH-2993 ScoringDepth plugin to skip depth check based on URL Pattern
Pull Request -
State: closed - Opened by sebastian-nagel about 2 years ago
#763 - NUTCH-2991 Support HTTP/S Header Authorization for Solr connections
Pull Request -
State: closed - Opened by sebastian-nagel about 2 years ago
#762 - NUTCH-2992 Fetcher: always block fetch queues when exceptions threshold is reached
Pull Request -
State: closed - Opened by sebastian-nagel about 2 years ago
#761 - NUTCH-2920 -- add an OpenSearchIndexWriter
Pull Request -
State: closed - Opened by tballison over 2 years ago
- 10 comments
#760 - NUTCH-2972 Javadoc build fails using JDK 17
Pull Request -
State: closed - Opened by sebastian-nagel over 2 years ago
- 3 comments
#759 - NUTCH-2985 Disable plugin urlfilter-validator by default
Pull Request -
State: closed - Opened by sebastian-nagel over 2 years ago
#758 - NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty
Pull Request -
State: closed - Opened by sebastian-nagel over 2 years ago
- 2 comments
#757 - NUTCH-2984 Drop test proxy server and benchmark tool
Pull Request -
State: closed - Opened by sebastian-nagel over 2 years ago
#756 - NUTCH-2983 nutch-default.xml improvements
Pull Request -
State: closed - Opened by sebastian-nagel over 2 years ago
#755 - NUTCH-2982 Generator: parameter for URL normalization not passed forward
Pull Request -
State: closed - Opened by sebastian-nagel over 2 years ago
#754 - NUTCH-2982 Generator: parameter for URL normalization not passed forward
Pull Request -
State: closed - Opened by sebastian-nagel over 2 years ago
- 1 comment