An open API service for providing issue and pull request metadata for open source projects.

GitHub / apache/nutch issues and pull requests

#850 - NUTCH-3110 Upgrade to Tika 3.1.0

Pull Request - State: open - Opened by sebastian-nagel 4 months ago

#849 - NUTCH-3108 Fix SLF4J Class Loader Conflict in language-identifier

Pull Request - State: closed - Opened by maciejpuzianowski 5 months ago - 6 comments

#847 - [NUTCH-3106] fix Issue with SSLHandshakeException

Pull Request - State: open - Opened by tatecn 6 months ago

#846 - Main

Pull Request - State: closed - Opened by qdzzyb2014 7 months ago

#844 - NUTCH-3097 fixed dependencies for indexer-elastic

Pull Request - State: closed - Opened by maciejpuzianowski 8 months ago - 1 comment

#841 - NUTCH-3094 GitHub tests to run if build configuration changes

Pull Request - State: closed - Opened by sebastian-nagel 8 months ago - 1 comment

#840 - NUTCH-3093 Ant target test-plugins to depend on compile-core-test

Pull Request - State: closed - Opened by sebastian-nagel 8 months ago - 1 comment

#839 - NUTCH-3092 Replace all imports of commons-lang by commons-lang3

Pull Request - State: closed - Opened by sebastian-nagel 8 months ago - 2 comments

#838 - NUTCH-3085 Augment CI by adding code coverage and code quality reporting

Pull Request - State: open - Opened by lewismc 9 months ago - 2 comments

#836 - NUTCH-2771 Tests in nightly builds: skip long runners

Pull Request - State: closed - Opened by sebastian-nagel 9 months ago

#835 - NUTCH-3086 Consolidate plugin extension names and IDs

Pull Request - State: closed - Opened by sebastian-nagel 9 months ago

#834 - NUTCH-3083 Add RobotRulesParser to bin/nutch

Pull Request - State: closed - Opened by sebastian-nagel 9 months ago

#833 - NUTCH-3084 Improve CI by filtering and separating plugin and core test execution

Pull Request - State: closed - Opened by lewismc 9 months ago - 1 comment

#832 - NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads"

Pull Request - State: open - Opened by sebastian-nagel 9 months ago - 1 comment

#831 - NUTCH-3075 tld plugin makes injector crash

Pull Request - State: closed - Opened by sebastian-nagel 9 months ago

#828 - NUTCH-3073 Address Java compiler warning

Pull Request - State: closed - Opened by sebastian-nagel 10 months ago - 1 comment

#827 - NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved

Pull Request - State: closed - Opened by sebastian-nagel 10 months ago - 2 comments

#826 - [NUTCH-2856] Implement a protocol-smb plugin based on hierynomus/smbj

Pull Request - State: open - Opened by HiranChaudhuri 10 months ago - 3 comments

#824 - NUTCH-3066 Protocol plugin unit tests fail randomly

Pull Request - State: closed - Opened by sebastian-nagel 11 months ago - 1 comment

#823 - NUTCH-3065 Format changelog as markdown

Pull Request - State: closed - Opened by sebastian-nagel 11 months ago - 2 comments

#821 - NUTCH-3061 URL filters to log name of the rules file

Pull Request - State: closed - Opened by sebastian-nagel about 1 year ago - 1 comment

#820 - NUTCH-3058 Fetcher: counter for hung threads

Pull Request - State: closed - Opened by sebastian-nagel about 1 year ago - 2 comments

#819 - NUTCH-3057 - Fix for index-arbitrary plugin improper retention and us…

Pull Request - State: closed - Opened by CatChullain about 1 year ago - 5 comments

#818 - NUTCH-3055 README: fix GitHub "hub" commands

Pull Request - State: closed - Opened by sebastian-nagel about 1 year ago

#817 - NUTCH-3054 Address deprecation of Node16 for all GitHub Actions

Pull Request - State: closed - Opened by lewismc about 1 year ago

#816 - NUTCH-1806 Delegate processing of URL domains to crawler-commons

Pull Request - State: closed - Opened by sebastian-nagel about 1 year ago

#815 - NUTCH-3044 Generator: NPE when extracting the host part of a URL fails

Pull Request - State: closed - Opened by sebastian-nagel about 1 year ago - 3 comments

#814 - NUTCH-3043 Generator: count URLs rejected by URL filters

Pull Request - State: closed - Opened by sebastian-nagel about 1 year ago - 3 comments

#813 - NUTCH-3041 Address confusing logging in o.a.n.net.URLExemptionFilters

Pull Request - State: closed - Opened by lewismc over 1 year ago - 1 comment

#812 - NUTCH-3039 Failure to handle ftp:// URLs

Pull Request - State: closed - Opened by sebastian-nagel over 1 year ago

#809 - NUTCH-3037 Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

Pull Request - State: closed - Opened by lewismc over 1 year ago

#808 - NUTCH-3035 Update license and notice file for release of 1.20

Pull Request - State: closed - Opened by sebastian-nagel over 1 year ago - 2 comments

#807 - NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i…

Pull Request - State: closed - Opened by lewismc over 1 year ago - 3 comments

#806 - NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

Pull Request - State: closed - Opened by sebastian-nagel over 1 year ago - 1 comment

#805 - Update Dockerfile / JAVA_HOME - 2nd try

Pull Request - State: closed - Opened by derhecht over 1 year ago - 1 comment

#804 - Revert "Update Dockerfile / JAVA_HOME"

Pull Request - State: closed - Opened by lewismc over 1 year ago

#803 - NUTCH-3033 Upgrade Ivy to v2.5.2

Pull Request - State: closed - Opened by lewismc over 1 year ago - 4 comments

#802 - fix for NUTCH-3027 contributed by skehrli

Pull Request - State: closed - Opened by skehrli over 1 year ago - 1 comment

#801 - Update Dockerfile / JAVA_HOME

Pull Request - State: closed - Opened by derhecht over 1 year ago - 2 comments

#800 - [NUTCH-2834] Update crawl documentation / Fix #557

Pull Request - State: closed - Opened by derhecht over 1 year ago - 1 comment

#799 - NUTCH-3026 -- add statusOnly as an indexing option

Pull Request - State: closed - Opened by tballison over 1 year ago - 2 comments

#798 - fix for NUTCH-2812 contributed by GabeHaegele

Pull Request - State: closed - Opened by GabeHaegele over 1 year ago - 1 comment

#797 - NUTCH-3019 -- update Tika to 2.9.1

Pull Request - State: closed - Opened by tballison over 1 year ago - 2 comments

#796 - [NUTCH-3025] urlfilter-fast to filter based on the length of the URL

Pull Request - State: closed - Opened by jnioche over 1 year ago - 3 comments

#795 - NUTCH-3024 Remove flaky 'dependency check' target

Pull Request - State: closed - Opened by lewismc over 1 year ago

#794 - NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag

Pull Request - State: closed - Opened by tballison over 1 year ago - 1 comment

#793 - [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3

Pull Request - State: closed - Opened by jnioche over 1 year ago - 1 comment

#792 - Allow fast-urlfilter to load from HDFS/S3 and support gzipped input [NUTCH-3017]

Pull Request - State: closed - Opened by jnioche over 1 year ago - 1 comment

#791 - NUTCH-2887 Migrate to JUnit 5 Jupiter

Pull Request - State: open - Opened by lewismc over 1 year ago - 1 comment

#790 - NUTCH-3015 Add more CI steps to GitHub master-build.yml

Pull Request - State: closed - Opened by lewismc almost 2 years ago - 2 comments

#789 - NUTCH-3014 Standardize Job names

Pull Request - State: closed - Opened by lewismc almost 2 years ago

#788 - NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic

Pull Request - State: closed - Opened by lewismc almost 2 years ago

#784 - NUTCH-2897 Do not suppress deprecated API warnings

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago

#783 - NUTCH-3010 Injector: count unique number of injected URLs

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago - 1 comment

#782 - NUTCH-3009 Upgrade to Hadoop 3.3.6

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago - 1 comment

#781 - NUTCH-3007 Fix impossible casts

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago

#780 - NUTCH-2852 SpotBugs: Method invokes System.exit(...)

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago

#779 - NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago - 3 comments

#778 - NUTCH-3004

Pull Request - State: closed - Opened by tballison almost 2 years ago

#776 - NUTCH-2959 -- upgrade Tika to 2.9.0

Pull Request - State: closed - Opened by tballison almost 2 years ago - 35 comments

#775 - NUTCH-2998 -- Remove Any23 from Nutch

Pull Request - State: closed - Opened by tballison almost 2 years ago - 1 comment

#774 - NUTCH-3001 - fix logic for grabbing bytes if there's no content type …

Pull Request - State: closed - Opened by tballison almost 2 years ago

#772 - NUTCH-2978 -- upgrade to log4j2 throughout

Pull Request - State: closed - Opened by tballison almost 2 years ago - 6 comments

#771 - NUTCH-2999 fix for initial PR

Pull Request - State: closed - Opened by tballison almost 2 years ago - 1 comment

#770 - NUTCH-2999 Upgrade Lucene to latest 8.x version throughout

Pull Request - State: closed - Opened by tballison almost 2 years ago

#769 - NUTCH-2978 -- move to log4j2 logging throughout

Pull Request - State: closed - Opened by tballison almost 2 years ago - 1 comment

#768 - NUTCH-2989 -- enable auth in ElasticIndexWriter for https

Pull Request - State: closed - Opened by tballison almost 2 years ago

#767 - NUTCH-2997 Add Override annotations

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago

#766 - NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago - 1 comment

#765 - NUTCH-2995 Upgrade to crawler-commons 1.4

Pull Request - State: closed - Opened by sebastian-nagel almost 2 years ago

#761 - NUTCH-2920 -- add an OpenSearchIndexWriter

Pull Request - State: closed - Opened by tballison over 2 years ago - 10 comments

#760 - NUTCH-2972 Javadoc build fails using JDK 17

Pull Request - State: closed - Opened by sebastian-nagel over 2 years ago - 3 comments

#759 - NUTCH-2985 Disable plugin urlfilter-validator by default

Pull Request - State: closed - Opened by sebastian-nagel over 2 years ago

#758 - NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty

Pull Request - State: closed - Opened by sebastian-nagel over 2 years ago - 2 comments

#757 - NUTCH-2984 Drop test proxy server and benchmark tool

Pull Request - State: closed - Opened by sebastian-nagel over 2 years ago

#756 - NUTCH-2983 nutch-default.xml improvements

Pull Request - State: closed - Opened by sebastian-nagel over 2 years ago

#754 - NUTCH-2982 Generator: parameter for URL normalization not passed forward

Pull Request - State: closed - Opened by sebastian-nagel over 2 years ago - 1 comment