Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / elastic/crawler issues and pull requests

#214 - Add LATEST tag option to build jobs

Pull Request - State: open - Opened by navarone-feekery 10 days ago

#213 - Update default docker-compose version

Pull Request - State: closed - Opened by navarone-feekery 15 days ago - 2 comments
Labels: auto-backport, v0.2.2

#212 - [0.2] Fix CI pipeline (#211)

Pull Request - State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport

#211 - Fix CI pipeline

Pull Request - State: closed - Opened by navarone-feekery 15 days ago - 1 comment
Labels: auto-backport, v0.2.1

#210 - Update RELEASING.md

Pull Request - State: closed - Opened by navarone-feekery 15 days ago

#209 - [0.2] Add ingest pipeline for 9.x (#203)

Pull Request - State: closed - Opened by navarone-feekery 15 days ago
Labels: backport

#208 - [0.2] Allow for full HTML extraction (#204)

Pull Request - State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport

#205 - [0.2] Make elasticsearch the default value for output_sink (#176)

Pull Request - State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport

#204 - Allow for full HTML extraction

Pull Request - State: closed - Opened by navarone-feekery 15 days ago - 1 comment
Labels: auto-backport, v0.2.1

#203 - Add ingest pipeline for 9.x

Pull Request - State: closed - Opened by navarone-feekery 15 days ago - 2 comments
Labels: auto-backport, v0.2.1

#201 - Crawler attempting to call ent-search-generic-ingestion pipeline which doesn't exist

Issue - State: closed - Opened by jeffvestal 16 days ago
Labels: bug, complexity:low, priority:high

#200 - Update docker files to remove /root/.m2 directory after installation to not distribute build dependencies

Pull Request - State: closed - Opened by artem-shelkovnikov 16 days ago - 1 comment
Labels: auto-backport, v0.2.1

#199 - [0.2] Fix scheduling documentation (#196)

Pull Request - State: closed - Opened by github-actions[bot] 17 days ago
Labels: backport

#198 - Allow raw HTML to be ingested

Issue - State: closed - Opened by navarone-feekery 17 days ago
Labels: enhancement, complexity:low, priority:high

#197 - Add HTML to Markdown functionality

Issue - State: open - Opened by navarone-feekery 17 days ago - 3 comments
Labels: enhancement, priority:medium, complexity:medium

#196 - Fix scheduling documentation

Pull Request - State: closed - Opened by navarone-feekery 17 days ago - 1 comment
Labels: v0.2.0, auto-backport, v0.3.0, v0.2.1

#195 - schedule command causes "TypeError: no implicit conversion of Symbol into Integer"

Issue - State: closed - Opened by djsiroky 18 days ago - 2 comments
Labels: bug, community-driven, priority:medium, complexity:low

#194 - [0.2] Fixing EXTRACTION_RULES link (#193)

Pull Request - State: closed - Opened by github-actions[bot] 21 days ago
Labels: backport

#193 - Fixing EXTRACTION_RULES link

Pull Request - State: closed - Opened by JoseLuisGJ 21 days ago - 1 comment
Labels: documentation, v0.2.0, auto-backport

#192 - Adding ES verification step + explicit best-effort index creation during ES Sink initialization

Pull Request - State: closed - Opened by mattnowzari 22 days ago - 1 comment
Labels: enhancement, auto-backport, v0.2.1

#191 - Update README.md

Pull Request - State: closed - Opened by navarone-feekery 22 days ago
Labels: v0.3.0

#190 - Bump version to 0.2.2

Pull Request - State: closed - Opened by navarone-feekery 22 days ago - 2 comments
Labels: v0.2.2

#188 - Downloading files to a directory

Issue - State: open - Opened by asevillano 23 days ago - 13 comments
Labels: bug

#187 - [SNYK] Bump nokogiri lib

Pull Request - State: closed - Opened by jedrazb 23 days ago
Labels: v0.2.2

#186 - Adding check to ES sink to check if index is present before crawling

Pull Request - State: closed - Opened by mattnowzari 25 days ago - 2 comments
Labels: auto-backport, v0.2.1

#185 - Make ES request settings configurable

Issue - State: open - Opened by navarone-feekery 25 days ago
Labels: enhancement, good first issue, priority:medium, effort:low

#184 - Improve summary counters during- and post-crawl

Issue - State: open - Opened by navarone-feekery 29 days ago
Labels: enhancement, priority:medium, effort:low

#183 - Revamp main README to simplify setup process

Issue - State: open - Opened by navarone-feekery 29 days ago
Labels: documentation, enhancement

#182 - Add CLI command to test current config against a specific URL

Issue - State: open - Opened by navarone-feekery 29 days ago
Labels: enhancement, priority:medium, effort:medium

#181 - Allow optional ingestion of sitemap data

Issue - State: open - Opened by navarone-feekery about 1 month ago
Labels: enhancement, effort:low, priority:low

#180 - Fixes #179 - Omits the pipeline key when pipeline_enabled: false

Pull Request - State: closed - Opened by ugosan about 1 month ago - 3 comments
Labels: auto-backport, v0.2.1

#179 - `pipeline_enabled: false` Sends Pipeline as Empty String

Issue - State: closed - Opened by ugosan about 1 month ago
Labels: bug

#178 - Update documentation references in comment blocks

Issue - State: open - Opened by navarone-feekery about 2 months ago
Labels: housekeeping, effort:low, priority:high

#177 - Failed to fetch the response after downloading 10485760 bytes (hit the response size limit of 10485760)

Issue - State: closed - Opened by ZHLONG-CN 2 months ago - 2 comments
Labels: bug, community-driven

#176 - Make elasticsearch the default value for output_sink

Pull Request - State: closed - Opened by devesh-2002 2 months ago - 14 comments
Labels: auto-backport, v0.2.1

#175 - Bumping rexml

Pull Request - State: closed - Opened by seanstory 2 months ago - 1 comment
Labels: auto-backport, v0.3.0, v0.2.1

#174 - Make `elasticsearch` the default value for `output_sink`

Issue - State: closed - Opened by navarone-feekery 2 months ago - 3 comments
Labels: enhancement, good first issue

#173 - Add timestamps to the system logger

Pull Request - State: closed - Opened by navarone-feekery 3 months ago
Labels: release_note, v0.3.0

#172 - Crawler will not create a new index

Issue - State: closed - Opened by jeffvestal 3 months ago - 3 comments
Labels: bug, good first issue, complexity:low, priority:high

#171 - Increases test coverage for url validator code

Pull Request - State: closed - Opened by bsantanna 3 months ago - 8 comments

#170 - Add a quickstart guide

Pull Request - State: closed - Opened by navarone-feekery 3 months ago

#169 - Publish a `:latest` docker image

Issue - State: open - Opened by navarone-feekery 3 months ago
Labels: enhancement, effort:medium, priority:high

#168 - Scheduled purge of "hasn't been crawled in N days"

Issue - State: open - Opened by seanstory 3 months ago
Labels: enhancement, priority:medium, effort:high

#167 - join_as option that only grabs the first match

Issue - State: open - Opened by seanstory 3 months ago
Labels: enhancement, effort:low, priority:low

#166 - Bump webrick, move to test group

Pull Request - State: closed - Opened by seanstory 3 months ago
Labels: v0.3.0

#165 - Bump nokogiri, tika, remove explicit bouncycastle

Pull Request - State: closed - Opened by seanstory 3 months ago - 1 comment
Labels: v0.3.0

#164 - Update elasticsearch.yml.example

Pull Request - State: closed - Opened by navarone-feekery 4 months ago - 1 comment
Labels: v0.3.0

#163 - Update README.md

Pull Request - State: closed - Opened by navarone-feekery 4 months ago
Labels: v0.3.0

#162 - Support custom HTTP headers

Pull Request - State: open - Opened by vidok 4 months ago
Labels: v0.3.0

#161 - Support custom HTTP headers

Issue - State: open - Opened by vidok 4 months ago - 4 comments
Labels: enhancement, priority:medium, effort:medium

#160 - Test and document extra legacy config options

Issue - State: open - Opened by navarone-feekery 4 months ago
Labels: enhancement, effort:low, priority:high

#159 - Elasticsearch config file takes precedence over crawler config

Issue - State: open - Opened by navarone-feekery 4 months ago
Labels: bug, effort:low, priority:high

#158 - [Docker] file permission error when _bulk request fails

Issue - State: open - Opened by seanstory 4 months ago - 2 comments
Labels: bug, priority:medium, effort:low

#157 - Add support for crawling dynamic content

Pull Request - State: open - Opened by navarone-feekery 4 months ago

#156 - Upgrade rexml to 3.3.9

Pull Request - State: closed - Opened by lhearachel 4 months ago

#155 - Replace nokogiri with jsoup

Pull Request - State: open - Opened by navarone-feekery 4 months ago

#154 - Crawler fails when providing ES config in flat format.

Issue - State: open - Opened by vidok 4 months ago
Labels: bug, priority:medium, effort:low

#153 - document volume mounting and docker-compose options

Issue - State: open - Opened by seanstory 4 months ago - 1 comment
Labels: documentation, enhancement, priority:medium, effort:low

#151 - Expand extraction rules to deny content based on selectors

Issue - State: open - Opened by navarone-feekery 5 months ago
Labels: enhancement, priority:medium, effort:high

#150 - Pin rexml version to 3.3.8

Pull Request - State: closed - Opened by navarone-feekery 5 months ago
Labels: v0.3.0

#149 - Add a configuration option to disable SSL for ES connections

Issue - State: open - Opened by navarone-feekery 5 months ago - 4 comments
Labels: enhancement, v0.3.0, priority:medium, effort:medium

#148 - [0.2] Use crawl for the first step vs schedule (#147)

Pull Request - State: closed - Opened by github-actions[bot] 5 months ago
Labels: backport

#147 - Use crawl for the first step vs schedule

Pull Request - State: closed - Opened by dadoonet 5 months ago - 2 comments
Labels: auto-backport, v0.3.0, v0.2.1

#146 - Rename the ingestion-team

Pull Request - State: closed - Opened by tutelaris 5 months ago

#144 - HTML Content Extraction

Issue - State: open - Opened by DasUberLeo 5 months ago - 4 comments
Labels: enhancement, priority:medium, effort:high

#143 - Revert "Update ent-search-eng team to be a search-eng team"

Pull Request - State: closed - Opened by seanstory 5 months ago - 1 comment

#142 - Update ent-search-eng team to be a search-eng team

Pull Request - State: closed - Opened by tutelaris 5 months ago - 2 comments

#141 - Bump product version to `0.2.1`

Pull Request - State: closed - Opened by navarone-feekery 6 months ago
Labels: v0.2.1

#140 - [0.2] Fix usage of in-built `File` lib (#139)

Pull Request - State: closed - Opened by navarone-feekery 6 months ago
Labels: backport, v0.2.1

#139 - Fix usage of in-built `File` lib

Pull Request - State: closed - Opened by navarone-feekery 6 months ago - 1 comment
Labels: v0.3.0, v0.2.1

#138 - Output sink type `file` is broken

Issue - State: closed - Opened by navarone-feekery 6 months ago - 1 comment
Labels: bug, v0.2.0

#137 - [0.2] Add RELEASING.md (#133)

Pull Request - State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport

#136 - [0.2] Fix crawl result logs (#134)

Pull Request - State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport

#135 - [0.2] Add docs for running official docker image (#132)

Pull Request - State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport

#134 - Fix crawl result logs

Pull Request - State: closed - Opened by navarone-feekery 6 months ago - 1 comment
Labels: v0.2.0, auto-backport, v0.3.0

#133 - Add RELEASING.md

Pull Request - State: closed - Opened by navarone-feekery 6 months ago - 1 comment
Labels: v0.2.0, auto-backport, v0.3.0

#132 - Add docs for running official docker image

Pull Request - State: closed - Opened by navarone-feekery 6 months ago - 1 comment
Labels: v0.2.0, auto-backport, v0.3.0

#131 - Crawl ID remains the same across scheduled crawls

Issue - State: open - Opened by navarone-feekery 6 months ago - 1 comment
Labels: bug, v0.2.0, effort:low, priority:low

#130 - Extraction rule fields applied to unrelated docs

Issue - State: closed - Opened by navarone-feekery 6 months ago - 1 comment
Labels: bug, v0.2.0

#129 - Content that is larger than `elasticsearch.bulk_api.max_size_bytes` is not ingested

Issue - State: closed - Opened by navarone-feekery 6 months ago - 4 comments
Labels: bug, v0.2.0

#128 - Crawl result erroneously logs a failure if there were no docs to purge

Issue - State: closed - Opened by navarone-feekery 6 months ago
Labels: bug, v0.2.0

#127 - [0.2] Add feature comparison table (#117)

Pull Request - State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport

#126 - [0.2] Add CRAWLER_DIRECTIVES.md and purge crawls documentation (#115)

Pull Request - State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport