Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / elastic/crawler issues and pull requests
#214 - Add LATEST tag option to build jobs
Pull Request -
State: open - Opened by navarone-feekery 10 days ago
#213 - Update default docker-compose version
Pull Request -
State: closed - Opened by navarone-feekery 15 days ago
- 2 comments
Labels: auto-backport, v0.2.2
#212 - [0.2] Fix CI pipeline (#211)
Pull Request -
State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport
#211 - Fix CI pipeline
Pull Request -
State: closed - Opened by navarone-feekery 15 days ago
- 1 comment
Labels: auto-backport, v0.2.1
#210 - Update RELEASING.md
Pull Request -
State: closed - Opened by navarone-feekery 15 days ago
#209 - [0.2] Add ingest pipeline for 9.x (#203)
Pull Request -
State: closed - Opened by navarone-feekery 15 days ago
Labels: backport
#208 - [0.2] Allow for full HTML extraction (#204)
Pull Request -
State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport
#207 - [0.2] Adding ES verification step + explicit best-effort index creation during ES Sink initialization (#192)
Pull Request -
State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport
#206 - [0.2] Fixes #179 - Omits the pipeline key when pipeline_enabled: false (#180)
Pull Request -
State: closed - Opened by github-actions[bot] 15 days ago
#205 - [0.2] Make elasticsearch the default value for output_sink (#176)
Pull Request -
State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport
#204 - Allow for full HTML extraction
Pull Request -
State: closed - Opened by navarone-feekery 15 days ago
- 1 comment
Labels: auto-backport, v0.2.1
#203 - Add ingest pipeline for 9.x
Pull Request -
State: closed - Opened by navarone-feekery 15 days ago
- 2 comments
Labels: auto-backport, v0.2.1
#202 - [0.2] Update docker files to remove /root/.m2 directory after installation to not distribute build dependencies (#200)
Pull Request -
State: closed - Opened by github-actions[bot] 15 days ago
Labels: backport
#201 - Crawler attempting to call ent-search-generic-ingestion pipeline which doesn't exist
Issue -
State: closed - Opened by jeffvestal 16 days ago
Labels: bug, complexity:low, priority:high
#200 - Update docker files to remove /root/.m2 directory after installation to not distribute build dependencies
Pull Request -
State: closed - Opened by artem-shelkovnikov 16 days ago
- 1 comment
Labels: auto-backport, v0.2.1
#199 - [0.2] Fix scheduling documentation (#196)
Pull Request -
State: closed - Opened by github-actions[bot] 17 days ago
Labels: backport
#198 - Allow raw HTML to be ingested
Issue -
State: closed - Opened by navarone-feekery 17 days ago
Labels: enhancement, complexity:low, priority:high
#197 - Add HTML to Markdown functionality
Issue -
State: open - Opened by navarone-feekery 17 days ago
- 3 comments
Labels: enhancement, priority:medium, complexity:medium
#196 - Fix scheduling documentation
Pull Request -
State: closed - Opened by navarone-feekery 17 days ago
- 1 comment
Labels: v0.2.0, auto-backport, v0.3.0, v0.2.1
#195 - schedule command causes "TypeError: no implicit conversion of Symbol into Integer"
Issue -
State: closed - Opened by djsiroky 18 days ago
- 2 comments
Labels: bug, community-driven, priority:medium, complexity:low
#194 - [0.2] Fixing EXTRACTION_RULES link (#193)
Pull Request -
State: closed - Opened by github-actions[bot] 21 days ago
Labels: backport
#193 - Fixing EXTRACTION_RULES link
Pull Request -
State: closed - Opened by JoseLuisGJ 21 days ago
- 1 comment
Labels: documentation, v0.2.0, auto-backport
#192 - Adding ES verification step + explicit best-effort index creation during ES Sink initialization
Pull Request -
State: closed - Opened by mattnowzari 22 days ago
- 1 comment
Labels: enhancement, auto-backport, v0.2.1
#191 - Update README.md
Pull Request -
State: closed - Opened by navarone-feekery 22 days ago
Labels: v0.3.0
#190 - Bump version to 0.2.2
Pull Request -
State: closed - Opened by navarone-feekery 22 days ago
- 2 comments
Labels: v0.2.2
#189 - [0.2] Adding check to ES sink to check if index is present before crawling (#186)
Pull Request -
State: closed - Opened by github-actions[bot] 22 days ago
Labels: backport
#188 - Downloading files to a directory
Issue -
State: open - Opened by asevillano 23 days ago
- 13 comments
Labels: bug
#187 - [SNYK] Bump nokogiri lib
Pull Request -
State: closed - Opened by jedrazb 23 days ago
Labels: v0.2.2
#186 - Adding check to ES sink to check if index is present before crawling
Pull Request -
State: closed - Opened by mattnowzari 25 days ago
- 2 comments
Labels: auto-backport, v0.2.1
#185 - Make ES request settings configurable
Issue -
State: open - Opened by navarone-feekery 25 days ago
Labels: enhancement, good first issue, priority:medium, effort:low
#184 - Improve summary counters during- and post-crawl
Issue -
State: open - Opened by navarone-feekery 29 days ago
Labels: enhancement, priority:medium, effort:low
#183 - Revamp main README to simplify setup process
Issue -
State: open - Opened by navarone-feekery 29 days ago
Labels: documentation, enhancement
#182 - Add CLI command to test current config against a specific URL
Issue -
State: open - Opened by navarone-feekery 29 days ago
Labels: enhancement, priority:medium, effort:medium
#181 - Allow optional ingestion of sitemap data
Issue -
State: open - Opened by navarone-feekery about 1 month ago
Labels: enhancement, effort:low, priority:low
#180 - Fixes #179 - Omits the pipeline key when pipeline_enabled: false
Pull Request -
State: closed - Opened by ugosan about 1 month ago
- 3 comments
Labels: auto-backport, v0.2.1
#179 - `pipeline_enabled: false` Sends Pipeline as Empty String
Issue -
State: closed - Opened by ugosan about 1 month ago
Labels: bug
#178 - Update documentation references in comment blocks
Issue -
State: open - Opened by navarone-feekery about 2 months ago
Labels: housekeeping, effort:low, priority:high
#177 - Failed to fetch the response after downloading 10485760 bytes (hit the response size limit of 10485760)
Issue -
State: closed - Opened by ZHLONG-CN 2 months ago
- 2 comments
Labels: bug, community-driven
#176 - Make elasticsearch the default value for output_sink
Pull Request -
State: closed - Opened by devesh-2002 2 months ago
- 14 comments
Labels: auto-backport, v0.2.1
#175 - Bumping rexml
Pull Request -
State: closed - Opened by seanstory 2 months ago
- 1 comment
Labels: auto-backport, v0.3.0, v0.2.1
#174 - Make `elasticsearch` the default value for `output_sink`
Issue -
State: closed - Opened by navarone-feekery 2 months ago
- 3 comments
Labels: enhancement, good first issue
#173 - Add timestamps to the system logger
Pull Request -
State: closed - Opened by navarone-feekery 3 months ago
Labels: release_note, v0.3.0
#172 - Crawler will not create a new index
Issue -
State: closed - Opened by jeffvestal 3 months ago
- 3 comments
Labels: bug, good first issue, complexity:low, priority:high
#171 - Increases test coverage for url validator code
Pull Request -
State: closed - Opened by bsantanna 3 months ago
- 8 comments
#170 - Add a quickstart guide
Pull Request -
State: closed - Opened by navarone-feekery 3 months ago
#169 - Publish a `:latest` docker image
Issue -
State: open - Opened by navarone-feekery 3 months ago
Labels: enhancement, effort:medium, priority:high
#168 - Scheduled purge of "hasn't been crawled in N days"
Issue -
State: open - Opened by seanstory 3 months ago
Labels: enhancement, priority:medium, effort:high
#167 - join_as option that only grabs the first match
Issue -
State: open - Opened by seanstory 3 months ago
Labels: enhancement, effort:low, priority:low
#166 - Bump webrick, move to test group
Pull Request -
State: closed - Opened by seanstory 3 months ago
Labels: v0.3.0
#165 - Bump nokogiri, tika, remove explicit bouncycastle
Pull Request -
State: closed - Opened by seanstory 3 months ago
- 1 comment
Labels: v0.3.0
#164 - Update elasticsearch.yml.example
Pull Request -
State: closed - Opened by navarone-feekery 4 months ago
- 1 comment
Labels: v0.3.0
#163 - Update README.md
Pull Request -
State: closed - Opened by navarone-feekery 4 months ago
Labels: v0.3.0
#162 - Support custom HTTP headers
Pull Request -
State: open - Opened by vidok 4 months ago
Labels: v0.3.0
#161 - Support custom HTTP headers
Issue -
State: open - Opened by vidok 4 months ago
- 4 comments
Labels: enhancement, priority:medium, effort:medium
#160 - Test and document extra legacy config options
Issue -
State: open - Opened by navarone-feekery 4 months ago
Labels: enhancement, effort:low, priority:high
#159 - Elasticsearch config file takes precedence over crawler config
Issue -
State: open - Opened by navarone-feekery 4 months ago
Labels: bug, effort:low, priority:high
#158 - [Docker] file permission error when _bulk request fails
Issue -
State: open - Opened by seanstory 4 months ago
- 2 comments
Labels: bug, priority:medium, effort:low
#157 - Add support for crawling dynamic content
Pull Request -
State: open - Opened by navarone-feekery 4 months ago
#156 - Upgrade rexml to 3.3.9
Pull Request -
State: closed - Opened by lhearachel 4 months ago
#155 - Replace nokogiri with jsoup
Pull Request -
State: open - Opened by navarone-feekery 4 months ago
#154 - Crawler fails when providing ES config in flat format.
Issue -
State: open - Opened by vidok 4 months ago
Labels: bug, priority:medium, effort:low
#153 - document volume mounting and docker-compose options
Issue -
State: open - Opened by seanstory 4 months ago
- 1 comment
Labels: documentation, enhancement, priority:medium, effort:low
#151 - Expand extraction rules to deny content based on selectors
Issue -
State: open - Opened by navarone-feekery 5 months ago
Labels: enhancement, priority:medium, effort:high
#150 - Pin rexml version to 3.3.8
Pull Request -
State: closed - Opened by navarone-feekery 5 months ago
Labels: v0.3.0
#149 - Add a configuration option to disable SSL for ES connections
Issue -
State: open - Opened by navarone-feekery 5 months ago
- 4 comments
Labels: enhancement, v0.3.0, priority:medium, effort:medium
#148 - [0.2] Use crawl for the first step vs schedule (#147)
Pull Request -
State: closed - Opened by github-actions[bot] 5 months ago
Labels: backport
#147 - Use crawl for the first step vs schedule
Pull Request -
State: closed - Opened by dadoonet 5 months ago
- 2 comments
Labels: auto-backport, v0.3.0, v0.2.1
#146 - Rename the ingestion-team
Pull Request -
State: closed - Opened by tutelaris 5 months ago
#145 - Revert "Revert "Update ent-search-eng team to be a search-eng team""
Pull Request -
State: closed - Opened by seanstory 5 months ago
#144 - HTML Content Extraction
Issue -
State: open - Opened by DasUberLeo 5 months ago
- 4 comments
Labels: enhancement, priority:medium, effort:high
#143 - Revert "Update ent-search-eng team to be a search-eng team"
Pull Request -
State: closed - Opened by seanstory 5 months ago
- 1 comment
#142 - Update ent-search-eng team to be a search-eng team
Pull Request -
State: closed - Opened by tutelaris 5 months ago
- 2 comments
#141 - Bump product version to `0.2.1`
Pull Request -
State: closed - Opened by navarone-feekery 6 months ago
Labels: v0.2.1
#140 - [0.2] Fix usage of in-built `File` lib (#139)
Pull Request -
State: closed - Opened by navarone-feekery 6 months ago
Labels: backport, v0.2.1
#139 - Fix usage of in-built `File` lib
Pull Request -
State: closed - Opened by navarone-feekery 6 months ago
- 1 comment
Labels: v0.3.0, v0.2.1
#138 - Output sink type `file` is broken
Issue -
State: closed - Opened by navarone-feekery 6 months ago
- 1 comment
Labels: bug, v0.2.0
#137 - [0.2] Add RELEASING.md (#133)
Pull Request -
State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport
#136 - [0.2] Fix crawl result logs (#134)
Pull Request -
State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport
#135 - [0.2] Add docs for running official docker image (#132)
Pull Request -
State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport
#134 - Fix crawl result logs
Pull Request -
State: closed - Opened by navarone-feekery 6 months ago
- 1 comment
Labels: v0.2.0, auto-backport, v0.3.0
#133 - Add RELEASING.md
Pull Request -
State: closed - Opened by navarone-feekery 6 months ago
- 1 comment
Labels: v0.2.0, auto-backport, v0.3.0
#132 - Add docs for running official docker image
Pull Request -
State: closed - Opened by navarone-feekery 6 months ago
- 1 comment
Labels: v0.2.0, auto-backport, v0.3.0
#131 - Crawl ID remains the same across scheduled crawls
Issue -
State: open - Opened by navarone-feekery 6 months ago
- 1 comment
Labels: bug, v0.2.0, effort:low, priority:low
#130 - Extraction rule fields applied to unrelated docs
Issue -
State: closed - Opened by navarone-feekery 6 months ago
- 1 comment
Labels: bug, v0.2.0
#129 - Content that is larger than `elasticsearch.bulk_api.max_size_bytes` is not ingested
Issue -
State: closed - Opened by navarone-feekery 6 months ago
- 4 comments
Labels: bug, v0.2.0
#128 - Crawl result erroneously logs a failure if there were no docs to purge
Issue -
State: closed - Opened by navarone-feekery 6 months ago
Labels: bug, v0.2.0
#127 - [0.2] Add feature comparison table (#117)
Pull Request -
State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport
#126 - [0.2] Add CRAWLER_DIRECTIVES.md and purge crawls documentation (#115)
Pull Request -
State: closed - Opened by github-actions[bot] 6 months ago
Labels: backport