Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / elastic/crawler issues and pull requests

#144 - HTML Content Extraction

Issue - State: open - Opened by DasUberLeo 5 days ago - 4 comments
Labels: enhancement

#143 - Revert "Update ent-search-eng team to be a search-eng team"

Pull Request - State: closed - Opened by seanstory 5 days ago - 1 comment

#142 - Update ent-search-eng team to be a search-eng team

Pull Request - State: closed - Opened by tutelaris 5 days ago - 2 comments

#141 - Bump product version to `0.2.1`

Pull Request - State: closed - Opened by navarone-feekery 16 days ago
Labels: v0.2.1

#140 - [0.2] Fix usage of in-built `File` lib (#139)

Pull Request - State: closed - Opened by navarone-feekery 16 days ago
Labels: backport, v0.2.1

#139 - Fix usage of in-built `File` lib

Pull Request - State: closed - Opened by navarone-feekery 16 days ago - 1 comment
Labels: v0.3.0, v0.2.1

#138 - Output sink type `file` is broken

Issue - State: closed - Opened by navarone-feekery 16 days ago - 1 comment
Labels: bug, v0.2.0

#137 - [0.2] Add RELEASING.md (#133)

Pull Request - State: closed - Opened by github-actions[bot] 18 days ago
Labels: backport

#136 - [0.2] Fix crawl result logs (#134)

Pull Request - State: closed - Opened by github-actions[bot] 18 days ago
Labels: backport

#135 - [0.2] Add docs for running official docker image (#132)

Pull Request - State: closed - Opened by github-actions[bot] 18 days ago
Labels: backport

#134 - Fix crawl result logs

Pull Request - State: closed - Opened by navarone-feekery 18 days ago - 1 comment
Labels: v0.2.0, auto-backport, v0.3.0

#133 - Add RELEASING.md

Pull Request - State: closed - Opened by navarone-feekery 18 days ago - 1 comment
Labels: v0.2.0, auto-backport, v0.3.0

#132 - Add docs for running official docker image

Pull Request - State: closed - Opened by navarone-feekery 18 days ago - 1 comment
Labels: v0.2.0, auto-backport, v0.3.0

#131 - Crawl ID remains the same across scheduled crawls

Issue - State: open - Opened by navarone-feekery 18 days ago - 1 comment
Labels: bug, v0.2.0

#130 - Extraction rule fields applied to unrelated docs

Issue - State: closed - Opened by navarone-feekery 18 days ago - 1 comment
Labels: bug, v0.2.0

#129 - Content that is larger than `elasticsearch.bulk_api.max_size_bytes` is not ingested

Issue - State: closed - Opened by navarone-feekery 18 days ago - 4 comments
Labels: bug, v0.2.0

#128 - Crawl result erroneously logs a failure if there were no docs to purge

Issue - State: closed - Opened by navarone-feekery 18 days ago
Labels: bug, v0.2.0

#127 - [0.2] Add feature comparison table (#117)

Pull Request - State: closed - Opened by github-actions[bot] 19 days ago
Labels: backport

#126 - [0.2] Add CRAWLER_DIRECTIVES.md and purge crawls documentation (#115)

Pull Request - State: closed - Opened by github-actions[bot] 19 days ago
Labels: backport

#125 - [0.2] Add CHANGELOG.md and upgrade to beta (#121)

Pull Request - State: closed - Opened by navarone-feekery 21 days ago
Labels: backport

#124 - Flaky spec for bulk queue thread-locking

Issue - State: open - Opened by navarone-feekery 21 days ago
Labels: bug, v0.2.0, v0.3.0, flaky-spec

#123 - Update `.backportrc.json`

Pull Request - State: closed - Opened by navarone-feekery 23 days ago
Labels: v0.3.0

#122 - Bump version to 0.3.0

Pull Request - State: closed - Opened by navarone-feekery 23 days ago
Labels: v0.3.0

#121 - Add CHANGELOG.md and upgrade to beta

Pull Request - State: closed - Opened by navarone-feekery 23 days ago - 2 comments
Labels: v0.2.0, auto-backport

#120 - [0.1] Add docker publishing scripts and pipeline (#103)

Pull Request - State: closed - Opened by github-actions[bot] 23 days ago
Labels: backport

#119 - [0.1] Misc fixes to the Wolfi-based Dockerfile (#114)

Pull Request - State: closed - Opened by github-actions[bot] 24 days ago
Labels: backport

#118 - Add schedule command to CLI docs

Pull Request - State: closed - Opened by navarone-feekery 24 days ago
Labels: v0.2.0

#117 - Add feature comparison table

Pull Request - State: closed - Opened by navarone-feekery 24 days ago - 5 comments
Labels: v0.2.0, auto-backport

#116 - Clean up config docs

Pull Request - State: closed - Opened by navarone-feekery 24 days ago
Labels: v0.2.0

#115 - Add CRAWLER_DIRECTIVES.md and purge crawls documentation

Pull Request - State: closed - Opened by navarone-feekery 25 days ago - 2 comments
Labels: v0.2.0, auto-backport

#114 - Misc fixes to the Wolfi-based Dockerfile

Pull Request - State: closed - Opened by acrewdson 25 days ago - 1 comment
Labels: v0.1.1, v0.2.0, auto-backport

#113 - Add documentation for binary content extraction and ingest pipelines

Pull Request - State: closed - Opened by navarone-feekery 25 days ago - 1 comment
Labels: v0.2.0, release_note

#112 - Add scheduling CLI command

Pull Request - State: closed - Opened by navarone-feekery 25 days ago
Labels: v0.2.0, release_note

#110 - Update docker.elastic.co/wolfi/jdk Docker tag to openjdk-21.35-r1-dev

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] 26 days ago - 1 comment
Labels: v0.1.1, v0.2.0, auto-backport

#109 - [0.1] Add Dockerfile.wolfi (#106)

Pull Request - State: closed - Opened by github-actions[bot] 26 days ago
Labels: backport

#108 - Add extraction rules examples

Pull Request - State: closed - Opened by navarone-feekery 26 days ago
Labels: v0.2.0

#107 - [0.1] Update catalog-info.yaml for docker publishing (#104)

Pull Request - State: closed - Opened by navarone-feekery 26 days ago - 1 comment
Labels: backport

#106 - Add Dockerfile.wolfi

Pull Request - State: closed - Opened by navarone-feekery about 1 month ago - 6 comments
Labels: v0.1.1, v0.2.0, auto-backport

#105 - Add option to not crawl URLs already crawled in an index

Issue - State: open - Opened by jtele2 about 1 month ago - 2 comments
Labels: enhancement, community-driven

#104 - Update catalog-info.yaml for docker publishing

Pull Request - State: closed - Opened by navarone-feekery about 1 month ago - 4 comments
Labels: v0.1.1, v0.2.0, auto-backport

#103 - Add docker publishing scripts and pipeline

Pull Request - State: closed - Opened by navarone-feekery about 1 month ago - 5 comments
Labels: v0.1.1, v0.2.0, auto-backport

#102 - Make purge crawl pagination lazy

Issue - State: open - Opened by navarone-feekery about 1 month ago
Labels: enhancement

#101 - test, ignore

Pull Request - State: closed - Opened by acrewdson about 1 month ago - 1 comment

#100 - did I get the slug name wrong?

Pull Request - State: closed - Opened by seanstory about 1 month ago

#99 - Create pull-requests.json

Pull Request - State: closed - Opened by seanstory about 1 month ago

#98 - align with connector pipeline settings

Pull Request - State: closed - Opened by seanstory about 1 month ago

#97 - Update renovate to only consider chainguard

Pull Request - State: closed - Opened by seanstory about 1 month ago
Labels: v0.2.0

#96 - Update dependency bson to v4.15.0

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#95 - Update dependency json to v2.7.2

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#94 - Update dependency bigdecimal to v3.1.8

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#93 - added timestamps to the system logger

Pull Request - State: open - Opened by yashathwani about 1 month ago - 13 comments

#92 - Update dependency elasticsearch to '~> 8.15.0'

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#91 - Pin gems and set some platforms as jruby

Pull Request - State: closed - Opened by navarone-feekery about 1 month ago
Labels: v0.2.0

#90 - update webmock

Pull Request - State: closed - Opened by seanstory about 1 month ago - 4 comments
Labels: v0.2.0

#89 - main was missing some "make install" diffs

Pull Request - State: closed - Opened by seanstory about 1 month ago
Labels: v0.2.0

#88 - Update dependency concurrent-ruby to '~> 1.3.0'

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#87 - Update dependency bson to '~> 4.15.0'

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#86 - Update jruby Docker tag to v9.4.8.0

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago

#85 - Update dependency webmock to v3.23.1

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 2 comments

#84 - Update dependency thread_safe to v0.3.6

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#83 - Update dependency rack to '~> 2.2.9'

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#82 - Update dependency pry to v0.14.2

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 2 comments

#81 - Update dependency json-schema to v4.3.1

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#80 - Update dependency addressable to v2.8.7

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#79 - Update dependency activesupport to v6.1.7.8

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#78 - Dependency Dashboard

Issue - State: open - Opened by elastic-renovate-prod[bot] about 1 month ago

#77 - Update juliangruber/read-file-action digest to 386973d

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago - 1 comment

#76 - Pin dependencies

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago

#75 - Configure Renovate

Pull Request - State: closed - Opened by elastic-renovate-prod[bot] about 1 month ago

#74 - Add binary content extraction

Pull Request - State: closed - Opened by navarone-feekery about 2 months ago - 1 comment
Labels: v0.2.0, release_note

#73 - [0.1] Bump rexml to 3.3.4 (#72)

Pull Request - State: closed - Opened by github-actions[bot] about 2 months ago
Labels: backport

#72 - Bump rexml to 3.3.4

Pull Request - State: closed - Opened by navarone-feekery about 2 months ago - 1 comment
Labels: v0.1.1, v0.2.0, auto-backport

#71 - Enable quick storage of PDF file size and name using the web crawler

Issue - State: closed - Opened by serenachou about 2 months ago - 1 comment
Labels: enhancement, v0.2.0

#70 - Rename fatal error to internal error

Pull Request - State: closed - Opened by navarone-feekery about 2 months ago - 3 comments
Labels: v0.2.0

#69 - Add timestamps to system logger

Issue - State: open - Opened by navarone-feekery about 2 months ago - 4 comments
Labels: enhancement, good first issue

#68 - Update crawler.yml.example

Pull Request - State: closed - Opened by navarone-feekery about 2 months ago
Labels: v0.2.0

#67 - Update docs

Pull Request - State: closed - Opened by navarone-feekery about 2 months ago
Labels: v0.2.0

#66 - Add tool to re-attempt failed bulk index payloads

Issue - State: open - Opened by navarone-feekery about 2 months ago
Labels: enhancement

#65 - Add purge crawl feature

Pull Request - State: closed - Opened by navarone-feekery about 2 months ago - 1 comment
Labels: v0.2.0, release_note

#64 - Refactor ES classes

Pull Request - State: closed - Opened by navarone-feekery about 2 months ago
Labels: v0.2.0

#63 - Fix redirect and error crawl result handling

Pull Request - State: closed - Opened by navarone-feekery 2 months ago - 1 comment
Labels: v0.2.0

#62 - Add crawl rules

Pull Request - State: closed - Opened by navarone-feekery 2 months ago
Labels: v0.2.0, release_note

#61 - Add URL content extraction

Pull Request - State: closed - Opened by navarone-feekery 2 months ago - 1 comment
Labels: v0.2.0

#60 - Crawler trying to turn redirects into documents

Issue - State: closed - Opened by jeffvestal 2 months ago
Labels: bug

#59 - Change field body_content to body

Pull Request - State: closed - Opened by navarone-feekery 2 months ago - 9 comments
Labels: v0.2.0, release_note

#58 - Add content extraction by rules

Pull Request - State: closed - Opened by navarone-feekery 2 months ago - 1 comment
Labels: v0.2.0, release_note

#57 - Add extraction rules config classes

Pull Request - State: closed - Opened by navarone-feekery 3 months ago - 3 comments
Labels: v0.2.0

#56 - ML inference to ingested document is not working

Issue - State: open - Opened by mikecalizo 3 months ago - 1 comment
Labels: bug

#55 - Update domains format in crawler config.yml

Pull Request - State: closed - Opened by navarone-feekery 3 months ago
Labels: v0.2.0, release_note

#54 - Add configuration parameters to limit crawled paths (allow / disallow)

Issue - State: closed - Opened by simonhearne 3 months ago - 2 comments
Labels: enhancement, v0.2.0

#53 - Confirm ES connection before starting crawls

Issue - State: open - Opened by navarone-feekery 3 months ago
Labels: enhancement, v0.2.0, community-driven

#52 - Add "depth" field

Issue - State: open - Opened by YazdanJahedi 3 months ago - 4 comments
Labels: enhancement, community-driven

#51 - [0.1] Improve setup docs and add CLI docs (#44)

Pull Request - State: closed - Opened by github-actions[bot] 3 months ago
Labels: backport

#50 - [0.1] Lock bulk queue while processing indexing request (#45)

Pull Request - State: closed - Opened by github-actions[bot] 3 months ago
Labels: backport

#49 - Add a general delay between crawl requests

Issue - State: open - Opened by navarone-feekery 3 months ago
Labels: enhancement

#48 - change "body_content" field name to "body"

Issue - State: closed - Opened by pezzking 3 months ago - 5 comments
Labels: enhancement, good first issue, v0.2.0, community-driven

#47 - Separating of Heading HTML tags

Issue - State: open - Opened by YazdanJahedi 3 months ago
Labels: enhancement, community-driven

#46 - Abide by crawl delays found in robots.txt

Issue - State: open - Opened by navarone-feekery 3 months ago
Labels: enhancement, v0.2.0
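
Each entry above follows the same three-line shape: a `#number - title` line, a metadata line (type, state, author, age, optional comment count), and an optional `Labels:` line. As a rough sketch of how that text could be turned into records, the Python below parses it with two regular expressions. The function name `parse_listing` and the dictionary keys are hypothetical, chosen only for illustration; this works on the plain-text listing as shown here and is not a client for the Ecosyste.ms API.

```python
import re

# Title lines look like: "#144 - HTML Content Extraction"
TITLE_RE = re.compile(r"^#(?P<number>\d+) - (?P<title>.+)$")

# Metadata lines look like:
#   "Issue - State: open - Opened by DasUberLeo 5 days ago - 4 comments"
# The trailing comment count is optional.
META_RE = re.compile(
    r"^(?P<kind>Issue|Pull Request) - State: (?P<state>\w+)"
    r" - Opened by (?P<author>\S+) (?P<age>.+? ago)"
    r"(?: - (?P<comments>\d+) comments?)?$"
)

LABELS_PREFIX = "Labels: "


def parse_listing(text: str) -> list[dict]:
    """Parse the plain-text listing into a list of entry dicts (hypothetical helper)."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        title = TITLE_RE.match(line)
        if title:
            # A new "#N - title" line starts a new entry.
            current = {
                "number": int(title["number"]),
                "title": title["title"],
                "labels": [],
                "comments": 0,
            }
            entries.append(current)
            continue
        if current is None:
            # Skip page headers that appear before the first entry.
            continue
        meta = META_RE.match(line)
        if meta:
            current.update(
                kind=meta["kind"],
                state=meta["state"],
                author=meta["author"],
                age=meta["age"],
                comments=int(meta["comments"] or 0),
            )
        elif line.startswith(LABELS_PREFIX):
            labels = line[len(LABELS_PREFIX):].split(",")
            current["labels"] = [label.strip() for label in labels]
    return entries
```

With the parsed records, a filter such as `[e for e in parse_listing(text) if e.get("kind") == "Issue" and e.get("state") == "open"]` would pick out the open issues from the listing.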