Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / esmero/strawberry_runners issues and pull requests

#104 - Tesseract might output multiple HOCR bodies when a TIFF is multilayered

Issue - State: open - Opened by DiegoPino 19 days ago
Labels: bug, Solr Indexing, Post processor Plugins, ocrhighlight, External Bug

#103 - SBR temporary files are not always being properly composted

Issue - State: open - Opened by DiegoPino 20 days ago
Labels: bug, queue, Post processor Plugins

#102 - ML. Send smaller IIIF Images to the ML learning endpoints + don't index empties/no detections

Issue - State: open - Opened by DiegoPino 22 days ago
Labels: enhancement, Solr Indexing, Post processor Plugins, ML

#101 - ISSUE-97: Time out management and better entity listing handling

Pull Request - State: open - Opened by DiegoPino about 1 month ago

#100 - ISSUE-99: Update HOCR to MINIOCR to treat ocr_textfloat as line

Pull Request - State: closed - Opened by DiegoPino 4 months ago
Labels: enhancement, Post processor Plugins

#99 - HOCR, process also as lines "ocr_textfloat"

Issue - State: closed - Opened by DiegoPino 4 months ago - 1 comment
Labels: enhancement, Post processor Plugins

#98 - 0.8.0:sync with release

Pull Request - State: closed - Opened by DiegoPino 5 months ago - 2 comments

#97 - Timeout (alternate way via timeout command) + handle better failed queue entries so Cron never takes on them

Issue - State: open - Opened by DiegoPino 5 months ago
Labels: enhancement, Solr Indexing, Logging, Post processor Plugins

#96 - ML related improvements

Issue - State: open - Opened by DiegoPino 6 months ago
Labels: enhancement, Post processor Plugins, ML

#95 - ISSUE-94: Objective Input (JSON) coming from chained processors (e.g OCR -> Sentence Transformer)

Pull Request - State: closed - Opened by DiegoPino 8 months ago
Labels: bug, Post processor Plugins, Release duties

#94 - JSON data passed between processors differs in structure if "forced" versus recycled from a parent that had it already

Issue - State: closed - Opened by DiegoPino 8 months ago - 1 comment
Labels: bug, Post processor Plugins, Release duties

#93 - Sync main to 0.7.0

Pull Request - State: closed - Opened by DiegoPino 9 months ago

#92 - ISSUE-91:ML processors

Pull Request - State: closed - Opened by DiegoPino 9 months ago - 5 comments

#91 - Add Image ML Processor

Issue - State: closed - Opened by DiegoPino 10 months ago - 1 comment
Labels: Post processor Plugins, Future

#90 - ISSUE-87: a bug

Pull Request - State: closed - Opened by DiegoPino 10 months ago

#89 - ISSUE-87: VTT Processor to MiniOCR with time/space transformation

Pull Request - State: closed - Opened by DiegoPino 10 months ago
Labels: enhancement, Post processor Plugins, ocrhighlight, Working Group's 💜

#88 - Provide a Preview of the output for Processors

Issue - State: open - Opened by DiegoPino 10 months ago
Labels: enhancement, Post processor Plugins, Working Group's 💜

#87 - Process VTT as fake X/Y (time mutation) MiniOCR

Issue - State: closed - Opened by DiegoPino 11 months ago - 5 comments
Labels: enhancement, Solr Indexing, Post processor Plugins, ocrhighlight, Future

#86 - Allow Post processors to give up on attempts + check on GS processing time on OCR

Issue - State: open - Opened by DiegoPino 11 months ago - 2 comments
Labels: enhancement, help wanted, queue, Logging, Post processor Plugins

#85 - ISSUE-82: Drupal 10 basics + VBO 4.2 changes

Pull Request - State: closed - Opened by DiegoPino over 1 year ago
Labels: Release duties, Working Group's 💜, Drupal 10, VBO

#84 - Macro issue, update to Drupal 10, PHP 8.1

Issue - State: closed - Opened by DiegoPino over 1 year ago - 1 comment
Labels: Release duties, Working Group's 💜, Drupal 10, VBO

#83 - Ingested in-place files on S3 are queued for composting

Issue - State: closed - Opened by patdunlavey over 1 year ago - 5 comments

#82 - Processor Parent (config) might get corrupted during drag and drop

Issue - State: open - Opened by DiegoPino over 1 year ago
Labels: bug, Post processor Plugins

#81 - Pure Text extraction from HOCR is HTML entity encoded

Issue - State: open - Opened by DiegoPino over 1 year ago - 2 comments
Labels: enhancement, help wanted, Solr Indexing, Post processor Plugins

#80 - ISSUE-79: Use flv:exif as fallback for Image to HOCR matching

Pull Request - State: closed - Opened by DiegoPino almost 2 years ago - 1 comment
Labels: enhancement, Post processor Plugins

#79 - Use flv:exif as fallback when comparing sizes for imported HOCR

Issue - State: closed - Opened by DiegoPino almost 2 years ago - 1 comment
Labels: enhancement, Post processor Plugins

#78 - ISSUE-77: Fix binary detection for Text extraction

Pull Request - State: closed - Opened by DiegoPino about 2 years ago
Labels: bug, enhancement, Post processor Plugins

#77 - Fix Binary detection

Issue - State: closed - Opened by DiegoPino about 2 years ago - 1 comment
Labels: bug, enhancement, Post processor Plugins

#75 - OCR timeout can cause infinite loop?

Issue - State: open - Opened by patdunlavey about 2 years ago - 8 comments
Labels: bug, queue, Post processor Plugins

#74 - PDFALTO non fatal errors breaking OCR

Issue - State: closed - Opened by DiegoPino about 2 years ago - 2 comments
Labels: bug, queue, Post processor Plugins

#73 - ISSUE-72: Entity ID typo for postprocess action and stray DS_Store file

Pull Request - State: closed - Opened by aksm over 2 years ago

#72 - Entity ID typo for postprocess action and stray DS_Store file

Issue - State: closed - Opened by aksm over 2 years ago - 1 comment

#71 - ISSUE-70: Improves process logic and cleans up atomic/processor generated file/garbage

Pull Request - State: closed - Opened by DiegoPino over 2 years ago
Labels: enhancement, queue, Solr Indexing, Logging, Post processor Plugins, ocrhighlight

#70 - Make sure stalled Tesseract processes are killed

Issue - State: closed - Opened by aksm over 2 years ago - 1 comment

#69 - Check if temp images generated during OCR are being cleaned up

Issue - State: closed - Opened by aksm over 2 years ago - 1 comment
Labels: bug

#68 - Build EZID integration

Issue - State: open - Opened by DiegoPino over 2 years ago - 2 comments
Labels: enhancement, question, queue, Post processor Plugins, Digital Preservation

#67 - ISSUE-66: Allow ap:nopost to skip processor IDs (machine name)

Pull Request - State: closed - Opened by DiegoPino over 2 years ago - 2 comments
Labels: enhancement, Post processor Plugins

#66 - Add a NO POST PROCESSING json key (exception) to skip on a one-by-one level a certain post processor(s)

Issue - State: closed - Opened by DiegoPino over 2 years ago - 1 comment
Labels: enhancement, Post processor Plugins

#65 - Sequence incorrect for OCR on multiple-image object

Issue - State: closed - Opened by patdunlavey over 2 years ago - 4 comments

#64 - ISSUE-57: VBO action + Pure text extraction from "anything" + File based HOCR with Image matching

Pull Request - State: closed - Opened by DiegoPino over 2 years ago - 5 comments
Labels: enhancement, queue, Solr Indexing, Post processor Plugins

#63 - ISSUE-62: Move ADO Tools into strawberryfield

Pull Request - State: closed - Opened by aksm over 2 years ago

#62 - Move ADO Tools into strawberryfield

Issue - State: open - Opened by aksm over 2 years ago

#61 - Improve messages generated by SBR enqueued items

Issue - State: closed - Opened by DiegoPino over 2 years ago - 1 comment
Labels: enhancement, queue, Logging, Post processor Plugins

#59 - Fatal error when using NLP

Issue - State: closed - Opened by patdunlavey over 2 years ago - 15 comments
Labels: Post processor Plugins, External Bug

#58 - Kitchen Door for strawberry_runners_postprocessor plugins

Issue - State: open - Opened by DiegoPino over 2 years ago
Labels: enhancement, queue, API flavor endpoints, Post processor Plugins, Digital Preservation

#57 - VBO action to re-trigger SBRs

Issue - State: closed - Opened by DiegoPino over 2 years ago - 1 comment
Labels: documentation, enhancement, help wanted, queue, Post processor Plugins

#56 - 0.4.0 Release Sync to main

Pull Request - State: closed - Opened by DiegoPino over 2 years ago
Labels: Release duties

#55 - ISSUE-54: Fixes broken NLPClient and adds FastText as default

Pull Request - State: closed - Opened by DiegoPino over 2 years ago
Labels: bug, enhancement, Solr Indexing, API flavor endpoints, Post processor Plugins, External Bug

#54 - Fix Web64 NlpClient to deal with HTTP 1/1 or 2/0 if that is a thing

Issue - State: closed - Opened by DiegoPino over 2 years ago - 2 comments
Labels: enhancement, API flavor endpoints, Post processor Plugins, Release duties, External Bug

#53 - ISSUE-52: add extraPages.jsonl to the WACZ chain

Pull Request - State: closed - Opened by DiegoPino over 2 years ago

#52 - Make WACZ processor aware of Browsertrix-Crawler extraPages.jsonl

Issue - State: closed - Opened by DiegoPino over 2 years ago - 1 comment
Labels: enhancement, Datapackage / Frictionless, Post processor Plugins

#51 - Fix typo in ALTOtoMiniOCR

Pull Request - State: closed - Opened by giancarlobi over 2 years ago

#50 - Typo stops ALTO to MiniOCR from working

Issue - State: closed - Opened by giancarlobi over 2 years ago - 1 comment

#49 - Allow System Binary Processor to use URLs instead of requiring locally available files

Issue - State: open - Opened by DiegoPino almost 3 years ago - 1 comment
Labels: enhancement, Post processor Plugins

#48 - Indexing OCR fails with multibyte characters

Issue - State: closed - Opened by patdunlavey almost 3 years ago - 4 comments

#47 - ISSUE-46-OCR: Provide the ability to OCR image files

Pull Request - State: closed - Opened by patdunlavey almost 3 years ago - 10 comments

#46 - tesseract OCR only takes pdf files as input

Issue - State: closed - Opened by patdunlavey almost 3 years ago - 11 comments

#45 - 0.2.0:Release push to main

Pull Request - State: closed - Opened by DiegoPino over 3 years ago
Labels: Release duties

#44 - HOCR per image page(s)

Issue - State: closed - Opened by alliomeria over 3 years ago - 2 comments
Labels: enhancement, Post processor Plugins

#43 - Post processor Plugin for Zip Files

Issue - State: open - Opened by alliomeria over 3 years ago
Labels: enhancement, Post processor Plugins

#42 - ADO Tools Permissions & Display

Issue - State: open - Opened by alliomeria over 3 years ago - 1 comment
Labels: enhancement

#41 - Implement alto capabilities in 0.2.0

Pull Request - State: closed - Opened by giancarlobi over 3 years ago - 9 comments

#40 - ISSUE-39: NLP for Webpages/OCR and some improvements for SBFlavor docs (more data)

Pull Request - State: closed - Opened by DiegoPino over 3 years ago
Labels: enhancement, Solr Indexing, Datapackage / Frictionless, Post processor Plugins

#39 - Make a Frictionless Data package Post processor

Issue - State: open - Opened by DiegoPino almost 4 years ago
Labels: enhancement, Solr Indexing, Datapackage / Frictionless, Post processor Plugins

#38 - ISSUE-37: Fix broken coordinates check to make Tesseract HOCR work again

Pull Request - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: bug, Post processor Plugins

#37 - RC2 Bug in OCR Processor. Lines may have more than just 5 elements. Breaks Tesseract

Issue - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: bug

#36 - ISSUE-34: Solve issue when coord value is zero

Pull Request - State: closed - Opened by giancarlobi almost 4 years ago

#35 - ISSUE-34: If no words, return a constant XML

Pull Request - State: closed - Opened by DiegoPino almost 4 years ago
Labels: bug, enhancement, Post processor Plugins, ocrhighlight

#34 - Blank page with empty hOCR hangs last version of ocrhighlight plugin

Issue - State: closed - Opened by giancarlobi almost 4 years ago - 16 comments
Labels: bug, enhancement, Solr Indexing, Post processor Plugins, ocrhighlight

#33 - ISSUE-32: Fulltext and Plaintext, best friends.

Pull Request - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: enhancement, Solr Indexing, Post processor Plugins

#32 - Add plaintext and Total Sequence Count to Search API indexable OCR processor

Issue - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: enhancement, Solr Indexing, Datapackage / Frictionless, Post processor Plugins

#31 - Mini ocr add space

Pull Request - State: closed - Opened by giancarlobi almost 4 years ago - 1 comment

#30 - ISSUE-29: Fix broken dependencies after Frictionless Package update

Pull Request - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: bug, Datapackage / Frictionless, Post processor Plugins

#29 - We need a str_replace_first function

Issue - State: closed - Opened by giancarlobi almost 4 years ago - 1 comment

#28 - ISSUE-27: Removed Debug statement

Pull Request - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: bug, enhancement

#27 - Remove debug statement

Issue - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: bug, enhancement

#26 - ISSUE-25: Ensure that ADO cache tags are cleared when indexing OCR content

Pull Request - State: closed - Opened by pcambra almost 4 years ago - 1 comment

#25 - Ensure that ADO cache tags are cleared when indexing OCR content

Issue - State: closed - Opened by pcambra almost 4 years ago - 4 comments

#24 - ISSUE-23: Ensure better and safer File derivative generation and persistence

Pull Request - State: closed - Opened by DiegoPino almost 4 years ago - 2 comments
Labels: bug, enhancement, queue, Post processor Plugins, Digital Preservation

#23 - Discard multiple same processor enqueued items after one was persisted

Issue - State: closed - Opened by DiegoPino almost 4 years ago - 1 comment
Labels: bug, enhancement, queue, Post processor Plugins, Digital Preservation

#22 - Post processor Plugin for Archivematica/AIP

Issue - State: open - Opened by DiegoPino about 4 years ago - 2 comments
Labels: queue, Post processor Plugins, Digital Preservation

#21 - ISSUE-20: Catch missing file Exception and return bool.

Pull Request - State: closed - Opened by DiegoPino about 4 years ago
Labels: bug, queue

#20 - If a file referenced in a queue does not exist anymore the lease should be removed

Issue - State: closed - Opened by DiegoPino about 4 years ago - 7 comments

#19 - ISSUE-5: Clean Up and make sure all is sent via loggers

Pull Request - State: closed - Opened by DiegoPino about 4 years ago - 3 comments
Labels: enhancement, Logging

#18 - ISSUE-3:small update to match 1.0.0-RC1 SBF

Pull Request - State: closed - Opened by DiegoPino about 4 years ago - 2 comments

#17 - Remove temporary files and make functions private

Pull Request - State: closed - Opened by giancarlobi about 4 years ago - 7 comments

#16 - Add executable settings and queue option

Pull Request - State: closed - Opened by giancarlobi about 4 years ago - 1 comment

#15 - Add djvu2hocr for pdf searchable WIP

Pull Request - State: closed - Opened by giancarlobi about 4 years ago - 2 comments

#14 - Push Solr Indexed Flavors into a Frictionless data package

Issue - State: open - Opened by DiegoPino about 4 years ago - 3 comments
Labels: enhancement, Solr Indexing, Datapackage / Frictionless

#13 - Similar to "add a file" we need a "just add JSON" logic

Issue - State: open - Opened by DiegoPino about 4 years ago
Labels: enhancement, queue, Datapackage / Frictionless

#12 - Mainloop and enqueue logic

Issue - State: closed - Opened by giancarlobi about 4 years ago - 5 comments

#11 - ISSUE-3: OCR specific Processor and new features/processing option

Pull Request - State: closed - Opened by DiegoPino about 4 years ago - 6 comments

#10 - Allow other queues to be fed

Issue - State: open - Opened by DiegoPino over 4 years ago - 1 comment
Labels: enhancement, queue, API flavor endpoints, Future

#9 - Fix UpperCase LowerCase issue with Entity Annotation List Entity

Issue - State: closed - Opened by DiegoPino over 4 years ago - 1 comment
Labels: bug

#8 - Pandoc: Any document into any other document (transmute)

Issue - State: open - Opened by DiegoPino almost 5 years ago - 3 comments
Labels: enhancement, help wanted, question

#7 - Make Processor Plugins hierarchical

Issue - State: closed - Opened by DiegoPino almost 5 years ago - 11 comments
Labels: enhancement

#6 - Make a first EXIF only Post processor Plugin

Issue - State: open - Opened by DiegoPino almost 5 years ago - 7 comments
Labels: enhancement

#5 - ISSUE-4: First pass on SBF runners plugin system

Pull Request - State: closed - Opened by DiegoPino almost 5 years ago - 2 comments
Labels: documentation, enhancement, help wanted, question