ocrmypdf/OCRmyPDF issues and pull requests

#1124 - [Bug]: `pdfa-image-compression=auto` behaviour violates the principle of least surprise w.r.t. lossy/lossless optimisations

Issue - State: open - Opened by Atemu over 1 year ago - 2 comments
Labels: need test file

#1123 - does ocrmypdf create an invisible text layer?

Issue - State: closed - Opened by lbr991 over 1 year ago - 11 comments

#1122 - Confused about --unpaper-args

Issue - State: closed - Opened by al1coch over 1 year ago - 4 comments
Labels: bug

#1121 - [Feature]: Parameter to automatically remove blank pages

Issue - State: closed - Opened by GrabbenD over 1 year ago - 2 comments

#1120 - ocrmypdf not working in HTML/browser

Issue - State: closed - Opened by Prabal1902 over 1 year ago - 4 comments
Labels: bug

#1119 - [Bug]: Can not transfer image into editable text in pdf

Issue - State: closed - Opened by ericosmic over 1 year ago - 1 comment
Labels: bug

#1118 - [Bug]: PDF/A-3B files generated with a widely used commercial encoder generate garbage OCR content

Issue - State: closed - Opened by jce-zz over 1 year ago - 20 comments
Labels: bug

#1117 - Allow title, subject, author, and keywords to be unset with an empty string argument

Pull Request - State: closed - Opened by f-hansen over 1 year ago - 1 comment

#1116 - [Bug]: Problem when OCR heavy PDFs - freezes at 0%

Issue - State: closed - Opened by dariofilipe over 1 year ago - 2 comments
Labels: bug

#1115 - Problem when OCR heavy PDFs - freezes at 0%

Issue - State: closed - Opened by dariofilipe over 1 year ago - 1 comment

#1114 - do OCR if text boxes of minimum 15

Pull Request - State: closed - Opened by pkrsreddy over 1 year ago - 2 comments

#1113 - Fix randomly ordered languages from set()

Pull Request - State: closed - Opened by abwiersma over 1 year ago

#1112 - [Bug]: Inconsistent language order in tesseract calls

Issue - State: closed - Opened by abwiersma over 1 year ago - 2 comments
Labels: bug

#1111 - [Feature]: just curious/wondering about Tesseract 5 support

Issue - State: closed - Opened by alejohern over 1 year ago - 1 comment
Labels: enhancement

#1110 - [Feature]: OCR on pages with multiple text rotations

Issue - State: open - Opened by matthuszagh over 1 year ago - 2 comments
Labels: enhancement

#1109 - Since many users don't know how to set up the environment, we bundled the required environment on top of OCRmyPDF and built a desktop client with Electron [Electron version of OCRmyPDF]

Issue - State: closed - Opened by FanQinFred over 1 year ago - 2 comments
Labels: enhancement, question

#1108 - [BUG] Frequently seeing `Syntax Error (91811): Too few (2) args to 'cm' operator`

Issue - State: closed - Opened by deexpabada over 1 year ago - 2 comments

#1107 - Would be nice to be able to choose the temporary directory

Issue - State: closed - Opened by al1coch over 1 year ago - 4 comments

#1106 - Support for PDF-A/4

Issue - State: open - Opened by rafaelfcmaria over 1 year ago - 1 comment
Labels: enhancement, third party issue

#1105 - OCRmyPDF not rotating the file correctly using the version 14.2.1

Issue - State: closed - Opened by gilsonbergamine over 1 year ago - 1 comment

#1104 - [BUG] 'DecompressionBombError' on a ACM PDF - need resolution limit on high DPI

Issue - State: closed - Opened by gwern almost 2 years ago - 7 comments

#1103 - [BUG] Bold font in PDF is replaced by black bars

Issue - State: closed - Opened by tobox almost 2 years ago - 2 comments

#1102 - [BUG] ghostscript fails due to small resolution value

Issue - State: open - Opened by neurolabs almost 2 years ago - 3 comments

#1101 - How to get the deskew angle

Issue - State: closed - Opened by GoN49 almost 2 years ago - 1 comment

#1100 - Replace text from original PDF with OCR'd Text

Issue - State: closed - Opened by FrancisBaileyH almost 2 years ago - 2 comments

#1099 - Unknown .defaultpapersize: (A4). / Unrecoverable error: rangecheck in .putdeviceprops / SubprocessOutputError: Ghostscript PDF/A rendering failed

Issue - State: closed - Opened by klartext almost 2 years ago - 5 comments
Labels: third party issue

#1098 - Remove image layer after OCR?

Issue - State: closed - Opened by Frooodle almost 2 years ago - 2 comments

#1097 - WSL support

Issue - State: closed - Opened by pinballelectronica almost 2 years ago - 3 comments

#1096 - [BUG] AttributeError: module 'PIL.Image' has no attribute 'Resampling' on running script

Issue - State: closed - Opened by acarl123 almost 2 years ago - 3 comments

#1095 - [BUG] deletes most of a page

Issue - State: closed - Opened by gwern almost 2 years ago - 3 comments

#1094 - Feature Request: Provide for downloading of language models

Issue - State: closed - Opened by simsong almost 2 years ago - 1 comment

#1093 - Feature Request: Provide for usage with cloud-based OCR engines

Issue - State: closed - Opened by simsong almost 2 years ago - 4 comments

#1092 - How to handle already ocred files efficiently?

Issue - State: closed - Opened by drnicolas almost 2 years ago - 1 comment

#1091 - [HELP] Inconsistent Reading order

Issue - State: closed - Opened by emtee14 almost 2 years ago - 2 comments

#1090 - Snap package shouldn't ship all of the Tesseract OCR language files

Issue - State: open - Opened by brlin-tw almost 2 years ago - 1 comment
Labels: help wanted

#1089 - Fix snap package building (#1082)

Pull Request - State: closed - Opened by brlin-tw almost 2 years ago - 3 comments

#1088 - [BUG] #addopts = pytest -n "auto" no option?

Issue - State: closed - Opened by shaynababe almost 2 years ago - 2 comments

#1087 - Fix typos

Pull Request - State: closed - Opened by kianmeng almost 2 years ago - 1 comment

#1086 - Only generate text files without generating PDF files

Issue - State: open - Opened by rodrigomorales1 almost 2 years ago - 11 comments

#1085 - Use Github Releases for notifications

Issue - State: closed - Opened by fabiante almost 2 years ago - 2 comments

#1084 - ocrmypdf generating white patch in output pdf?

Issue - State: closed - Opened by gogineniravikumar almost 2 years ago - 1 comment

#1083 - Improve PDF rasterisation safety

Pull Request - State: closed - Opened by sihil almost 2 years ago - 1 comment

#1082 - [BUG] Snap Package not Working

Issue - State: closed - Opened by lhhel9l3 almost 2 years ago - 6 comments

#1081 - Correct way to deskew PDF already processed by OCRmyPDF?

Issue - State: open - Opened by pimlottc almost 2 years ago - 7 comments

#1080 - PDFs not created with fast web view

Issue - State: open - Opened by dklinger almost 2 years ago - 1 comment

#1079 - [BUG] Pathological output: PDF expands to 50x size after half an hour of processing

Issue - State: closed - Opened by gwern almost 2 years ago - 4 comments

#1078 - [BUG] pikepdf warning about missing decoders

Issue - State: closed - Opened by ajweber almost 2 years ago - 3 comments

#1077 - JBIG2 not legally secure in many countries

Issue - State: closed - Opened by dklinger almost 2 years ago - 2 comments

#1076 - [BUG] PIL.Image.DecompressionBombError

Issue - State: closed - Opened by JohnLockeG almost 2 years ago - 1 comment

#1075 - [BUG] crashes with `TypeError: 'NoneType' object is not subscriptable`

Issue - State: closed - Opened by frrad almost 2 years ago - 1 comment

#1074 - [BUG] cannot ocr the numbers on left side of page

Issue - State: closed - Opened by sushmitxo almost 2 years ago - 1 comment

#1073 - Optimize images with SMask

Issue - State: open - Opened by benbro almost 2 years ago - 3 comments

#1072 - Use paddleocr instead of tesseract

Issue - State: closed - Opened by aymenmtibaa almost 2 years ago - 1 comment

#1071 - Feature Request: GPU OCR pipeline e.g. via EasyOCR

Issue - State: closed - Opened by systemofapwne about 2 years ago - 4 comments
Labels: enhancement

#1070 - [BUG] Wrong optimize ratio and savings

Issue - State: closed - Opened by homocomputeris about 2 years ago - 6 comments

#1069 - [BUG] Possible to force OCR without losing vector data?

Issue - State: closed - Opened by moksamedia about 2 years ago - 2 comments

#1068 - Avoid deleting /dev/null when run as root

Pull Request - State: closed - Opened by jbarlow83 about 2 years ago

#1067 - [BUG] /dev/null gets deleted when run as root (inside a Docker container)

Issue - State: closed - Opened by andymwood about 2 years ago - 1 comment

#1066 - handle case when candidate is None

Pull Request - State: closed - Opened by frrad about 2 years ago - 1 comment

#1065 - [QUESTION] Render hocr with python

Issue - State: closed - Opened by jcuenod about 2 years ago - 2 comments

#1064 - tesseract-osd is also required on fedora

Pull Request - State: closed - Opened by white-gecko about 2 years ago

#1063 - added setting RETRIES_LOADING_FILE to watcher.py

Pull Request - State: closed - Opened by comzine about 2 years ago

#1062 - [BUG] tesseract returns SIGFPE Signal

Issue - State: closed - Opened by C0D3D3V about 2 years ago - 4 comments

#1060 - Error processing shell script on file

Issue - State: closed - Opened by danilichti about 2 years ago - 2 comments

#1059 - Allow title, subject, author, and keywords to be unset with an empty string argument

Pull Request - State: closed - Opened by f-hansen about 2 years ago - 3 comments

#1058 - substitute broken link (#1057)

Pull Request - State: closed - Opened by LucasLarson about 2 years ago

#1057 - [BUG] docs: links to brewformulas.org no longer work

Issue - State: closed - Opened by LucasLarson about 2 years ago

#1056 - output JSON format

Issue - State: closed - Opened by emresaracoglu about 2 years ago - 1 comment

#1055 - Is it possible to add paddleocr as an option for ocr?

Issue - State: closed - Opened by nissansz about 2 years ago - 4 comments

#1054 - [BUG] ValueError: invalid arguments: (pikepdf._qpdf._ObjectList([]),)

Issue - State: open - Opened by dli7319 about 2 years ago

#1053 - fix crash on PDF

Pull Request - State: closed - Opened by frrad about 2 years ago - 1 comment

#1052 - [BUG] crash when trying to process a pdf

Issue - State: closed - Opened by frrad about 2 years ago - 1 comment

#1051 - Feature request: Ask user what likely-incorrect words are

Issue - State: closed - Opened by mattention about 2 years ago - 1 comment
Labels: enhancement

#1050 - Is it possible to capture Tesseract messages and suggestions either as exceptions or exit codes?

Issue - State: closed - Opened by sergeyyurkov1 about 2 years ago - 2 comments

#1049 - [BUG] `--deskew` not compatible with blank pages or with tesseract_timeout = 0

Issue - State: closed - Opened by deexpabada about 2 years ago - 6 comments

#1048 - Fixed the source installation instructions

Pull Request - State: closed - Opened by yasoob about 2 years ago - 1 comment

#1047 - Fix tesseract documentation url

Pull Request - State: closed - Opened by CGarces about 2 years ago

#1046 - Memory leak ocrmypdf.ocr vs subprocess.run

Issue - State: closed - Opened by CGarces about 2 years ago - 5 comments

#1045 - Fixed some wording

Pull Request - State: closed - Opened by yasoob about 2 years ago

#1044 - log completion message

Pull Request - State: closed - Opened by drinckes about 2 years ago

#1043 - Way to test PDF to see if there is any text?

Issue - State: closed - Opened by spedinfargo about 2 years ago - 1 comment

#1042 - OCR for Comic Book PDFs -- Possible Solution

Issue - State: closed - Opened by yosamsimiti about 2 years ago - 2 comments

#1041 - Spaces in Japanese

Issue - State: closed - Opened by KajiyaOokami about 2 years ago - 4 comments
Labels: third party issue

#1040 - Ignore Digital Signed Documents

Issue - State: closed - Opened by flaviobrunopereira about 2 years ago - 2 comments
Labels: need test file

#1039 - Fixed interchanged words

Pull Request - State: closed - Opened by yasoob about 2 years ago - 1 comment

#1038 - Draw/Blanking on wrong spot

Issue - State: open - Opened by emre1e about 2 years ago - 1 comment

#1037 - read_params_file: Can't open pdf/txt -- new issue -- help!

Issue - State: closed - Opened by yosamsimiti about 2 years ago - 4 comments

#1036 - --redo-ocr does all the work but doesn't save the new OCR text layer in the output pdf, leaving the old OCR text

Issue - State: open - Opened by Shoresh613 about 2 years ago - 1 comment

#1035 - Garbled order of OCR'ed contents

Issue - State: open - Opened by rkevk over 2 years ago - 6 comments

#1034 - ocrmypdf cannot convert pages with watermarks.

Issue - State: closed - Opened by marlarius over 2 years ago - 3 comments

#1033 - Pyinstaller OCRmyPDF pikepdf packagenotfound error and failed to determine version

Issue - State: closed - Opened by DerDoktorFaust over 2 years ago - 2 comments

#1032 - Remove blank page without recognizable characters of the ocr

Issue - State: open - Opened by gitmors over 2 years ago

#1031 - Question: multiple import folders possible?

Issue - State: closed - Opened by Maximus48p over 2 years ago - 1 comment

#1030 - Ghostscript error + OCRmyPDF puts a space after every letter in the output.pdf file

Issue - State: closed - Opened by moritz1000 over 2 years ago - 3 comments

#1029 - How to reduce ram usage

Issue - State: closed - Opened by alirf81 over 2 years ago - 1 comment

#1024 - Issue packaging with pyinstaller

Issue - State: open - Opened by kiyros over 2 years ago - 5 comments

#1019 - Debian maintainer requested for OCRmyPDF and pikepdf

Issue - State: closed - Opened by jbarlow83 over 2 years ago - 1 comment
Labels: help wanted
