ocrmypdf/OCRmyPDF issues and pull requests

#1009 - OCR picks up all the text, but alignment is off

Issue - State: closed - Opened by nchammas over 2 years ago - 2 comments

#1004 - OCRmyPDF assumes really large DPI for native PDF when rasterizing as image

Issue - State: closed - Opened by fabiante over 2 years ago - 3 comments

#1003 - How to keep source file time, date, metadata.... etc for Target File?

Issue - State: closed - Opened by limopc over 2 years ago - 4 comments

#977 - optimize.py doesn't process images with subtype Form

Issue - State: closed - Opened by imz over 2 years ago - 4 comments
Labels: enhancement

#966 - ocrmypdf --tesseract-timeout=0 --deskew blocks deskewing - was: Using ocrmypdf to correct the deviation has no effect No errors were reported

Issue - State: closed - Opened by wss1801 almost 3 years ago - 8 comments
Labels: bug

#964 - Compression artifacts

Issue - State: closed - Opened by cloakedch almost 3 years ago - 3 comments

#961 - "--force-ocr" switch increases size of pdf by factor 25

Issue - State: open - Opened by wildgruber almost 3 years ago - 4 comments

#948 - Double to quadruple file size and worse quality with --deskew --clean-final (due to mask?)

Issue - State: open - Opened by bllngr almost 3 years ago - 2 comments
Labels: bug

#944 - "remove-background not implemented"

Issue - State: closed - Opened by bouboulov almost 3 years ago - 6 comments

#942 - Creating txt file without an output pdf. Examples missing for correct syntax.

Issue - State: closed - Opened by gevezex almost 3 years ago - 3 comments

#931 - `--redo-ocr` adds extra text to the PDF

Issue - State: closed - Opened by DUOLabs333 almost 3 years ago - 1 comment

#906 - support monochromatic conversion

Issue - State: closed - Opened by jknockaert about 3 years ago - 6 comments

#897 - --redo-ocr doesn't remove previous ocr-text layer made by ocrmypdf

Issue - State: open - Opened by Mark-Joy about 3 years ago - 2 comments

#884 - --remove-background options currently not implemented?

Issue - State: closed - Opened by Perangelot about 3 years ago - 3 comments

#872 - cannot run under python 3.10

Issue - State: closed - Opened by starsareintherose about 3 years ago - 5 comments

#868 - Blank pages cause the process to crash due to tesseract

Issue - State: closed - Opened by philayres about 3 years ago - 3 comments

#836 - ValueError: integer out of range converting 10585497845 from a 8-byte signed type to a 4-byte signed type

Issue - State: closed - Opened by gumbolastima over 3 years ago - 16 comments

#827 - ocrmypdf --redo-ocr fails with DecompressionBombError on small PDF

Issue - State: closed - Opened by nicolasguinot over 3 years ago - 4 comments

#814 - Hanging on Random Files

Issue - State: closed - Opened by jgforbes over 3 years ago - 24 comments

#807 - Hebrew text seems to be reversed (whole line) on OCR-ed pdf

Issue - State: open - Opened by Kors1981 over 3 years ago - 5 comments
Labels: user config

#781 - Correcting recognition errors - possible with sidecar option?

Issue - State: closed - Opened by jdescelliers almost 4 years ago - 5 comments

#771 - Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata

Issue - State: closed - Opened by alicanidas almost 4 years ago - 4 comments

#766 - [ENHANCEMENT] Google Colab notebook

Issue - State: closed - Opened by louispaulet almost 4 years ago - 3 comments

#760 - Use Multi-threading instead of multi-processing on platforms not supporting it

Issue - State: closed - Opened by MrAdityaAlok almost 4 years ago - 11 comments

#748 - Jbig2 dependency on windows

Issue - State: closed - Opened by mortang2410 almost 4 years ago - 25 comments

#721 - --force-ocr converts JBIG2 images to 24-bit

Issue - State: closed - Opened by alawvt about 4 years ago - 6 comments
Labels: bug

#715 - extra space in the result pdf when the input pdf is in Chinese

Issue - State: open - Opened by Eyxxxxx about 4 years ago - 20 comments
Labels: third party issue

#659 - Improving Windows with PyInstaller - Ocrmypdf Distribution Not Found

Issue - State: open - Opened by gabemorris12 over 4 years ago - 14 comments
Labels: enhancement

#631 - liblept-5.dll load fails on Windows 10 (OSError 0x7F)

Issue - State: closed - Opened by Suyash458 over 4 years ago - 14 comments
Labels: bug, third party issue

#623 - Can you tell me what docker command I should run in order to make the docker image work?

Issue - State: closed - Opened by 5aumy4 over 4 years ago - 5 comments
Labels: question

#595 - Azure ocr with ocrmypdf

Issue - State: open - Opened by sandipan1 over 4 years ago - 16 comments
Labels: enhancement

#590 - Pass existing OCR-Data in ALTO-Format

Issue - State: closed - Opened by M3ssman over 4 years ago - 3 comments
Labels: enhancement

#551 - "--force-ocr" mangles some pages in a pdf

Issue - State: open - Opened by wojciechbielecki almost 5 years ago - 5 comments
Labels: bug

#550 - --threshold-final

Issue - State: open - Opened by femifrak almost 5 years ago - 4 comments
Labels: enhancement

#542 - Documentation Bug: Mention of consulting inquiries includes no contact information

Issue - State: closed - Opened by andylippitt almost 5 years ago - 1 comment

#541 - Introduce a way to radically reduce the output file size (sacrificing image quality)

Issue - State: closed - Opened by heinrich-ulbricht almost 5 years ago - 95 comments
Labels: enhancement

#539 - Chocolatey package for Windows

Issue - State: closed - Opened by jbarlow83 almost 5 years ago - 5 comments
Labels: enhancement, help wanted

#528 - Add support for PDF/A-2u or PDF/A-2a

Issue - State: closed - Opened by frederictobiasc almost 5 years ago - 1 comment
Labels: enhancement

#514 - Error: File did not complete the page properly and may be damaged.

Issue - State: open - Opened by tice17 almost 5 years ago - 8 comments

#495 - Searching math equations

Issue - State: closed - Opened by karasjoh000 almost 5 years ago - 4 comments
Labels: enhancement

#488 - cx_Freeze support - packaging on Windows

Issue - State: closed - Opened by Faisalsouz about 5 years ago - 2 comments
Labels: help wanted, third party issue

#487 - Command line option deskew not found but d is available

Issue - State: closed - Opened by paazmaya about 5 years ago - 6 comments
Labels: user config

#483 - Provide tsv and hocr output files

Issue - State: closed - Opened by ArlindNocaj about 5 years ago - 2 comments

#460 - fatal error: qpdf/Constants.h - pip3 install ocrmypdf and pikepdf on ubuntu win10 subsystem failed

Issue - State: closed - Opened by mhechthz about 5 years ago - 27 comments
Labels: user config

#458 - deskew and rotate but skip ocr?

Issue - State: closed - Opened by barrars about 5 years ago - 4 comments

#453 - hocr import / export

Issue - State: closed - Opened by aalmir over 5 years ago - 38 comments
Labels: enhancement

#450 - Text layer not aligned with original document

Issue - State: open - Opened by wpzdm over 5 years ago - 8 comments
Labels: need test file

#446 - Pdf error with tables

Issue - State: open - Opened by miguelgarces123 over 5 years ago - 20 comments

#445 - Support for JPEG2000, jp2 output

Issue - State: closed - Opened by aalmir over 5 years ago - 4 comments

#443 - Implement optional downsampling as part of preprocessing

Issue - State: closed - Opened by jbarlow83 over 5 years ago - 5 comments
Labels: enhancement

#437 - support converting multiple images

Issue - State: closed - Opened by grexe over 5 years ago - 3 comments
Labels: enhancement

#428 - Check if OCR images would be >2^31 bytes

Issue - State: closed - Opened by jbarlow83 over 5 years ago - 5 comments

#410 - Error: unable to find trailer dictionary while recovering damaged file

Issue - State: closed - Opened by fuzihaofzh over 5 years ago - 7 comments
Labels: need test file

#364 - Create an AppImage for ocrmypdf

Issue - State: closed - Opened by jbarlow83 almost 6 years ago - 6 comments
Labels: help wanted

#351 - Feature request: additional post-processing options

Issue - State: closed - Opened by WillemJansen about 6 years ago - 1 comment
Labels: enhancement

#345 - Bug: Using --clean or --threshold options breaks the --tesseract-pagesegmode option

Issue - State: closed - Opened by gabrielmongefranco about 6 years ago - 3 comments
Labels: need test file

#318 - AttributeError: 'str' object has no attribute 'option_strings'

Issue - State: closed - Opened by jbarlow83 about 6 years ago - 2 comments

#316 - Output PDF is getting distorted on each ocrmypdf command.

Issue - State: closed - Opened by DEEPAK-KESWANI about 6 years ago - 15 comments

#293 - file size increase for pdf/a

Issue - State: closed - Opened by femifrak over 6 years ago - 11 comments

#261 - Larger than input file using "--output-type pdf" with "--deskew --clean --remove-background"

Issue - State: closed - Opened by jmrichardson almost 7 years ago - 13 comments

#258 - Best way to handle PDF with mixed content ?

Issue - State: closed - Opened by guldil almost 7 years ago - 13 comments

#242 - PDF/A without MuPDF deletes bookmarks

Issue - State: closed - Opened by jbarlow83 almost 7 years ago - 1 comment

#237 - Excessive file size growth with --force-ocr

Issue - State: closed - Opened by jbarlow83 almost 7 years ago - 5 comments

#209 - ERROR: [tesseract] read_params_file: Can't open txt

Issue - State: closed - Opened by dev-code-davis about 7 years ago - 9 comments

#202 - NixOS packaging issues

Issue - State: closed - Opened by sjau about 7 years ago - 28 comments

#177 - Add HOCR output as a sidecar option

Issue - State: closed - Opened by parkerhancock over 7 years ago - 7 comments
Labels: enhancement

#139 - Just saying thanks!!!

Issue - State: open - Opened by ericmjl almost 8 years ago - 14 comments

#125 - Output PDFs have decreased quality

Issue - State: closed - Opened by Wikinaut about 8 years ago - 13 comments

#115 - Reduce memory usage for very large files (high page count and large file size)

Issue - State: open - Opened by jbarlow83 about 8 years ago - 7 comments

#66 - ocr corrections

Issue - State: closed - Opened by femifrak almost 9 years ago - 8 comments

#36 - Adapt to Google Vision API

Issue - State: closed - Opened by shaunc869 about 9 years ago - 15 comments
Labels: enhancement

#12 - Option to remove blank pages

Issue - State: open - Opened by OCRmyPDF-issuebot over 9 years ago - 19 comments
Labels: enhancement
