Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / documentcloud/docsplit issues and pull requests

#160 - "Error: source file could not be loaded"

Issue - State: open - Opened by Nakilon 3 months ago

#159 - Fix deprecated method File.exists? to File.exist?

Pull Request - State: open - Opened by tuttiq about 1 year ago - 7 comments

#158 - ruby 3.2 compatibility

Issue - State: open - Opened by jwoodrow over 1 year ago

#157 - Documentation link correction

Pull Request - State: closed - Opened by anujaware over 2 years ago - 1 comment

#155 - Remove temporary folder for gs conversion

Pull Request - State: open - Opened by eriko-de about 3 years ago

#152 - Docsplit.extract_text generates a String with a null byte

Issue - State: open - Opened by cedricpim almost 5 years ago

#151 - diskspace leak when extracting text from pdf

Issue - State: open - Opened by KHMtravel over 5 years ago - 1 comment

#150 - Add quick and dirty way to add options to tesseract

Pull Request - State: closed - Opened by mgontav almost 6 years ago

#149 - Fix escaping when extracting text using OCR

Pull Request - State: open - Opened by floehopper about 6 years ago

#148 - Add page_size to InfoExtractor

Pull Request - State: open - Opened by dan-jensen about 6 years ago - 2 comments

#147 - Added support for layout and nopgbrk options when using pdftotext

Pull Request - State: closed - Opened by prasadsurase about 6 years ago

#146 - update pdftk installer URL for Mac

Pull Request - State: closed - Opened by tkimnguyen about 6 years ago

#145 - Different behavior on mac and linux

Issue - State: open - Opened by jbmyid over 6 years ago

#142 - temp fix for output filenames

Pull Request - State: closed - Opened by deuxshaish over 8 years ago - 1 comment

#140 - Downsampling has gotten worse in the last year

Issue - State: open - Opened by reefdog over 8 years ago

#138 - Can any one please tell me how to pass file path as url to Docsplit ?

Issue - State: closed - Opened by jogiranjith over 8 years ago - 2 comments

#136 - Horizontal / table formatted text

Issue - State: open - Opened by nofxx over 8 years ago

#135 - rails invalid byte sequence in UTF-8

Issue - State: open - Opened by fjcaro almost 9 years ago - 1 comment

#134 - Clean pdffonts output to avoid invalid UTF-8 characters

Pull Request - State: open - Opened by tbk303 almost 9 years ago

#133 - encoding issue

Issue - State: closed - Opened by dfang almost 9 years ago - 1 comment

#132 - Add layout option to keep layout during text extraction

Pull Request - State: closed - Opened by scarfacedeb about 9 years ago - 7 comments

#131 - Fix page parsing for command line usage

Pull Request - State: open - Opened by xavriley about 9 years ago

#129 - Percent sign in filenames isn't escaped properly

Issue - State: open - Opened by jeremybmerrill over 9 years ago

#128 - Break PDFs into chunks bigger than 1 page?

Issue - State: open - Opened by AbeHandler over 9 years ago - 3 comments

#127 - Extract Link (URL, Goto, etc)

Issue - State: closed - Opened by dglunz over 9 years ago - 2 comments

#124 - Add parallel processing to OCR text extraction of full documents

Pull Request - State: open - Opened by ntodd over 9 years ago - 2 comments

#122 - Corrupted pdf file from Chinese docx

Issue - State: closed - Opened by intellisense over 9 years ago - 2 comments

#121 - Encoding issue - invalid byte sequence in US-ASCII (ArgumentError)

Issue - State: open - Opened by intellisense over 9 years ago - 13 comments

#120 - Orientation

Pull Request - State: closed - Opened by AbeHandler over 9 years ago - 1 comment

#118 - Windows: "%d" is always escaped to "\%d".

Pull Request - State: closed - Opened by ypxing over 9 years ago

#117 - Add section to documentation regarding the "--language" flag

Issue - State: closed - Opened by nathanstitt over 9 years ago - 1 comment

#116 - German umlauts are replaced by ? after OCR

Issue - State: closed - Opened by tbk303 over 9 years ago - 6 comments

#115 - No such file or directory @ rb_sysopen - example.doc (Errno::ENOENT)

Issue - State: open - Opened by jhonc33 almost 10 years ago - 3 comments

#114 - added functionality to pass pdftotext options

Pull Request - State: open - Opened by narutosanjiv almost 10 years ago - 1 comment

#113 - *** glibc detected *** gm: realloc(): invalid next size: 0x00007f4b7e88e0c0 ***

Issue - State: closed - Opened by lordfinal about 10 years ago - 3 comments

#112 - Scrape data from a pdf document into CSV using docsplit ?

Issue - State: closed - Opened by anil-insonix about 10 years ago - 1 comment

#111 - Using environment vars to allow graphicsmagick and imagemagick

Pull Request - State: closed - Opened by augustf about 10 years ago - 2 comments
Labels: wontfix, change

#109 - libreoffice path in FreeBSD

Issue - State: open - Opened by danniculescu about 10 years ago

#108 - making magic number-based detection of PDFs encoding-friendly, with tests

Pull Request - State: closed - Opened by jonoterc over 10 years ago - 3 comments

#107 - undefined method `strip' for nil:NilClass

Issue - State: closed - Opened by singhkishan over 10 years ago - 1 comment

#106 - "Invalid byte sequence error" on master.

Issue - State: closed - Opened by KurtPreston over 10 years ago - 4 comments

#104 - Add office search path to check vendor folder for use with Heroku and libreoffice buildpack

Pull Request - State: closed - Opened by serene over 10 years ago - 2 comments

#103 - Minor changes

Pull Request - State: open - Opened by tmaier over 10 years ago

#102 - Check if file is PDF by magic number. Closes #98

Pull Request - State: closed - Opened by tmaier over 10 years ago - 3 comments

#101 - Add Gemfile

Pull Request - State: closed - Opened by tmaier over 10 years ago - 5 comments
Labels: enhancement, wontfix

#100 - Allow use of imagemagick with docsplit

Pull Request - State: closed - Opened by augustf over 10 years ago

#97 - Fix for Issue #83: Leading Zeros

Pull Request - State: open - Opened by theredcoder over 10 years ago - 2 comments

#96 - Extracting images from PDF hogs 100% CPU

Issue - State: closed - Opened by tvsignal over 10 years ago - 2 comments

#95 - conversion to PDF mangles non-ASCII characters in docx on Linux

Issue - State: closed - Opened by bobmyers over 10 years ago - 4 comments

#93 - Error converting to images, in 0.7.2 , but works in 0.6.3

Issue - State: closed - Opened by michelson almost 11 years ago - 1 comment

#92 - Enable specification of a config file, and generate hocr output if option set

Pull Request - State: open - Opened by jhosteny almost 11 years ago - 4 comments

#91 - Pad them digits

Pull Request - State: closed - Opened by dannguyen almost 11 years ago - 5 comments

#90 - Extract image or pdf on windows platform bugfix

Pull Request - State: open - Opened by eastxing almost 11 years ago - 1 comment

#88 - Add /usr/lib64 to office_search_paths

Pull Request - State: closed - Opened by elia almost 11 years ago

#86 - extract_text doesn't work for pdf files with Tesseract

Issue - State: closed - Opened by chintanparikh almost 11 years ago - 12 comments

#85 - Default 64-bit installation paths

Pull Request - State: closed - Opened by vanderhoorn about 11 years ago

#84 - Detect page orientation and rotate when necessary

Issue - State: closed - Opened by lukerosiak about 11 years ago - 5 comments
Labels: enhancement

#82 - Clean text without Iconv to support Ruby 2.0

Pull Request - State: closed - Opened by leknarf about 11 years ago - 1 comment

#81 - Add option to generate hOCR output instead of raw text when performing OCR via tesseract

Pull Request - State: closed - Opened by jhosteny about 11 years ago - 4 comments

#77 - Deploy to heroku

Issue - State: open - Opened by josal about 11 years ago - 1 comment

#76 - Unable to extract images using docsplit 0.7.2 in cygwin

Issue - State: open - Opened by bjayaram about 11 years ago - 1 comment

#75 - Add another possible LibreOffice executable path

Pull Request - State: closed - Opened by va7map about 11 years ago - 1 comment

#74 - typo fix for win

Pull Request - State: closed - Opened by sumkincpp about 11 years ago - 2 comments

#73 - Couldn't open file '/tmp/docsplit/filename.pdf': No such file or directory.

Issue - State: closed - Opened by luccasmaso about 11 years ago - 5 comments

#72 - Not saving Unicode (UTF8) characters (accents in other languages)

Issue - State: closed - Opened by robertour over 11 years ago - 4 comments
Labels: question

#70 - No error output

Issue - State: closed - Opened by patroy over 11 years ago - 2 comments
Labels: bug

#68 - new libreoffice has --version

Pull Request - State: closed - Opened by senner over 11 years ago - 4 comments

#65 - Accept non-ascii characters in pdf headers

Pull Request - State: closed - Opened by amalagaura over 11 years ago - 4 comments

#63 - Fix issue 62

Pull Request - State: open - Opened by hderms over 11 years ago - 11 comments

#61 - Ghostscript is needed to use docsplit with PDF files

Pull Request - State: closed - Opened by evanj over 11 years ago - 2 comments

#57 - extract_pages does not use page range "pages" parameter

Issue - State: open - Opened by rajington over 11 years ago - 3 comments

#51 - TextCleaner garbles German umlauts in recognized text

Issue - State: closed - Opened by marcboeker almost 12 years ago - 3 comments

#50 - Various improvements

Pull Request - State: closed - Opened by trevorturk almost 12 years ago

#49 - Spaces in installation path

Pull Request - State: closed - Opened by ineiti almost 12 years ago - 1 comment

#48 - Detect PDF files without .pdf extension using magic number

Pull Request - State: closed - Opened by jeremybmerrill almost 12 years ago - 4 comments
Labels: fixed, change

#46 - Added ability to extract all metadata at once

Pull Request - State: closed - Opened by rajington almost 12 years ago - 1 comment

#45 - Adds recommendation to install poppler-data

Pull Request - State: closed - Opened by alindeman about 12 years ago

#35 - Accept non-ascii characters in pdf headers

Pull Request - State: closed - Opened by stuartf over 12 years ago - 6 comments

#34 - Timeout for PDF extraction from OpenOffice supported document format.

Pull Request - State: open - Opened by vrybas over 12 years ago - 4 comments

#29 - qpdf decryption

Pull Request - State: closed - Opened by palewire over 12 years ago - 1 comment

#28 - OWNER PASSWORD REQUIRED ERROR

Issue - State: closed - Opened by palewire over 12 years ago - 7 comments
Labels: question

#24 - Feature multiple languages

Pull Request - State: closed - Opened by crutch almost 13 years ago - 1 comment

#23 - Make the tests portable

Pull Request - State: closed - Opened by kremso almost 13 years ago

#21 - File --mime-type option unrecognized on CentOS

Pull Request - State: closed - Opened by simeonwillbanks almost 13 years ago

#19 - Add `brew` command to installation part of gh-page.

Pull Request - State: closed - Opened by edtsech almost 13 years ago - 2 comments
Labels: enhancement, fixed

#13 - Inspect file mime-type

Pull Request - State: closed - Opened by simeonwillbanks almost 13 years ago - 4 comments
Labels: change