Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / documentcloud/docsplit issues and pull requests
#160 - "Error: source file could not be loaded"
Issue -
State: open - Opened by Nakilon 3 months ago
#159 - Fix deprecated method File.exists? to File.exist?
Pull Request -
State: open - Opened by tuttiq about 1 year ago
- 7 comments
#158 - ruby 3.2 compatibility
Issue -
State: open - Opened by jwoodrow over 1 year ago
#157 - Documentation link correction
Pull Request -
State: closed - Opened by anujaware over 2 years ago
- 1 comment
#156 - Docsplit::ExtractionFailed: gm convert: Unable to open file (/tmp/docsplit/58371.pdf) [No such file or directory]
Issue -
State: open - Opened by thanhtoan1196 almost 3 years ago
#155 - Remove temporary folder for gs conversion
Pull Request -
State: open - Opened by eriko-de about 3 years ago
#154 - Docsplit working on Dev, Staging server but not on Production.
Issue -
State: open - Opened by mumarkhan about 3 years ago
#153 - Docsplit.extract_images(path) => bin/rails: No such file or directory - file
Issue -
State: open - Opened by crusadergo over 4 years ago
- 2 comments
#152 - Docsplit.extract_text generates a String with a null byte
Issue -
State: open - Opened by cedricpim almost 5 years ago
#151 - diskspace leak when extracting text from pdf
Issue -
State: open - Opened by KHMtravel over 5 years ago
- 1 comment
#150 - Add quick and dirty way to add options to tesseract
Pull Request -
State: closed - Opened by mgontav almost 6 years ago
#149 - Fix escaping when extracting text using OCR
Pull Request -
State: open - Opened by floehopper about 6 years ago
#148 - Add page_size to InfoExtractor
Pull Request -
State: open - Opened by dan-jensen about 6 years ago
- 2 comments
#147 - Added support for layout and nopgbrk options when using pdftotext
Pull Request -
State: closed - Opened by prasadsurase about 6 years ago
#146 - update pdftk installer URL for Mac
Pull Request -
State: closed - Opened by tkimnguyen about 6 years ago
#145 - Different behavior on mac and linux
Issue -
State: open - Opened by jbmyid over 6 years ago
#144 - Email address contains more than three special chars(punctuation) is removed by Docsplit.clean_text method
Issue -
State: open - Opened by mraj-rpx over 6 years ago
#143 - Docsplit.extract_text auto orientation detection 'detect_orientation: true' param does not work.
Issue -
State: open - Opened by michaeltranlong almost 7 years ago
#142 - temp fix for output filenames
Pull Request -
State: closed - Opened by deuxshaish over 8 years ago
- 1 comment
#141 - Error "MAGICK_TEMPDIR" is not recognized as an internal or external command.
Issue -
State: open - Opened by andresfccy over 8 years ago
#140 - Downsampling has gotten worse in the last year
Issue -
State: open - Opened by reefdog over 8 years ago
#139 - Docsplit::TextExtractor#extract_text should return the path of the output text file?
Issue -
State: open - Opened by nruth over 8 years ago
- 2 comments
#138 - Can any one please tell me how to pass file path as url to Docsplit ?
Issue -
State: closed - Opened by jogiranjith over 8 years ago
- 2 comments
#137 - Executable filename issue with latest version (5.0.4) of LibreOffice on RHEL
Issue -
State: open - Opened by neilneyman over 8 years ago
- 1 comment
#136 - Horizontal / table formatted text
Issue -
State: open - Opened by nofxx over 8 years ago
#135 - rails invalid byte sequence in UTF-8
Issue -
State: open - Opened by fjcaro almost 9 years ago
- 1 comment
#134 - Clean pdffonts output to avoid invalid UTF-8 characters
Pull Request -
State: open - Opened by tbk303 almost 9 years ago
#133 - encoding issue
Issue -
State: closed - Opened by dfang almost 9 years ago
- 1 comment
#132 - Add layout option to keep layout during text extraction
Pull Request -
State: closed - Opened by scarfacedeb about 9 years ago
- 7 comments
#131 - Fix page parsing for command line usage
Pull Request -
State: open - Opened by xavriley about 9 years ago
#130 - "undefined method `strip' for nil:NilClass" occurs when attempting "Docsplit.extract_pdf"
Issue -
State: closed - Opened by mrmanishs about 9 years ago
- 8 comments
#129 - Percent sign in filenames isn't escaped properly
Issue -
State: open - Opened by jeremybmerrill over 9 years ago
#128 - Break PDFs into chunks bigger than 1 page?
Issue -
State: open - Opened by AbeHandler over 9 years ago
- 3 comments
#127 - Extract Link (URL, Goto, etc)
Issue -
State: closed - Opened by dglunz over 9 years ago
- 2 comments
#126 - Adding fail-through to Poppler in ImageExtractor to handle a failing PDF with Quartz annotations
Pull Request -
State: closed - Opened by sergeyk over 9 years ago
#125 - Converting to .doc and .xls files in Plone does not work with latest Libreoffice
Issue -
State: open - Opened by gregory-zero over 9 years ago
#124 - Add parallel processing to OCR text extraction of full documents
Pull Request -
State: open - Opened by ntodd over 9 years ago
- 2 comments
#123 - PDFtk dependency issues with CentOS-7/RHEL-7 | Build Fails | Dependencies libgc Unavailable
Issue -
State: open - Opened by riker1 over 9 years ago
- 47 comments
#122 - Corrupted pdf file from Chinese docx
Issue -
State: closed - Opened by intellisense over 9 years ago
- 2 comments
#121 - Encoding issue - invalid byte sequence in US-ASCII (ArgumentError)
Issue -
State: open - Opened by intellisense over 9 years ago
- 13 comments
#120 - Orientation
Pull Request -
State: closed - Opened by AbeHandler over 9 years ago
- 1 comment
#119 - Fixes bug on empty result for command "#{office_executable} -h 2>#{null}...
Pull Request -
State: closed - Opened by burisu over 9 years ago
#118 - Windows: "%d" is always escaped to "\%d".
Pull Request -
State: closed - Opened by ypxing over 9 years ago
#117 - Add section to documentation regarding the "--language" flag
Issue -
State: closed - Opened by nathanstitt over 9 years ago
- 1 comment
#116 - German umlauts are replaced by ? after OCR
Issue -
State: closed - Opened by tbk303 over 9 years ago
- 6 comments
#115 - No such file or directory @ rb_sysopen - example.doc (Errno::ENOENT)
Issue -
State: open - Opened by jhonc33 almost 10 years ago
- 3 comments
#114 - added functionality to pass pdftotext options
Pull Request -
State: open - Opened by narutosanjiv almost 10 years ago
- 1 comment
#113 - *** glibc detected *** gm: realloc(): invalid next size: 0x00007f4b7e88e0c0 ***
Issue -
State: closed - Opened by lordfinal about 10 years ago
- 3 comments
#112 - Scrape data from a pdf document into CSV using docsplit ?
Issue -
State: closed - Opened by anil-insonix about 10 years ago
- 1 comment
#111 - Using environment vars to allow graphicsmagick and imagemagick
Pull Request -
State: closed - Opened by augustf about 10 years ago
- 2 comments
Labels: wontfix, change
#110 - Docsplit images command - Added the ability to specify the page number delimiter via a command line option
Pull Request -
State: open - Opened by BrandonNoad about 10 years ago
#109 - libreoffice path in FreeBSD
Issue -
State: open - Opened by danniculescu about 10 years ago
#108 - making magic number-based detection of PDFs encoding-friendly, with tests
Pull Request -
State: closed - Opened by jonoterc over 10 years ago
- 3 comments
#107 - undefined method `strip' for nil:NilClass
Issue -
State: closed - Opened by singhkishan over 10 years ago
- 1 comment
#106 - "Invalid byte sequence error" on master.
Issue -
State: closed - Opened by KurtPreston over 10 years ago
- 4 comments
#104 - Add office search path to check vendor folder for use with Heroku and libreoffice buildpack
Pull Request -
State: closed - Opened by serene over 10 years ago
- 2 comments
#103 - Minor changes
Pull Request -
State: open - Opened by tmaier over 10 years ago
#102 - Check if file is PDF by magic number. Closes #98
Pull Request -
State: closed - Opened by tmaier over 10 years ago
- 3 comments
#101 - Add Gemfile
Pull Request -
State: closed - Opened by tmaier over 10 years ago
- 5 comments
Labels: enhancement, wontfix
#100 - Allow use of imagemagick with docsplit
Pull Request -
State: closed - Opened by augustf over 10 years ago
#97 - Fix for Issue #83: Leading Zeros
Pull Request -
State: open - Opened by theredcoder over 10 years ago
- 2 comments
#96 - Extracting images from PDF hogs 100% CPU
Issue -
State: closed - Opened by tvsignal over 10 years ago
- 2 comments
#95 - conversion to PDF mangles non-ASCII characters in docx on Linux
Issue -
State: closed - Opened by bobmyers over 10 years ago
- 4 comments
#93 - Error converting to images, in 0.7.2 , but works in 0.6.3
Issue -
State: closed - Opened by michelson almost 11 years ago
- 1 comment
#92 - Enable specification of a config file, and generate hocr output if option set
Pull Request -
State: open - Opened by jhosteny almost 11 years ago
- 4 comments
#91 - Pad them digits
Pull Request -
State: closed - Opened by dannguyen almost 11 years ago
- 5 comments
#90 - Extract image or pdf on windows platform bugfix
Pull Request -
State: open - Opened by eastxing almost 11 years ago
- 1 comment
#88 - Add /usr/lib64 to office_search_paths
Pull Request -
State: closed - Opened by elia almost 11 years ago
#86 - extract_text doesn't work for pdf files with Tesseract
Issue -
State: closed - Opened by chintanparikh almost 11 years ago
- 12 comments
#85 - Default 64-bit installation paths
Pull Request -
State: closed - Opened by vanderhoorn about 11 years ago
#84 - Detect page orientation and rotate when necessary
Issue -
State: closed - Opened by lukerosiak about 11 years ago
- 5 comments
Labels: enhancement
#82 - Clean text without Iconv to support Ruby 2.0
Pull Request -
State: closed - Opened by leknarf about 11 years ago
- 1 comment
#81 - Add option to generate hOCR output instead of raw text when performing OCR via tesseract
Pull Request -
State: closed - Opened by jhosteny about 11 years ago
- 4 comments
#77 - Deploy to heroku
Issue -
State: open - Opened by josal about 11 years ago
- 1 comment
#76 - Unable to extract images using docsplit 0.7.2 in cygwin
Issue -
State: open - Opened by bjayaram about 11 years ago
- 1 comment
#75 - Add another possible LibreOffice executable path
Pull Request -
State: closed - Opened by va7map about 11 years ago
- 1 comment
#74 - typo fix for win
Pull Request -
State: closed - Opened by sumkincpp about 11 years ago
- 2 comments
#73 - Couldn't open file '/tmp/docsplit/filename.pdf': No such file or directory.
Issue -
State: closed - Opened by luccasmaso about 11 years ago
- 5 comments
#72 - Not saving Unicode (UTF8) characters (accents in other languages)
Issue -
State: closed - Opened by robertour over 11 years ago
- 4 comments
Labels: question
#70 - No error output
Issue -
State: closed - Opened by patroy over 11 years ago
- 2 comments
Labels: bug
#68 - new libreoffice has --version
Pull Request -
State: closed - Opened by senner over 11 years ago
- 4 comments
#65 - Accept non-ascii characters in pdf headers
Pull Request -
State: closed - Opened by amalagaura over 11 years ago
- 4 comments
#63 - Fix issue 62
Pull Request -
State: open - Opened by hderms over 11 years ago
- 11 comments
#61 - Ghostscript is needed to use docsplit with PDF files
Pull Request -
State: closed - Opened by evanj over 11 years ago
- 2 comments
#57 - extract_pages does not use page range "pages" parameter
Issue -
State: open - Opened by rajington over 11 years ago
- 3 comments
#51 - TextCleaner garbles German umlauts in recognized text
Issue -
State: closed - Opened by marcboeker almost 12 years ago
- 3 comments
#50 - Various improvements
Pull Request -
State: closed - Opened by trevorturk almost 12 years ago
#49 - Spaces in installation path
Pull Request -
State: closed - Opened by ineiti almost 12 years ago
- 1 comment
#48 - Detect PDF files without .pdf extension using magic number
Pull Request -
State: closed - Opened by jeremybmerrill almost 12 years ago
- 4 comments
Labels: fixed, change
#46 - Added ability to extract all metadata at once
Pull Request -
State: closed - Opened by rajington almost 12 years ago
- 1 comment
#45 - Adds recommendation to install poppler-data
Pull Request -
State: closed - Opened by alindeman about 12 years ago
#35 - Accept non-ascii characters in pdf headers
Pull Request -
State: closed - Opened by stuartf over 12 years ago
- 6 comments
#34 - Timeout for PDF extraction from OpenOffice supported document format.
Pull Request -
State: open - Opened by vrybas over 12 years ago
- 4 comments
#29 - qpdf decryption
Pull Request -
State: closed - Opened by palewire over 12 years ago
- 1 comment
#28 - OWNER PASSWORD REQUIRED ERROR
Issue -
State: closed - Opened by palewire over 12 years ago
- 7 comments
Labels: question
#24 - Feature multiple languages
Pull Request -
State: closed - Opened by crutch almost 13 years ago
- 1 comment
#23 - Make the tests portable
Pull Request -
State: closed - Opened by kremso almost 13 years ago
#21 - File --mime-type option unrecognized on CentOS
Pull Request -
State: closed - Opened by simeonwillbanks almost 13 years ago
#19 - Add `brew` command to installation part of gh-page.
Pull Request -
State: closed - Opened by edtsech almost 13 years ago
- 2 comments
Labels: enhancement, fixed
#13 - Inspect file mime-type
Pull Request -
State: closed - Opened by simeonwillbanks almost 13 years ago
- 4 comments
Labels: change