GitHub / aws-samples/amazon-textract-textractor issues and pull requests
#432 - Fix typo in `InputError` raised by `Textractor` constructor
Pull Request -
State: open - Opened by dross20 about 1 month ago
#431 - Typo in `InputError` raised by `Textractor` constructor
Issue -
State: open - Opened by dross20 about 1 month ago
#430 - [textractprettyprinter] does not return the last row of a table when using get_text_from_layout_json
Issue -
State: open - Opened by abouberthe about 1 month ago
#428 - Inconsistent New Line Character Reading [Windows VS Linux] (table.to_pandas())
Issue -
State: open - Opened by storesace-jorgelopes 2 months ago
#427 - blank page contained in a document not handled by `get_layout_csv_from_trp2`
Issue -
State: open - Opened by fdejax90 3 months ago
#426 - Prebuilt Lambda layer with Pandas too large
Issue -
State: open - Opened by Smef 3 months ago
#425 - Ignore KV elements in LAYOUT_LIST
Pull Request -
State: closed - Opened by Belval 3 months ago
#424 - Reproducible failure of to_markdown() - object has no attribute 'reading_order'
Issue -
State: open - Opened by rstrahan 4 months ago
#423 - Fix s3 client instantiation in _get_document_images_from_path
Pull Request -
State: closed - Opened by Belval 4 months ago
#422 - Intermittent failure of to_markdown() in lambda
Issue -
State: open - Opened by samiam376 4 months ago
- 5 comments
#421 - Unable to set endpoint_url for clients
Issue -
State: closed - Opened by couryrr 4 months ago
- 6 comments
#420 - Landscape tables result in jumbled text when using extractor.start_document_analysis with TextractFeatures.TABLES
Issue -
State: open - Opened by gertct 5 months ago
- 3 comments
Labels: bug
#419 - Replace logging calls with module logger
Pull Request -
State: closed - Opened by Belval 5 months ago
#418 - Fix short ids using long ids and vice-versa
Pull Request -
State: closed - Opened by Belval 5 months ago
#417 - Add function to split larger element at table insertion
Pull Request -
State: closed - Opened by Belval 5 months ago
#416 - "Request has invalid parameters" when query strings contain any double quotes
Issue -
State: open - Opened by rarifin-134 5 months ago
#415 - Use iou algorithm to find most likely table
Pull Request -
State: open - Opened by k-agau 6 months ago
#414 - Question: extract only one language content in a dual language document using aws textract
Issue -
State: open - Opened by SpAcY001 7 months ago
- 1 comment
#413 - Incorrect id Assignment in `add_id_to_html_tag` Function
Issue -
State: closed - Opened by dr-rafis 7 months ago
- 1 comment
#412 - [textract-pretty-printer] add null check to get_text_from_layout_json parsing
Pull Request -
State: open - Opened by neil-sola 8 months ago
#411 - `get_text_from_layout_json` throws `'NoneType' object is not subscriptable` for a specific PDF
Issue -
State: open - Opened by neil-sola 8 months ago
- 1 comment
#410 - Bounding box is incorrect for text converted from Markdown.
Issue -
State: open - Opened by dharnieshraja 8 months ago
#409 - error: Textractor.detect_document_text() got an unexpected keyword argument 's3_output_path'
Issue -
State: closed - Opened by elbbub 8 months ago
#408 - Textractor doesn't detect the INVOICE_RECEIPT_ID, but the AWS Textract Demo can
Issue -
State: open - Opened by arsher-b 8 months ago
- 1 comment
#407 - add sep to kv_to_csv
Pull Request -
State: closed - Opened by DGarbs51 9 months ago
- 2 comments
#406 - KeyError: 'Relationships'
Issue -
State: closed - Opened by lucio-xelda 9 months ago
- 6 comments
Labels: bug
#405 - new feature: export_kv_to_pandas
Pull Request -
State: closed - Opened by Chuukwudi 9 months ago
- 1 comment
#404 - Add check for None bounding boxes for AnalyzeExpense
Pull Request -
State: closed - Opened by Belval 9 months ago
#403 - Remove broken builds for Py38, Py39, Py310, Py311
Pull Request -
State: closed - Opened by Belval 9 months ago
#402 - Allow Custom Separator in `Document.export_kv_to_csv()`
Pull Request -
State: closed - Opened by Chuukwudi 9 months ago
- 1 comment
#401 - analyze_expense error: 'NoneType' object has no attribute 'spatial_object'
Issue -
State: open - Opened by arsher-b 9 months ago
#399 - lambda layers builds are broken
Issue -
State: closed - Opened by gauravthadani 10 months ago
- 2 comments
#398 - Fix linearize layout when Block Entity Types are `None`
Pull Request -
State: closed - Opened by BPDanek 10 months ago
- 2 comments
#397 - The invoice number won’t be detected if there is no space between the label and the value
Issue -
State: closed - Opened by arsher-b 10 months ago
- 1 comment
#396 - Fix list content duplication
Pull Request -
State: open - Opened by adityachandak287 11 months ago
Labels: pretty-printer
#395 - Fix invalid escape in BoundingBox docstring
Pull Request -
State: open - Opened by simonschmidt 11 months ago
#394 - Update analyze_document type hint
Pull Request -
State: closed - Opened by ryangamble 11 months ago
- 1 comment
#393 - Added config parameter
Pull Request -
State: open - Opened by akhilnarayanan1 11 months ago
- 2 comments
#392 - Add figure
Pull Request -
State: open - Opened by ENsu 11 months ago
Labels: pretty-printer
#391 - [textractprettyprinter] List contents are duplicated when generating text output using `get_text_from_layout_json`
Issue -
State: open - Opened by adityachandak287 11 months ago
- 5 comments
#390 - Support for `NotificationChannel` in Textract Caller's Async Methods
Issue -
State: closed - Opened by azucker99 11 months ago
- 2 comments
#389 - Incorrect order of text layouts due to compare_bounding_box() used in group_elements_horizontally()
Issue -
State: open - Opened by keitaf 11 months ago
- 3 comments
Labels: need repro
#388 - issue with ordering in extractions, markdown and gettext methods
Issue -
State: open - Opened by red-sky17 12 months ago
- 13 comments
#387 - Escape html output
Pull Request -
State: closed - Opened by Belval 12 months ago
#386 - Id in html output
Pull Request -
State: closed - Opened by Belval 12 months ago
#385 - Detected in EXPENSE_ROW but not as ITEM
Issue -
State: open - Opened by arsher-b 12 months ago
- 1 comment
#384 - Trouble replicating markdown output
Issue -
State: open - Opened by bvbg1 about 1 year ago
- 9 comments
Labels: need repro
#382 - Save image doesn't work with S3 path - TypeError: Invalid input type 'bytearray'
Issue -
State: closed - Opened by steffeng about 1 year ago
- 3 comments
Labels: bug
#381 - Fix .to_markdown() raising an exception on missing local config
Pull Request -
State: closed - Opened by Belval about 1 year ago
#380 - issue regarding .to_markdown() method
Issue -
State: closed - Opened by red-sky17 about 1 year ago
- 4 comments
#379 - fix(expense): Expenses with no summary fields
Pull Request -
State: closed - Opened by athewsey about 1 year ago
#378 - Replace region mismatch with invalid S3 object
Pull Request -
State: closed - Opened by Belval about 1 year ago
#377 - Improve error message that identified InvalidS3ObjectException as RegionMismatch
Pull Request -
State: closed - Opened by Belval about 1 year ago
#376 - Use pypdfium2 for PDF rasterizing when possible
Pull Request -
State: closed - Opened by Belval about 1 year ago
#375 - Allow PDF in for DetectDocumentText and AnalyzeDocument
Pull Request -
State: closed - Opened by Belval about 1 year ago
#374 - Improve HTML linearization
Pull Request -
State: closed - Opened by Belval about 1 year ago
#373 - Lambda layers for Python 3.12 PDF raising an exception on missing libpng16.so.16
Issue -
State: open - Opened by Viajante80 about 1 year ago
- 3 comments
Labels: bug
#372 - Lambda layers for Python 3.12 raising an exception on missing libopenjp2.so.7
Issue -
State: closed - Opened by Belval about 1 year ago
#371 - Is search_words() broken?
Issue -
State: open - Opened by ttruong-gilead about 1 year ago
- 2 comments
#370 - Empty expense_documents on analyze_expense
Issue -
State: closed - Opened by arsher-b about 1 year ago
- 3 comments
Labels: bug
#369 - Incorrect table cell word and line order
Issue -
State: open - Opened by wessens about 1 year ago
- 3 comments
Labels: bug, enhancement
#368 - 'NoneType' object has no attribute 'spatial_object' on Expense Analysis results
Issue -
State: open - Opened by HarryTSaban about 1 year ago
#367 - Use module name for logger instead of Root Logger
Issue -
State: open - Opened by michaelshum321 about 1 year ago
- 7 comments
Labels: enhancement
#366 - pdf2image is required even though save_image=False
Issue -
State: open - Opened by vdefeo-caylent about 1 year ago
#365 - prefix and suffix for footer layout is not available
Issue -
State: open - Opened by LeoHemamou about 1 year ago
#364 - Exception handling is hiding the underlying issue of the error.
Issue -
State: open - Opened by vdefeo-caylent about 1 year ago
- 2 comments
Labels: bug
#363 - Add confidence scores at the DocumentEntity level
Pull Request -
State: closed - Opened by Belval about 1 year ago
#362 - Add figure layout prefix and suffix
Pull Request -
State: closed - Opened by Belval about 1 year ago
#361 - feature request: add query alias parameter
Issue -
State: closed - Opened by parad0x96 about 1 year ago
- 2 comments
Labels: documentation
#360 - Fix missing figures
Pull Request -
State: closed - Opened by Belval over 1 year ago
#359 - Access Non-Axis-Aligned Bounding Boxes
Issue -
State: open - Opened by zkalson over 1 year ago
- 2 comments
Labels: enhancement
#358 - Table cell, incorrectly, does not pick up the cell text/words. Page--> Line picks up the words as in the textract output
Issue -
State: open - Opened by raidken over 1 year ago
- 1 comment
Labels: bug
#357 - Update function doc and return type
Pull Request -
State: closed - Opened by andrewkowalik over 1 year ago
- 1 comment
#356 - issue with extraction, get_text_fromlayout_json function
Issue -
State: closed - Opened by red-sky17 over 1 year ago
- 1 comment
Labels: question
#355 - cell content extraction error
Issue -
State: open - Opened by Larbo53 over 1 year ago
- 2 comments
Labels: question
#354 - Use AWS_REGION and AWS_DEFAULT_REGION environment variables in Textractor when available
Pull Request -
State: closed - Opened by Belval over 1 year ago
#353 - [Feature Request] Simplified batch processing CLI
Issue -
State: open - Opened by athewsey over 1 year ago
- 1 comment
Labels: enhancement
#352 - Cryptic CLI error in SageMaker Studio (and probably other role-based environments?)
Issue -
State: open - Opened by athewsey over 1 year ago
- 1 comment
Labels: bug
#351 - Python Support for Column Headers
Issue -
State: open - Opened by Belval over 1 year ago
#349 - ensure cell block has text element
Pull Request -
State: closed - Opened by qeternity over 1 year ago
- 4 comments
#348 - KeyError in get_lines_string
Issue -
State: open - Opened by sbui-dev over 1 year ago
#347 - Exporting text+tables while maintaining layout
Issue -
State: open - Opened by austinmw over 1 year ago
- 1 comment
#346 - Fixes issue #345 : S3 path parser
Pull Request -
State: closed - Opened by anjanvb over 1 year ago
#345 - S3 path parsing for textractcaller is not robust enough
Issue -
State: open - Opened by anjanvb over 1 year ago
#344 - GH issue #343: Added key check
Pull Request -
State: closed - Opened by dzmitry-kankalovich over 1 year ago
- 7 comments
Labels: pretty-printer
#343 - KeyError: 'Text' - on documents with tables
Issue -
State: closed - Opened by dzmitry-kankalovich over 1 year ago
- 2 comments
#342 - Set JPEG compression parameters
Pull Request -
State: closed - Opened by Belval over 1 year ago
#341 - JPEG conversion in `analyze_document` significantly impacts table predictions
Issue -
State: open - Opened by Belval over 1 year ago
- 1 comment
Labels: bug
#340 - Handle None bounding box when parsing Queries
Pull Request -
State: closed - Opened by Belval over 1 year ago
#339 - Handle null EntityTypes
Pull Request -
State: closed - Opened by Belval over 1 year ago
#338 - Textractor import error
Issue -
State: closed - Opened by umaaaaaaaaa over 1 year ago
- 1 comment
#337 - Large PDF response processing is slow
Issue -
State: open - Opened by Belval over 1 year ago
Labels: enhancement, latency
#336 - Proper way of getting cell content?
Issue -
State: open - Opened by ttruong-gilead over 1 year ago
- 5 comments
#335 - Parsing response from a start_document_analysis()
Issue -
State: closed - Opened by ttruong-gilead over 1 year ago
- 2 comments
#334 - Add missing entities in docs
Pull Request -
State: closed - Opened by Belval over 1 year ago
#333 - Error in get_layout_text_from_json in textractprettyprinter
Issue -
State: open - Opened by gwynethguo over 1 year ago
#332 - Add CITATION.cff
Pull Request -
State: closed - Opened by Belval over 1 year ago
#331 - Missing CITATION.cff file for repo
Issue -
State: closed - Opened by mhucka over 1 year ago
- 1 comment
#330 - Convert image to RGB in EntityList for Jupyter compatibility
Pull Request -
State: closed - Opened by Belval over 1 year ago
- 1 comment
#329 - Return query and query answer with get_text()
Pull Request -
State: open - Opened by Belval over 1 year ago