GitHub / docling-project/docling issues and pull requests
#1699 - Incorrectly populated ProvenanceItem in ReadingOrderModel
Issue -
State: open - Opened by wallrothm about 2 months ago
Labels: bug, reading_order
#1698 - test: mark flaky test
Pull Request -
State: closed - Opened by vagenas about 2 months ago
- 2 comments
#1697 - Is there a way to plug in my function into the pipeline before it separates everything?
Issue -
State: closed - Opened by patelMaharshii about 2 months ago
- 2 comments
Labels: question
#1696 - Image descriptions not generated for small images in PDF-to-Markdown conversion, but work with PNG inputs
Issue -
State: closed - Opened by frogFred about 2 months ago
- 1 comment
Labels: question
#1695 - Are pictures automatically embedded?
Issue -
State: closed - Opened by patelMaharshii about 2 months ago
- 1 comment
Labels: question
#1694 - Support for macOS x86_64
Issue -
State: open - Opened by dolfim-ibm about 2 months ago
Labels: enhancement
#1693 - How do I add GPU parallelization for the do_formula_enrichment model?
Issue -
State: open - Opened by shahmeer99 about 2 months ago
- 1 comment
Labels: question
#1692 - "Strict" DOCX conversion does not work
Issue -
State: open - Opened by barseghyanartur about 2 months ago
Labels: bug
#1691 - test: ensure utf-8 in test data utils
Pull Request -
State: closed - Opened by vagenas about 2 months ago
- 2 comments
#1690 - Sideways Pages and Rotated Tables - page rotation for correct table orientation.
Issue -
State: open - Opened by Barrsum about 2 months ago
Labels: question
#1689 - fix: docx text box extraction
Pull Request -
State: closed - Opened by AndrewTsai0406 about 2 months ago
- 5 comments
#1688 - fix/docx_text_box_extraction
Pull Request -
State: closed - Opened by AndrewTsai0406 about 2 months ago
- 1 comment
#1687 - allow to change cli default settings
Issue -
State: closed - Opened by wenboown about 2 months ago
- 2 comments
Labels: enhancement
#1686 - Stride - chunk overlap
Issue -
State: open - Opened by BEN-50763 about 2 months ago
- 1 comment
Labels: enhancement, chunker
#1685 - ModuleNotFoundError: No module named 'docling_parse.pdf_parser' with docling 2.34.0 and docling-parse 4.0.1
Issue -
State: open - Opened by RagnorkRnA about 2 months ago
- 2 comments
Labels: bug, pdf parsing
#1684 - Update the layout-model to use D-FINE
Pull Request -
State: open - Opened by nikos-livathinos about 2 months ago
- 1 comment
#1683 - Update the layout-model to use RT-DETRv2
Pull Request -
State: open - Opened by nikos-livathinos about 2 months ago
- 1 comment
#1682 - feature: add support for google docs/files urls
Issue -
State: open - Opened by vtempest about 2 months ago
- 1 comment
Labels: enhancement
#1681 - docling crashes with a list index out of range exception : format = self._get_format_from_run(c.runs[0])
Issue -
State: open - Opened by Poojitha-Mutteneni about 2 months ago
- 2 comments
Labels: bug
#1680 - Conversion of the document contains false positive classified tables
Issue -
State: open - Opened by fifibanana about 2 months ago
- 1 comment
Labels: bug
#1679 - Integration of SLANet-1M draft
Pull Request -
State: closed - Opened by dimitri009 about 2 months ago
- 1 comment
#1678 - Incorrect Table Columns
Issue -
State: open - Opened by fifibanana about 2 months ago
Labels: bug
#1677 - enrich-formula Exception: data did not match any variant of untagged enum ModelWrapper
Issue -
State: open - Opened by wenboown about 2 months ago
- 5 comments
Labels: bug
#1676 - docs: fix typo in index.md
Pull Request -
State: closed - Opened by edi9999 about 2 months ago
- 4 comments
#1675 - How to Avoid Duplicate Table Content in Text Extraction with Docling
Issue -
State: closed - Opened by Sam120204 about 2 months ago
- 1 comment
Labels: question
#1673 - fix: guess HTML content starting with script tag
Pull Request -
State: closed - Opened by ceberam about 2 months ago
- 2 comments
Labels: bug, html
#1669 - GPU-Accelerated Batching for pages of a PDF during Inference
Issue -
State: open - Opened by parin1995 about 2 months ago
- 6 comments
Labels: question
#1668 - Legitimate duplicate text in textbox in docx is being unexpectedly removed
Issue -
State: closed - Opened by xenv about 2 months ago
- 2 comments
Labels: bug
#1664 - fix: pptx line break and space handling
Pull Request -
State: closed - Opened by mawi12345 about 2 months ago
- 2 comments
#1663 - feat: Add visualization of bbox on page with html export.
Pull Request -
State: open - Opened by PeterStaar-IBM about 2 months ago
- 3 comments
#1662 - Chunking and serialization for document formulas
Issue -
State: open - Opened by Haoyuan-L about 2 months ago
Labels: question, chunker
#1661 - Controlled requests to external inference provider.
Issue -
State: closed - Opened by swtb3-ryder about 2 months ago
- 2 comments
Labels: enhancement
#1660 - chore: fix or ignore runtime and deprecation warnings
Pull Request -
State: open - Opened by ceberam about 2 months ago
- 2 comments
#1659 - feat(html): Support in-line anchor tags in HTML texts
Pull Request -
State: open - Opened by krrome 2 months ago
- 3 comments
#1658 - fix: pptx shape order
Pull Request -
State: open - Opened by mawi12345 2 months ago
- 3 comments
#1657 - Runtime Error since v.2.34.0 related to OSD detection
Issue -
State: open - Opened by simonschoe 2 months ago
- 3 comments
Labels: bug
#1656 - vlm support dolphin
Issue -
State: closed - Opened by chenshuichao 2 months ago
- 3 comments
Labels: enhancement, vlm-pipeline
#1655 - HfHubHTTPError when calling DoclingLoader with a pdf file.
Issue -
State: open - Opened by mosharof24 2 months ago
Labels: question
#1654 - Docling getting killed when i feed bigger pdf files which have 900+ pages
Issue -
State: open - Opened by Greatz08 2 months ago
- 7 comments
Labels: question
#1653 - Tables recognized as images
Issue -
State: open - Opened by philipplelidis 2 months ago
Labels: bug, layout
#1652 - Extracted list but the sequence and result incorrect in PDF
Issue -
State: open - Opened by ayowu1981 2 months ago
Labels: bug
#1650 - Apparently simple pdf file totally destroyed by docling
Issue -
State: open - Opened by caa24 2 months ago
Labels: bug
#1649 - CUDA support
Issue -
State: open - Opened by kalle07 2 months ago
- 6 comments
Labels: bug, accelerators
#1648 - Mean of Empty Slice During Conversion of PDF
Issue -
State: closed - Opened by swtb3-ryder 2 months ago
- 12 comments
Labels: bug
#1647 - Ollama Remote VLM Raises Validation Errors
Issue -
State: open - Opened by swtb3-ryder 2 months ago
- 3 comments
Labels: bug
#1646 - How to get tables from chunks?
Issue -
State: closed - Opened by redvedev 2 months ago
- 1 comment
Labels: question
#1645 - KeyError and RuntimeError occurred when opening a document(docx)
Issue -
State: open - Opened by chaos798 2 months ago
- 2 comments
Labels: bug, docx
#1644 - Rounded OCR bounding box and strict intersection judgement causes dropping OCR Textcells
Issue -
State: open - Opened by Bill-XU 2 months ago
- 3 comments
Labels: bug
#1643 - Wrongly assigned indices of TextCells in RapidOcrModel cause dropping cells in LayoutPostProcessor
Issue -
State: open - Opened by Bill-XU 2 months ago
- 4 comments
Labels: bug, layout
#1642 - RuntimeWarning
Issue -
State: open - Opened by Sam120204 2 months ago
- 5 comments
Labels: bug
#1641 - []
Issue -
State: closed - Opened by redvedev 2 months ago
Labels: question
#1640 - Extracted Formula not correct for PDF
Issue -
State: open - Opened by ayowu1981 2 months ago
Labels: bug
#1639 - ConversionError: Input document with "°"in its name not valid.
Issue -
State: closed - Opened by theauAg 2 months ago
- 2 comments
Labels: bug
#1638 - Downloading detection model, please wait. This may take several minutes depending upon your network connection.
Issue -
State: open - Opened by Fanzaijun 2 months ago
- 3 comments
Labels: bug
#1637 - Legacy .doc support
Issue -
State: open - Opened by 07pepa 2 months ago
Labels: enhancement
#1636 - fix: fix ZeroDivisionError for cell_bbox.area()
Pull Request -
State: closed - Opened by Saidgurbuz 2 months ago
- 2 comments
#1635 - RapidOcr causes merging of text while parsing.
Issue -
State: open - Opened by vishaldasnewtide 2 months ago
- 3 comments
Labels: bug
#1634 - Dependency problem in pip install
Issue -
State: closed - Opened by heyday097 2 months ago
- 2 comments
Labels: bug
#1633 - cell_bbox.intersection_area_with(bbox) / cell_bbox.area() ZeroDivisionError: float division by zero
Issue -
State: closed - Opened by Sam120204 2 months ago
- 1 comment
Labels: bug
#1631 - Parsing arxiv papers got "<!-- formula-not-decoded -->" in the chunk
Issue -
State: closed - Opened by Haoyuan-L 2 months ago
- 1 comment
Labels: question
#1630 - How to stop downloading the models automatically while installing docling in the build pipeline step/docker image/locally?
Issue -
State: closed - Opened by Sudhakar17 2 months ago
- 3 comments
Labels: question
#1628 - max_tokens parameter ignored when using default tokenizer.
Issue -
State: closed - Opened by ckanaar 2 months ago
- 1 comment
Labels: bug, chunker
#1627 - How good are the layout models?
Issue -
State: closed - Opened by Ulipenitz 2 months ago
- 1 comment
Labels: question
#1626 - Is it possible to change the Layout model? DocLayout-YOLO_ft outperforms
Issue -
State: open - Opened by dtau00 2 months ago
Labels: question
#1625 - Footer Text Interferes with Main PDF Content During Parsing
Issue -
State: open - Opened by sumittagadiya 2 months ago
- 3 comments
Labels: question
#1624 - Poor performance on pages with many elements
Issue -
State: open - Opened by avp-temp 2 months ago
- 1 comment
Labels: enhancement, table structure, performance
#1622 - [Feature Request] Add ByteDance/Dolphin model for Docling
Issue -
State: open - Opened by NeroHin 2 months ago
- 8 comments
Labels: enhancement, vlm-pipeline
#1620 - The VLM example in the documentation doesn't work
Issue -
State: closed - Opened by DavidNemeskey 2 months ago
- 5 comments
Labels: bug
#1618 - EasyOCR, Tesseract and RapidOCR do not have the same (more or less) docling json output. EasyOCR misses text.
Issue -
State: closed - Opened by PietFourie 2 months ago
- 3 comments
Labels: bug, ocr
#1614 - Refined layout parsing
Issue -
State: open - Opened by geoHeil 2 months ago
- 8 comments
Labels: enhancement, layout
#1613 - Difficulty Extracting Nested Tables Within Table Cells
Issue -
State: open - Opened by ricdurvin 2 months ago
Labels: bug
#1612 - Partial or no data extracted from single-row tables without headers
Issue -
State: open - Opened by ricdurvin 2 months ago
Labels: bug
#1611 - Incorrect Line Splitting in Table Cell Extraction
Issue -
State: open - Opened by ricdurvin 2 months ago
Labels: bug
#1610 - fix(msword_backend): Identify text in the same line after an image #1425
Pull Request -
State: open - Opened by mkrssg 2 months ago
- 15 comments
#1609 - fix: Fix issue with detecting docx files, and files with upper case extensions
Pull Request -
State: open - Opened by MoheyEl-DinBadr 2 months ago
- 2 comments
#1608 - Image descriptions into markdown file
Issue -
State: open - Opened by mmb78 2 months ago
- 1 comment
Labels: question
#1607 - Detect icon '@'
Issue -
State: open - Opened by carminoplata 2 months ago
- 1 comment
Labels: question, pdf parsing
#1606 - TABs instead of whitespace
Issue -
State: open - Opened by mmb78 2 months ago
Labels: question
#1604 - Bold list entries are not converted correctly
Issue -
State: open - Opened by sabotrax 2 months ago
- 1 comment
Labels: bug
#1603 - Installation MAC M1 failing
Issue -
State: open - Opened by MalekJabri 2 months ago
- 2 comments
Labels: bug
#1602 - Performance issue for PDF when running inside Docker container
Issue -
State: open - Opened by vikasr111 2 months ago
- 1 comment
Labels: bug
#1601 - References to Exported Tables
Issue -
State: open - Opened by spencerd42 2 months ago
- 1 comment
Labels: question
#1600 - fix: handles line breaks in pptx table cells to markdown
Pull Request -
State: closed - Opened by georgehgfonseca 2 months ago
- 8 comments
#1599 - export_to_markdown seems not working
Issue -
State: closed - Opened by gsantopaolo 2 months ago
- 1 comment
Labels: bug
#1597 - Error while deserializing header: HeaderTooLarge using docling==2.28.4
Issue -
State: open - Opened by abhishekpandey3 2 months ago
Labels: bug
#1596 - chore: fix chunking example data link
Pull Request -
State: closed - Opened by vagenas 2 months ago
- 2 comments
#1595 - Allow set fixed image size instead of scale in ApiVlmOptions
Issue -
State: open - Opened by shkarupa-alex 2 months ago
Labels: enhancement
#1594 - On Rocm/HIP the automated MultiScaleDeformableAttention build attempt fails
Issue -
State: closed - Opened by Iiiiiiigor 2 months ago
- 1 comment
Labels: bug
#1593 - feat: add fallback_lang support in TesseractOcrCliModel
Pull Request -
State: closed - Opened by IoannisMaras 2 months ago
- 8 comments
Labels: ocr
#1592 - InternVL3 Unrecognized configuration class (Hugging Face VLM)
Issue -
State: closed - Opened by JohannKaspar 2 months ago
- 2 comments
Labels: bug
#1591 - Support NumPy 2.x in docling and its dependencies
Issue -
State: closed - Opened by karimmila 2 months ago
- 3 comments
Labels: bug
#1590 - Install issue with conda forge
Issue -
State: open - Opened by bextra 2 months ago
- 1 comment
Labels: bug
#1589 - docs: add advanced chunking & serialization example
Pull Request -
State: closed - Opened by vagenas 2 months ago
- 2 comments
#1587 - feat: Picture description using context with surrounding text
Pull Request -
State: open - Opened by rafaeltuelho 2 months ago
- 1 comment
#1586 - Failed to convert text that appears in automatic numbering format in .docx
Issue -
State: open - Opened by alexshmmy 2 months ago
- 4 comments
Labels: bug, docx
#1584 - Inconsistent Markdown Output with generate_multimodal_pages Method in Docling
Issue -
State: open - Opened by Yash8745 2 months ago
Labels: bug
#1583 - feat: table enrichments - description and indexing
Pull Request -
State: open - Opened by shivanikabu 2 months ago
- 6 comments
#1582 - fix: click_ dependency and update lock file
Pull Request -
State: closed - Opened by dolfim-ibm 2 months ago
- 2 comments
#1581 - Pdf2parquet improvement and optimisation
Issue -
State: open - Opened by ShiroYasha18 2 months ago
Labels: enhancement
#1580 - No texts are extracted
Issue -
State: closed - Opened by oprince 2 months ago
- 1 comment
Labels: bug, html