An open API service for providing issue and pull request metadata for open source projects.

GitHub / docling-project/docling issues and pull requests

#1699 - Incorrectly populated ProvenanceItem in ReadingOrderModel

Issue - State: open - Opened by wallrothm about 2 months ago
Labels: bug, reading_order

#1698 - test: mark flaky test

Pull Request - State: closed - Opened by vagenas about 2 months ago - 2 comments

#1697 - Is there a way to plug in my function into the pipeline before it separates everything?

Issue - State: closed - Opened by patelMaharshii about 2 months ago - 2 comments
Labels: question

#1696 - Image descriptions not generated for small images in PDF-to-Markdown conversion, but work with PNG inputs

Issue - State: closed - Opened by frogFred about 2 months ago - 1 comment
Labels: question

#1695 - Are pictures automatically embedded?

Issue - State: closed - Opened by patelMaharshii about 2 months ago - 1 comment
Labels: question

#1694 - Support for macOS x86_64

Issue - State: open - Opened by dolfim-ibm about 2 months ago
Labels: enhancement

#1693 - How do I add GPU parallelization for the do_formula_enrichment model?

Issue - State: open - Opened by shahmeer99 about 2 months ago - 1 comment
Labels: question

#1692 - "Strict" DOCX conversion does not work

Issue - State: open - Opened by barseghyanartur about 2 months ago
Labels: bug

#1691 - test: ensure utf-8 in test data utils

Pull Request - State: closed - Opened by vagenas about 2 months ago - 2 comments

#1690 - Sideways Pages and Rotated Tables - page rotation for correct table orientation.

Issue - State: open - Opened by Barrsum about 2 months ago
Labels: question

#1689 - fix: docx text box extraction

Pull Request - State: closed - Opened by AndrewTsai0406 about 2 months ago - 5 comments

#1688 - fix/docx_text_box_extraction

Pull Request - State: closed - Opened by AndrewTsai0406 about 2 months ago - 1 comment

#1687 - allow to change cli default settings

Issue - State: closed - Opened by wenboown about 2 months ago - 2 comments
Labels: enhancement

#1686 - Stride - chunk overlap

Issue - State: open - Opened by BEN-50763 about 2 months ago - 1 comment
Labels: enhancement, chunker

#1685 - ModuleNotFoundError: No module named 'docling_parse.pdf_parser' with docling 2.34.0 and docling-parse 4.0.1

Issue - State: open - Opened by RagnorkRnA about 2 months ago - 2 comments
Labels: bug, pdf parsing

#1684 - Update the layout-model to use D-FINE

Pull Request - State: open - Opened by nikos-livathinos about 2 months ago - 1 comment

#1683 - Update the layout-model to use RT-DETRv2

Pull Request - State: open - Opened by nikos-livathinos about 2 months ago - 1 comment

#1682 - feature: add support for google docs/files urls

Issue - State: open - Opened by vtempest about 2 months ago - 1 comment
Labels: enhancement

#1680 - Conversion of the document contains false positive classified tables

Issue - State: open - Opened by fifibanana about 2 months ago - 1 comment
Labels: bug

#1679 - Integration of SLANet-1M draft

Pull Request - State: closed - Opened by dimitri009 about 2 months ago - 1 comment

#1678 - Incorrect Table Columns

Issue - State: open - Opened by fifibanana about 2 months ago
Labels: bug

#1677 - enrich-formula Exception: data did not match any variant of untagged enum ModelWrapper

Issue - State: open - Opened by wenboown about 2 months ago - 5 comments
Labels: bug

#1676 - docs: fix typo in index.md

Pull Request - State: closed - Opened by edi9999 about 2 months ago - 4 comments

#1675 - How to Avoid Duplicate Table Content in Text Extraction with Docling

Issue - State: closed - Opened by Sam120204 about 2 months ago - 1 comment
Labels: question

#1673 - fix: guess HTML content starting with script tag

Pull Request - State: closed - Opened by ceberam about 2 months ago - 2 comments
Labels: bug, html

#1669 - GPU-Accelerated Batching for pages of a PDF during Inference

Issue - State: open - Opened by parin1995 about 2 months ago - 6 comments
Labels: question

#1668 - Legitimate duplicate text in textbox in docx is being unexpectedly removed

Issue - State: closed - Opened by xenv about 2 months ago - 2 comments
Labels: bug

#1664 - fix: pptx line break and space handling

Pull Request - State: closed - Opened by mawi12345 about 2 months ago - 2 comments

#1663 - feat: Add visualization of bbox on page with html export.

Pull Request - State: open - Opened by PeterStaar-IBM about 2 months ago - 3 comments

#1662 - Chunking and serialization for document formulas

Issue - State: open - Opened by Haoyuan-L about 2 months ago
Labels: question, chunker

#1661 - Controlled requests to external inference provider.

Issue - State: closed - Opened by swtb3-ryder about 2 months ago - 2 comments
Labels: enhancement

#1660 - chore: fix or ignore runtime and deprecation warnings

Pull Request - State: open - Opened by ceberam about 2 months ago - 2 comments

#1659 - feat(html): Support in-line anchor tags in HTML texts

Pull Request - State: open - Opened by krrome 2 months ago - 3 comments

#1658 - fix: pptx shape order

Pull Request - State: open - Opened by mawi12345 2 months ago - 3 comments

#1657 - Runtime Error since v.2.34.0 related to OSD detection

Issue - State: open - Opened by simonschoe 2 months ago - 3 comments
Labels: bug

#1656 - vlm support dolphin

Issue - State: closed - Opened by chenshuichao 2 months ago - 3 comments
Labels: enhancement, vlm-pipeline

#1655 - HfHubHTTPError when calling DoclingLoader with a pdf file.

Issue - State: open - Opened by mosharof24 2 months ago
Labels: question

#1654 - Docling getting killed when i feed bigger pdf files which have 900+ pages

Issue - State: open - Opened by Greatz08 2 months ago - 7 comments
Labels: question

#1653 - Tables recognized as images

Issue - State: open - Opened by philipplelidis 2 months ago
Labels: bug, layout

#1652 - Extracted list but the sequence and result incorrect in PDF

Issue - State: open - Opened by ayowu1981 2 months ago
Labels: bug

#1650 - Apparently simple pdf file totally destroyed by docling

Issue - State: open - Opened by caa24 2 months ago
Labels: bug

#1649 - CUDA support

Issue - State: open - Opened by kalle07 2 months ago - 6 comments
Labels: bug, accelerators

#1648 - Mean of Empty Slice During Conversion of PDF

Issue - State: closed - Opened by swtb3-ryder 2 months ago - 12 comments
Labels: bug

#1647 - Ollama Remote VLM Raises Validation Errors

Issue - State: open - Opened by swtb3-ryder 2 months ago - 3 comments
Labels: bug

#1646 - How to get tables from chunks?

Issue - State: closed - Opened by redvedev 2 months ago - 1 comment
Labels: question

#1645 - KeyError and RuntimeError occurred when opening a document(docx)

Issue - State: open - Opened by chaos798 2 months ago - 2 comments
Labels: bug, docx

#1644 - Rounded OCR bounding box and strict intersection judgement causes dropping OCR Textcells

Issue - State: open - Opened by Bill-XU 2 months ago - 3 comments
Labels: bug

#1643 - Wrongly assigned indices of TextCells in RapidOcrModel cause dropping cells in LayoutPostProcessor

Issue - State: open - Opened by Bill-XU 2 months ago - 4 comments
Labels: bug, layout

#1642 - RuntimeWarning

Issue - State: open - Opened by Sam120204 2 months ago - 5 comments
Labels: bug

#1641 - []

Issue - State: closed - Opened by redvedev 2 months ago
Labels: question

#1640 - Extracted Formula not correct for PDF

Issue - State: open - Opened by ayowu1981 2 months ago
Labels: bug

#1639 - ConversionError: Input document with "°"in its name not valid.

Issue - State: closed - Opened by theauAg 2 months ago - 2 comments
Labels: bug

#1637 - Legacy .doc support

Issue - State: open - Opened by 07pepa 2 months ago
Labels: enhancement

#1636 - fix: fix ZeroDivisionError for cell_bbox.area()

Pull Request - State: closed - Opened by Saidgurbuz 2 months ago - 2 comments

#1635 - RapidOcr causes merging of text while parsing.

Issue - State: open - Opened by vishaldasnewtide 2 months ago - 3 comments
Labels: bug

#1634 - Dependency problem in pip install

Issue - State: closed - Opened by heyday097 2 months ago - 2 comments
Labels: bug

#1631 - Parsing arxiv papers got "<!-- formula-not-decoded -->" in the chunk

Issue - State: closed - Opened by Haoyuan-L 2 months ago - 1 comment
Labels: question

#1628 - max_tokens parameter ignored when using default tokenizer.

Issue - State: closed - Opened by ckanaar 2 months ago - 1 comment
Labels: bug, chunker

#1627 - How good are the layout models?

Issue - State: closed - Opened by Ulipenitz 2 months ago - 1 comment
Labels: question

#1626 - Is it possible to change the Layout model? DocLayout-YOLO_ft outperforms

Issue - State: open - Opened by dtau00 2 months ago
Labels: question

#1625 - Footer Text Interferes with Main PDF Content During Parsing

Issue - State: open - Opened by sumittagadiya 2 months ago - 3 comments
Labels: question

#1624 - Poor performance on pages with many elements

Issue - State: open - Opened by avp-temp 2 months ago - 1 comment
Labels: enhancement, table structure, performance

#1622 - [Feature Request] Add ByteDance/Dolphin model for Docling

Issue - State: open - Opened by NeroHin 2 months ago - 8 comments
Labels: enhancement, vlm-pipeline

#1620 - The VLM example in the documentation doesn't work

Issue - State: closed - Opened by DavidNemeskey 2 months ago - 5 comments
Labels: bug

#1614 - Refined layout parsing

Issue - State: open - Opened by geoHeil 2 months ago - 8 comments
Labels: enhancement, layout

#1613 - Difficulty Extracting Nested Tables Within Table Cells

Issue - State: open - Opened by ricdurvin 2 months ago
Labels: bug

#1612 - Partial or no data extracted from single-row tables without headers

Issue - State: open - Opened by ricdurvin 2 months ago
Labels: bug

#1611 - Incorrect Line Splitting in Table Cell Extraction

Issue - State: open - Opened by ricdurvin 2 months ago
Labels: bug

#1610 - fix(msword_backend): Identify text in the same line after an image #1425

Pull Request - State: open - Opened by mkrssg 2 months ago - 15 comments

#1608 - Image descriptions into markdown file

Issue - State: open - Opened by mmb78 2 months ago - 1 comment
Labels: question

#1607 - Detect icon '@'

Issue - State: open - Opened by carminoplata 2 months ago - 1 comment
Labels: question, pdf parsing

#1606 - TABs instead of whitespace

Issue - State: open - Opened by mmb78 2 months ago
Labels: question

#1604 - Bold list entries are not converted correctly

Issue - State: open - Opened by sabotrax 2 months ago - 1 comment
Labels: bug

#1603 - Installation MAC M1 failing

Issue - State: open - Opened by MalekJabri 2 months ago - 2 comments
Labels: bug

#1602 - Performance issue for PDF when running inside Docker container

Issue - State: open - Opened by vikasr111 2 months ago - 1 comment
Labels: bug

#1601 - References to Exported Tables

Issue - State: open - Opened by spencerd42 2 months ago - 1 comment
Labels: question

#1600 - fix: handles line breaks in pptx table cells to markdown

Pull Request - State: closed - Opened by georgehgfonseca 2 months ago - 8 comments

#1599 - export_to_markdown seems not working

Issue - State: closed - Opened by gsantopaolo 2 months ago - 1 comment
Labels: bug

#1596 - chore: fix chunking example data link

Pull Request - State: closed - Opened by vagenas 2 months ago - 2 comments

#1595 - Allow set fixed image size instead of scale in ApiVlmOptions

Issue - State: open - Opened by shkarupa-alex 2 months ago
Labels: enhancement

#1594 - On Rocm/HIP the automated MultiScaleDeformableAttention build attempt fails

Issue - State: closed - Opened by Iiiiiiigor 2 months ago - 1 comment
Labels: bug

#1593 - feat: add fallback_lang support in TesseractOcrCliModel

Pull Request - State: closed - Opened by IoannisMaras 2 months ago - 8 comments
Labels: ocr

#1592 - InternVL3 Unrecognized configuration class (Hugging Face VLM)

Issue - State: closed - Opened by JohannKaspar 2 months ago - 2 comments
Labels: bug

#1591 - Support NumPy 2.x in docling and its dependencies

Issue - State: closed - Opened by karimmila 2 months ago - 3 comments
Labels: bug

#1590 - Install issue with conda forge

Issue - State: open - Opened by bextra 2 months ago - 1 comment
Labels: bug

#1589 - docs: add advanced chunking & serialization example

Pull Request - State: closed - Opened by vagenas 2 months ago - 2 comments

#1587 - feat: Picture description using context with surrounding text

Pull Request - State: open - Opened by rafaeltuelho 2 months ago - 1 comment

#1586 - Failed to convert text that appears in automatic numbering format in .docx

Issue - State: open - Opened by alexshmmy 2 months ago - 4 comments
Labels: bug, docx

#1583 - feat: table enrichments - description and indexing

Pull Request - State: open - Opened by shivanikabu 2 months ago - 6 comments

#1582 - fix: click_ dependency and update lock file

Pull Request - State: closed - Opened by dolfim-ibm 2 months ago - 2 comments

#1581 - Pdf2parquet improvement and optimisation

Issue - State: open - Opened by ShiroYasha18 2 months ago
Labels: enhancement

#1580 - No texts are extracted

Issue - State: closed - Opened by oprince 2 months ago - 1 comment
Labels: bug, html