79682435

Date: 2025-06-27 19:59:33
Score: 0.5

After further investigation, I found that spacy-layout release v0.0.11 introduced the ability to pass a DoclingDocument directly to spaCyLayout.__call__. For reference, a simple way to convert a document with Docling and then pass the result to spacy-layout looks like this:

import spacy
from spacy_layout import spaCyLayout
from docling.document_converter import DocumentConverter

# Setup spaCy pipeline
nlp = spacy.load("en_core_web_sm")
layout = spaCyLayout(nlp)

# Convert a document with Docling
source = "./starcraft.pdf"
converter = DocumentConverter()
docling_result = converter.convert(source)

# Verify Docling conversion to markdown
print(docling_result.document.export_to_markdown())

# Pass Docling document to spacy-layout
doc = layout(docling_result.document)

# Examine spacy-layout spans
for span in doc.spans["layout"]:
    # Layout label (section type) and the text of each span
    print(f"{span.label_}: {span.text}")
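
Since passing a DoclingDocument depends on spacy-layout v0.0.11 or later, it can help to check the installed version before relying on it. The following is only a minimal sketch using the standard library; the helper names (`parse_release`, `accepts_docling_document`, `installed_spacy_layout_version`) are hypothetical and not part of spacy-layout, and the version parsing is deliberately naive (it ignores pre-release suffixes such as `rc1`).

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Minimum release named in the answer above.
MIN_VERSION = (0, 0, 11)


def parse_release(text: str) -> tuple[int, ...]:
    """Turn '0.0.11' into (0, 0, 11), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def accepts_docling_document(installed: str) -> bool:
    """Return True if this version string is at least MIN_VERSION."""
    return parse_release(installed) >= MIN_VERSION


def installed_spacy_layout_version() -> str | None:
    """Look up the installed spacy-layout version, or None if it is missing."""
    try:
        return version("spacy-layout")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_spacy_layout_version()
    if found is None:
        print("spacy-layout is not installed")
    elif accepts_docling_document(found):
        print(f"spacy-layout {found}: a DoclingDocument can be passed directly")
    else:
        print(f"spacy-layout {found}: upgrade to 0.0.11 or later")
```

If the check reports an older version, upgrading with `pip install -U spacy-layout` should make the example above work.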
Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Self-answer (0.5):
  • Low reputation (1):
Posted by: Virtual Architectures