After further investigation, I found that spacy-layout release v0.0.11 introduced the ability to pass a DoclingDocument to spaCyLayout.__call__. For reference, the simple way to process a document with Docling and then pass it to spacy-layout would be something like the following:
import spacy
from spacy_layout import spaCyLayout
from docling.document_converter import DocumentConverter
# Setup spaCy pipeline
nlp = spacy.load("en_core_web_sm")
layout = spaCyLayout(nlp)
# Convert a document with Docling
source = "./starcraft.pdf"
converter = DocumentConverter()
docling_result = converter.convert(source)
# Verify Docling conversion to markdown
print(docling_result.document.export_to_markdown())
# Pass Docling document to spacy-layout
doc = layout(docling_result.document)
# Examine spacy-layout spans
for span in doc.spans["layout"]:
    # Section type (e.g. "text", "title", "section_header") and the span's text
    print(f"{span.label_}: {span.text}")
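Since passing a DoclingDocument directly depends on having spacy-layout v0.0.11 or later, it may be worth guarding against an older install. The following is only a rough sketch: the helper names (parse_version, supports_docling_document, installed_spacy_layout_version) are made up for illustration, it assumes the installed distribution is named "spacy-layout", and it only compares the leading numeric part of the version string, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release that added DoclingDocument support, per the note above
MIN_VERSION = (0, 0, 11)


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def supports_docling_document(installed: str) -> bool:
    """True if the given spacy-layout version is at least MIN_VERSION."""
    return parse_version(installed) >= MIN_VERSION


def installed_spacy_layout_version() -> Optional[str]:
    """Look up the installed version (assumes the distribution name 'spacy-layout')."""
    try:
        return version("spacy-layout")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_spacy_layout_version()
    if found is None:
        print("spacy-layout does not appear to be installed")
    elif supports_docling_document(found):
        print(f"spacy-layout {found}: a DoclingDocument can be passed to layout(...)")
    else:
        print(f"spacy-layout {found}: upgrade before passing a DoclingDocument")
```

On an older version, the fallback would presumably be to hand spacy-layout the original file path and let it run the Docling conversion itself, rather than passing the already-converted document.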