LangChain Docling

Version 2.0.0 · verified Fri May 01 · auth: no · language: Python

Integrates Docling's document conversion into LangChain, so you can load and chunk documents (PDF, DOCX, PPTX, images) using either Docling's native (Deep Search) chunking or hybrid chunking. The current version, 2.0.0, requires Python >=3.10,<4. The library is actively maintained and released regularly.

```
pip install langchain-docling
```

Common errors

error ImportError: cannot import name 'DoclingLoader' from 'langchain_docling'

cause The symbol is not exposed at the package root; it must be imported from its submodule.

fix

Use 'from langchain_docling.loader import DoclingLoader'.

error AttributeError: 'DoclingLoader' object has no attribute 'load'

cause The DoclingLoader was likely instantiated incorrectly, or the installed version does not match the one documented here.

fix

Check that you import from the correct path and are running version 2.0.0 or later, then call loader.load() on a properly constructed instance.

error TypeError: DoclingLoader.__init__() got an unexpected keyword argument 'converter'

cause Versions before 2.0.0 do not accept an argument named 'converter' (it was named differently or did not exist).

fix

Upgrade to langchain-docling >=2.0.0 and pass converter as a keyword argument.

error ValueError: Document must have page_content to be chunked

cause The document being chunked was not produced by Docling's converter.

fix

Ensure documents are created by DoclingLoader or DoclingDocumentConverter.

Warnings

breaking In version 2.0.0, the import paths changed. Previously, symbols were importable directly from 'langchain_docling'; now they live in submodules (loader, chunking, converter). Update all imports accordingly.

fix Use 'from langchain_docling.loader import DoclingLoader' instead of 'from langchain_docling import DoclingLoader'.
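If the same code has to run against both pre-2.0.0 and 2.0.0+ installs, one option is to try the submodule path first and fall back to the package root. The sketch below uses a hypothetical helper, `import_first`, built only on the standard library's `importlib`; it is an assumption of this guide, not part of langchain-docling.

```python
import importlib
from typing import Any, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from the (module, name) pairs.

    Hypothetical helper: tries each candidate in order and skips modules
    that are missing or that do not define the requested name.
    """
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Prefer the 2.0.0 submodule path, then fall back to the pre-2.0.0 package root.
DOCLING_LOADER_CANDIDATES = [
    ("langchain_docling.loader", "DoclingLoader"),
    ("langchain_docling", "DoclingLoader"),
]

if __name__ == "__main__":
    try:
        DoclingLoader = import_first(DOCLING_LOADER_CANDIDATES)
        print("resolved", DoclingLoader)
    except ImportError as exc:
        # Raised only when langchain-docling is not installed at all.
        print(exc)
```

Pinning a single version and using one import path is simpler; the fallback is only worth it while both versions are in circulation.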

breaking DoclingLoader no longer accepts 'converter' as a positional argument; pass it as a keyword argument, or omit it to use the default converter.

fix Pass converter as a keyword: DoclingLoader(file_path='doc.pdf', converter=my_converter).
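This kind of break typically comes from keyword-only parameters (those after a bare `*` in a Python signature). The class below is a hypothetical stand-in written for illustration; it is not DoclingLoader's real signature, which this page does not document.

```python
from typing import Any, Optional


class StubLoader:
    """Hypothetical stand-in: parameters after the bare `*` are keyword-only."""

    def __init__(self, file_path: str, *, converter: Optional[Any] = None) -> None:
        self.file_path = file_path
        # None stands in for "use the default converter".
        self.converter = converter if converter is not None else "default-converter"


# Keyword form: accepted.
ok = StubLoader("doc.pdf", converter="my_converter")

# Positional form: rejected with a TypeError, mirroring the 2.0.0 change.
try:
    StubLoader("doc.pdf", "my_converter")
except TypeError as exc:
    print("rejected:", exc)
```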

breaking The 'mode' parameter in DoclingLoader has been removed. Use the converter's pipeline options instead.

fix Configure pipeline settings via DoclingDocumentConverter, e.g., converter = DoclingDocumentConverter(pipeline_options=...).

gotcha DoclingChunker requires documents with a populated 'page_content' attribute, as produced by Docling conversion. Use Docling-converted documents rather than raw LangChain documents.

fix Always use DoclingLoader or DoclingDocumentConverter to generate documents for chunking.
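A cheap way to surface the ValueError above earlier is to validate documents before chunking. The sketch below uses a minimal stand-in `Doc` dataclass with LangChain's `page_content`/`metadata` field names and a hypothetical `require_page_content` helper; both are assumptions, and the check only confirms that text is present, not that Docling produced it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Minimal stand-in mirroring the field names of langchain_core's Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def require_page_content(docs: List[Any]) -> List[Any]:
    """Hypothetical pre-chunking guard: fail fast on documents without text."""
    bad = [
        i for i, d in enumerate(docs)
        if not isinstance(getattr(d, "page_content", None), str)
        or not d.page_content.strip()
    ]
    if bad:
        raise ValueError(f"documents at positions {bad} have no page_content to chunk")
    return docs


docs = [Doc("Intro text", {"source": "example.pdf"}), Doc("   ")]
try:
    require_page_content(docs)
except ValueError as exc:
    print(exc)  # documents at positions [1] have no page_content to chunk
```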

deprecated The 'DoclingImageLoader' has been deprecated in favor of using DoclingLoader with an image pipeline. It may be removed in a future version.

fix Migrate to DoclingLoader with ImagePipelineOptions.
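To find remaining uses before the class is removed, one common approach is to escalate DeprecationWarning to an error during tests. This assumes DoclingImageLoader emits a standard DeprecationWarning, which this page does not confirm; `legacy_image_loader` and `uses_deprecated_api` below are hypothetical names used only for illustration.

```python
import warnings
from typing import Any, Callable


def legacy_image_loader(path: str) -> str:
    """Hypothetical stand-in for a deprecated loader that warns on use."""
    warnings.warn("legacy_image_loader is deprecated", DeprecationWarning, stacklevel=2)
    return path


def uses_deprecated_api(fn: Callable[..., Any], *args: Any) -> bool:
    """Return True if calling fn(*args) emits a DeprecationWarning."""
    with warnings.catch_warnings():
        # Turn DeprecationWarning into an exception inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            fn(*args)
        except DeprecationWarning:
            return True
    return False


print(uses_deprecated_api(legacy_image_loader, "scan.png"))  # True
```

In a test suite the same effect is usually achieved globally through the test runner's warning filters rather than per call.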

Imports

DoclingLoader
wrong
```
from langchain_docling import DoclingLoader
```
correct
```
from langchain_docling.loader import DoclingLoader
```
In v2.0.0 the canonical import path is the loader submodule; importing directly from the package root may not work.

DoclingChunker

wrong

```
from langchain_docling.chunker import DoclingChunker
```

correct

```
from langchain_docling.chunking import DoclingChunker
```

Submodule is named 'chunking' (with 'ing'), not 'chunker'. This is a common mistake.

DoclingDocumentConverter

```
from langchain_docling.converter import DoclingDocumentConverter
```

Quickstart

Load a PDF document with DoclingLoader and split it into chunks using DoclingChunker.

```
from langchain_docling.loader import DoclingLoader
from langchain_docling.chunking import DoclingChunker
from langchain_docling.converter import DoclingDocumentConverter

# Initialize the converter with default pipeline options
# (pass pipeline_options=... to customize them)
converter = DoclingDocumentConverter()

# Load a document from a file path using that converter
loader = DoclingLoader(file_path="example.pdf", converter=converter)
docs = loader.load()
print(docs[0].page_content[:200])  # preview the first 200 characters

# Split the loaded documents into overlapping chunks
chunker = DoclingChunker(chunk_size=512, chunk_overlap=50)
chunks = chunker.split_documents(docs)
print(f"Number of chunks: {len(chunks)}")
```
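For intuition about chunk_size=512 and chunk_overlap=50: in a plain sliding-window splitter, each window starts chunk_size - chunk_overlap units after the previous one, so neighbouring chunks share chunk_overlap units. The sketch below implements that generic technique over characters; it is not DoclingChunker's algorithm, and whether DoclingChunker measures characters or tokens is not stated on this page.

```python
from typing import List


def sliding_window(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Generic fixed-size splitter with overlap (not Docling's hybrid chunker)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += step
    return chunks


chunks = sliding_window("x" * 1000, chunk_size=512, chunk_overlap=50)
print([len(c) for c in chunks])  # [512, 512, 76]
```

Structure-aware chunkers such as hybrid chunking additionally respect document boundaries (headings, tables), so their chunk counts will differ from this fixed-window estimate.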