LangChain Docling
Version: 2.0.0 | verified: Fri May 01 | auth: no | language: python
Integrates Docling document conversion capabilities into LangChain, enabling loading and chunking of documents (PDF, DOCX, PPTX, images) with native Deep Search or hybrid chunking. Current version is 2.0.0, requiring Python >=3.10 and <4. The library is actively maintained with regular releases.
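Because the package requires Python >=3.10 and <4, a quick interpreter check before installing can save a confusing resolver error. This is a minimal standard-library sketch. `python_supported` is a hypothetical helper name, and its bounds are copied from the requirement above.

```python
import sys

# Bounds copied from the stated requirement: Python >=3.10 and <4.
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (4,)


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version satisfies >=3.10,<4."""
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


ok = python_supported()
print(f"Python {sys.version.split()[0]} supported: {ok}")
```

Tuple comparison handles the upper bound. `(4, 0) < (4,)` is False, so any 4.x interpreter is rejected.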
pip install langchain-docling
Common errors
error ImportError: cannot import name 'DoclingLoader' from 'langchain_docling'
cause The symbol is not exposed at the package root; it must be imported from a submodule.
fix
Use 'from langchain_docling.loader import DoclingLoader'.
error AttributeError: 'DoclingLoader' object has no attribute 'load'
cause DoclingLoader was likely instantiated incorrectly, or the installed version does not match the API being called.
fix
Check that you import from the correct path and have version 2.0.0 or later installed, then call loader.load() on a properly constructed instance.
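Several of these fixes depend on having 2.0.0 or later installed. The sketch below reads the installed distribution version through `importlib.metadata` and compares its numeric release segment against 2.0.0. `parse_release`, `meets_minimum`, and `installed_version` are hypothetical helpers. The comparison is deliberately simplified: a pre-release such as 2.0.0rc1 counts as 2.0.0. Prefer `packaging.version.Version` when full PEP 440 ordering matters.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

REQUIRED: Tuple[int, ...] = (2, 0, 0)  # minimum version named in the fixes above


def parse_release(ver: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '2.0.1rc1' -> (2, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", ver.strip())
    if match is None:
        raise ValueError(f"not a version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(ver: str, minimum: Tuple[int, ...] = REQUIRED) -> bool:
    """Compare release segments, padding with zeros so '2.0' equals '2.0.0'.

    Simplified: pre-release and post-release suffixes are ignored.
    """
    release = parse_release(ver)
    width = max(len(release), len(minimum))
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


def installed_version(dist: str = "langchain-docling") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


installed = installed_version()
if installed is None:
    print("langchain-docling is not installed")
elif meets_minimum(installed):
    print(f"langchain-docling {installed} satisfies >=2.0.0")
else:
    print(f"langchain-docling {installed} is older than 2.0.0; upgrade it")
```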
error TypeError: DoclingLoader.__init__() got an unexpected keyword argument 'converter'
cause The installed version predates 2.0.0; in those older versions the 'converter' argument has a different name or does not exist.
fix
Upgrade to langchain-docling >=2.0.0 and pass converter as a keyword argument.
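If the same code may run against installs on either side of 2.0.0, one option is to inspect the constructor signature before passing `converter=`. This is a minimal sketch using `inspect.signature`. `accepts_kwarg` is a hypothetical helper, and `OldLoader`/`NewLoader` are hypothetical stand-ins, not the real `DoclingLoader` signatures. A `**kwargs` parameter makes the check return True even if the library later rejects the name.

```python
import inspect
from typing import Any, Callable


def accepts_kwarg(func: Callable[..., Any], name: str) -> bool:
    """Return True if func (or a class's __init__) accepts `name=` as a keyword."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        return False  # signature not introspectable (some builtins/C extensions)
    for param in sig.parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword, at least syntactically
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


class OldLoader:  # hypothetical stand-in for a pre-2.0.0 constructor
    def __init__(self, file_path: str):
        self.file_path = file_path


class NewLoader:  # hypothetical stand-in for a 2.0.0 constructor (keyword-only converter)
    def __init__(self, file_path: str, *, converter: Any = None):
        self.file_path = file_path
        self.converter = converter


print(accepts_kwarg(OldLoader, "converter"))  # False
print(accepts_kwarg(NewLoader, "converter"))  # True
```

With the real package installed, the same call would be `accepts_kwarg(DoclingLoader, "converter")`. A False result points to the pre-2.0.0 install described in the cause above.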
error ValueError: Document must have page_content to be chunked
cause The document being chunked was not produced by Docling's converter.
fix
Ensure documents are created by DoclingLoader or DoclingDocumentConverter.
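A pre-flight check before `chunker.split_documents(docs)` can turn this error into one that names the offending documents. The sketch assumes only that chunkable documents expose a `page_content` string. `check_chunkable` and `StubDoc` are hypothetical, and treating empty or whitespace-only text as unusable is this sketch's own assumption. It cannot tell whether a document actually came from Docling's converter.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List


def check_chunkable(docs: Iterable[Any]) -> List[Any]:
    """Return docs as a list, or raise ValueError naming docs without usable text.

    Duck-typed: any object with a non-empty `page_content` string passes,
    including langchain_core Document instances.
    """
    docs = list(docs)
    bad = [
        index
        for index, doc in enumerate(docs)
        if not isinstance(getattr(doc, "page_content", None), str)
        or not doc.page_content.strip()
    ]
    if bad:
        raise ValueError(f"documents at indices {bad} have no usable page_content")
    return docs


@dataclass
class StubDoc:  # hypothetical stand-in for a LangChain Document
    page_content: Any
    metadata: dict = field(default_factory=dict)


try:
    check_chunkable([StubDoc("Intro text"), StubDoc("   ")])
except ValueError as err:
    print(err)  # documents at indices [1] have no usable page_content
```

In the Quickstart this would wrap the loader output: `chunks = chunker.split_documents(check_chunkable(docs))`.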
Warnings
breaking In version 2.0.0, the import paths changed. Previously, symbols were importable directly from 'langchain_docling'; now they live in submodules (loader, chunking, converter). Update all imports accordingly.
fix Use 'from langchain_docling.loader import DoclingLoader' instead of 'from langchain_docling import DoclingLoader'.
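For code that must run against both the old root-level imports and the 2.0.0 submodule paths, a small compatibility shim can try each location in order. This sketch uses `importlib.import_module`. `import_first` is a hypothetical helper, and the candidate paths are the ones listed under Imports.

```python
import importlib
from typing import Any, List, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute that imports cleanly from (module_path, name) pairs."""
    errors: List[str] = []
    for module_path, name in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_path}.{name}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# 2.0.0 submodule path first, then the pre-2.0.0 package-root path.
try:
    DoclingLoader = import_first(
        [
            ("langchain_docling.loader", "DoclingLoader"),
            ("langchain_docling", "DoclingLoader"),
        ]
    )
except ImportError as exc:
    DoclingLoader = None
    print(exc)
```

Listing the new path first means a 2.0.0 install never falls through to the old location.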
breaking DoclingLoader no longer accepts 'converter' as a positional argument; pass it as a keyword argument, or omit it to use the default converter.
fix Pass converter as a keyword argument: DoclingLoader(file_path='doc.pdf', converter=my_converter).
breaking The 'mode' parameter in DoclingLoader has been removed. Use the converter's pipeline options instead.
fix Configure pipeline settings via DoclingDocumentConverter, e.g., converter = DoclingDocumentConverter(pipeline_options=...).
gotcha DoclingChunker requires document objects to have a 'page_content' attribute. Make sure you pass documents produced by Docling's converter, not raw LangChain documents.
fix Always use DoclingLoader or DoclingDocumentConverter to generate documents for chunking.
deprecated 'DoclingImageLoader' has been deprecated in favor of DoclingLoader with an image pipeline. It may be removed in a future version.
fix Migrate to DoclingLoader with ImagePipelineOptions.
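To find remaining uses of deprecated classes such as 'DoclingImageLoader', you can capture `DeprecationWarning` with the standard `warnings` module, or run with Python's `-W error::DeprecationWarning` option to make such warnings fatal. This sketch assumes the deprecated loader emits a `DeprecationWarning`, which the text above does not confirm. `find_deprecated_calls` and `legacy_image_loader` are hypothetical, and warnings raised at import time are not captured this way.

```python
import warnings
from typing import Any, Callable, List


def find_deprecated_calls(func: Callable[..., Any], *args: Any, **kwargs: Any) -> List[Any]:
    """Call func and return the DeprecationWarnings it emitted during the call."""
    with warnings.catch_warnings(record=True) as caught:
        # Default filters may hide DeprecationWarning; record every occurrence here.
        warnings.simplefilter("always", DeprecationWarning)
        func(*args, **kwargs)
    return [w for w in caught if issubclass(w.category, DeprecationWarning)]


def legacy_image_loader(path: str) -> str:  # hypothetical stand-in for DoclingImageLoader
    warnings.warn(
        "DoclingImageLoader is deprecated; use DoclingLoader with an image pipeline",
        DeprecationWarning,
        stacklevel=2,
    )
    return path


hits = find_deprecated_calls(legacy_image_loader, "scan.png")
for w in hits:
    print(f"{w.filename}:{w.lineno}: {w.message}")
```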
Imports
- DoclingLoader
  wrong: from langchain_docling import DoclingLoader
  correct: from langchain_docling.loader import DoclingLoader
- DoclingChunker
  wrong: from langchain_docling.chunker import DoclingChunker
  correct: from langchain_docling.chunking import DoclingChunker
- DoclingDocumentConverter
  from langchain_docling.converter import DoclingDocumentConverter
Quickstart
from langchain_docling.loader import DoclingLoader
from langchain_docling.chunking import DoclingChunker
from langchain_docling.converter import DoclingDocumentConverter
# Initialize the converter with default pipeline options (pass pipeline_options=... to customize)
converter = DoclingDocumentConverter()
# Example: loading a document from a file path
loader = DoclingLoader(file_path="example.pdf", converter=converter)
docs = loader.load()
print(docs[0].page_content[:200])
# Example: chunking documents
chunker = DoclingChunker(chunk_size=512, chunk_overlap=50)
chunks = chunker.split_documents(docs)
print(f"Number of chunks: {len(chunks)}")
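The Quickstart sets `chunk_size=512` and `chunk_overlap=50` without saying how DoclingChunker measures them. It may count tokens or follow document structure. As an illustration only, the conventional meaning of the two parameters can be sketched as fixed character windows, where each chunk starts `chunk_size - chunk_overlap` characters after the previous one. `window_chunks` is hypothetical and is not DoclingChunker's algorithm.

```python
from typing import List


def window_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window reached the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(1000))
chunks = window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])  # 3 [512, 512, 76]
```

With 1,000 characters, the windows start at 0, 462, and 924. The last chunk holds the remaining 76 characters, and each pair of neighbors shares 50 characters.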