{"id":7354,"library":"langchain-unstructured","title":"LangChain Unstructured Integration","description":"langchain-unstructured is an integration package connecting the LangChain framework with Unstructured, a library for parsing and processing unstructured documents. It provides document loaders that extract text and metadata from various file types (PDFs, images, HTML, etc.) for use in LangChain applications. The current version is 1.0.1; the package recently moved to 1.x and has since received minor updates.","status":"active","version":"1.0.1","language":"en","source_language":"en","source_url":"https://github.com/langchain-ai/langchain-unstructured","tags":["langchain","unstructured","document-processing","llm-tools","document-loaders"],"install":[{"cmd":"pip install langchain-unstructured unstructured","lang":"bash","label":"Basic Installation"},{"cmd":"pip install langchain-unstructured \"unstructured[pdf]\"","lang":"bash","label":"With PDF Support"}],"dependencies":[{"reason":"Core dependency for local document processing. 
Requires various optional dependencies for different file types (e.g., 'unstructured[pdf]').","package":"unstructured","optional":false},{"reason":"Fundamental LangChain dependency, crucial for loader functionality.","package":"langchain-core","optional":false}],"imports":[{"symbol":"UnstructuredLoader","correct":"from langchain_unstructured import UnstructuredLoader"},{"note":"The legacy `UnstructuredFileLoader` and `UnstructuredAPIFileLoader` classes are not exported by `langchain_unstructured`. Use `UnstructuredLoader` instead, passing `partition_via_api=True` for API-based parsing.","wrong":"from langchain.document_loaders import UnstructuredAPIFileLoader","symbol":"UnstructuredAPIFileLoader","correct":"from langchain_unstructured import UnstructuredLoader"}],"quickstart":{"code":"import os\nfrom langchain_unstructured import UnstructuredLoader\n\n# Create a dummy file for demonstration\nwith open(\"example.txt\", \"w\") as f:\n    f.write(\"This is a test document.\\n\")\n    f.write(\"It contains some sample text to be loaded.\")\n\n# Instantiate the loader for a local file (parsed locally by unstructured)\nloader = UnstructuredLoader(\"example.txt\")\n\n# Load the document(s); one Document is returned per parsed element\ndocs = loader.load()\n\n# Print the content of the first loaded document\nif docs:\n    print(f\"Loaded {len(docs)} document(s).\")\n    print(f\"Page content: {docs[0].page_content[:50]}...\")\n    print(f\"Metadata: {docs[0].metadata}\")\n\n# Clean up the dummy file\nos.remove(\"example.txt\")","lang":"python","description":"This quickstart demonstrates how to use `UnstructuredLoader` to load text from a local file. For more complex file types like PDFs or images, ensure you have the necessary `unstructured` extra dependencies installed (e.g., `pip install \"unstructured[pdf]\"`). 
To parse through the Unstructured API instead of locally, pass `partition_via_api=True` and ensure the `UNSTRUCTURED_API_KEY` environment variable is set."},"warnings":[{"fix":"Replace legacy imports such as `from langchain.document_loaders import UnstructuredFileLoader` with `from langchain_unstructured import UnstructuredLoader` and ensure `langchain-unstructured` is installed.","message":"Migration from `langchain` document loaders to `langchain-unstructured` package.","severity":"breaking","affected_versions":"0.x to 1.x"},{"fix":"Ensure `langchain-core` is updated to a compatible version (e.g., `pip install --upgrade langchain-core`) to avoid dependency conflicts or unexpected behavior.","message":"Upgrade of `langchain-core` dependency version in `langchain-unstructured` v1.0.0.","severity":"breaking","affected_versions":"1.0.0+"},{"fix":"For parsing PDFs, install `unstructured[pdf]` (e.g., `pip install \"unstructured[pdf]\"`). For images, install `unstructured[image]` (which requires Tesseract OCR). Check the Unstructured documentation for specific requirements.","message":"Unstructured requires additional dependencies for specific file types.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set the `UNSTRUCTURED_API_KEY` environment variable before instantiating `UnstructuredLoader` with `partition_via_api=True`, or pass the key directly via the `api_key` parameter.","message":"Parsing through the Unstructured API requires an API key.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install langchain-unstructured`.","cause":"The `langchain-unstructured` package is not installed.","error":"ModuleNotFoundError: No module named 'langchain_unstructured'"},{"fix":"Run `pip install unstructured`. 
For specific file types, consider `pip install \"unstructured[pdf]\"` or other extras.","cause":"The core `unstructured` library, a dependency, is missing.","error":"ModuleNotFoundError: No module named 'unstructured'"},{"fix":"Switch to the loader this package provides: `from langchain_unstructured import UnstructuredLoader`.","cause":"Attempting to import `UnstructuredFileLoader` from the old `langchain` package path.","error":"ImportError: cannot import name 'UnstructuredFileLoader' from 'langchain.document_loaders'"},{"fix":"Set the environment variable: `export UNSTRUCTURED_API_KEY='your_api_key'` or pass it directly: `UnstructuredLoader(..., api_key='your_api_key', partition_via_api=True)`.","cause":"API-based parsing was requested, but the `UNSTRUCTURED_API_KEY` environment variable is not set and no API key was provided in the constructor.","error":"ValueError: Unstructured API key not provided."}]}