{"id":4836,"library":"unstructured-inference","title":"Unstructured Inference","description":"unstructured-inference provides the core model inference code for layout parsing models used in the Unstructured.IO ecosystem. It extracts structured content from unstructured documents such as PDFs and images, and supports detection models such as Detectron2 and YOLOX. The library is actively maintained with frequent releases; the current version is 1.6.6.","status":"active","version":"1.6.6","language":"en","source_language":"en","source_url":"https://github.com/Unstructured-IO/unstructured-inference","tags":["NLP","OCR","Document Processing","Machine Learning Inference","Layout Analysis","PDF Processing"],"install":[{"cmd":"pip install unstructured-inference","lang":"bash","label":"Basic Installation"},{"cmd":"pip install 'git+https://github.com/facebookresearch/detectron2.git@57bdb21249d5418c130d54e2ebdc94dda7a4c01a'","lang":"bash","label":"Detectron2 (for layoutparser models on Linux/macOS)"}],"dependencies":[{"reason":"Required Python version.","package":"python","version":">=3.12, <3.13"},{"reason":"Required for using models from the layoutparser model zoo. 
It is not installed automatically, and installation is complex, especially on Windows.","package":"detectron2","optional":true}],"imports":[{"symbol":"DocumentLayout","correct":"from unstructured_inference.inference.layout import DocumentLayout"},{"symbol":"get_model","correct":"from unstructured_inference.models.base import get_model"}],"quickstart":{"code":"import os\nimport tempfile\n\n# Create a dummy PDF file for demonstration\n# In a real scenario, you would provide the path to your actual PDF.\nwith tempfile.NamedTemporaryFile(suffix=\".pdf\", delete=False) as temp_pdf:\n    temp_pdf_path = temp_pdf.name\n    temp_pdf.write(b\"%PDF-1.4\\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj 3 0 obj<</Type/Page/Parent 2 0 R/MediaBox[0 0 612 792]/Contents 4 0 R>>endobj 4 0 obj<</Length 41>>stream\\nBT /F1 24 Tf 100 700 Td (Hello Unstructured!) Tj ET\\nendstream\\nendobj\\nxref\\n0 5\\n0000000000 65535 f\\n0000000009 00000 n\\n0000000055 00000 n\\n0000000108 00000 n\\n0000000201 00000 n\\ntrailer<</Size 5/Root 1 0 R>>startxref\\n294\\n%%EOF\")\n\nfrom unstructured_inference.inference.layout import DocumentLayout\n\ntry:\n    # Perform layout parsing on the document\n    # For real use, replace temp_pdf_path with your PDF file path.\n    layout = DocumentLayout.from_file(temp_pdf_path)\n\n    print(f\"Found {len(layout.pages)} page(s) in the document.\")\n    for i, page in enumerate(layout.pages):\n        print(f\"--- Page {i+1} ---\")\n        for element in page.elements:\n            # element.text can be None, so fall back to an empty string before slicing.\n            print(f\"Element Type: {element.type}, Text: {(element.text or '')[:50]}...\")\n            # You can also access bounding box, model name, etc.\n            # print(f\"  Bounding Box: {element.bbox}, Model: {element.detectron_model_name}\")\nfinally:\n    # Clean up the dummy PDF file\n    os.remove(temp_pdf_path)","lang":"python","description":"This quickstart demonstrates how to load a PDF document and extract its layout elements using the default 
inference model. It creates a dummy PDF for immediate execution. In a real application, you would replace `temp_pdf_path` with the path to your actual PDF file. The output includes the detected element types and their truncated text content."},"warnings":[{"fix":"Refer to the Unstructured-IO documentation or Detectron2's installation guide for specific instructions. For macOS/Linux, `pip install 'git+https://github.com/facebookresearch/detectron2.git@57bdb21249d5418c130d54e2ebdc94dda7a4c01a'` is often required. Windows users may need to find community-supported workarounds or use WSL.","message":"Detectron2 is a crucial dependency for using many layout parsing models within unstructured-inference, particularly those from the layoutparser model zoo. It is NOT automatically installed with `pip install unstructured-inference` and its installation can be complex, especially on Windows, where it's not officially supported. Users on macOS/Linux may need to build it from source.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your environment is running Python 3.12. Consider using virtual environments (like `venv` or `uv`) to manage specific Python versions for your projects.","message":"The library has a strict Python version requirement, currently supporting Python 3.12 only. Using other Python versions will lead to installation or runtime errors.","severity":"breaking","affected_versions":"1.6.0+"},{"fix":"When updating `unstructured`, ensure `unstructured-inference` is also updated to a compatible version. 
Installing with `pip install \"unstructured[all-docs]\"` often helps keep dependent packages aligned.","message":"When `unstructured-inference` is used alongside the main `unstructured` library, keep both packages synchronized to avoid unexpected behavior or errors, because `unstructured-inference` provides the underlying model capabilities for `unstructured`'s partitioning bricks.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If table extraction misbehaves, check the GitHub issues for `unstructured-inference` for workarounds or targeted fixes. Downgrading to a previous version known to work with table extraction can be a temporary solution, but watch for other regressions.","message":"Users have reported that recent versions of `unstructured-inference` may not extract tables as effectively as older versions.","severity":"gotcha","affected_versions":"1.6.x (as of current observation)"},{"fix":"Update calls from `partition(..., model_name='yolox')` to `partition(..., hi_res_model_name='yolox')`.","message":"When using `unstructured`'s `partition` function with `strategy='hi_res'` (which uses `unstructured-inference` models), the `model_name` parameter is deprecated. Use `hi_res_model_name` instead.","severity":"deprecated","affected_versions":"When used via `unstructured` library (unstructured 0.12.x+)"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}