Unstructured

library 0.22.18 · python

✓ verified Jun 28, 2026

Unstructured is an open-source Python library that simplifies the ingestion and preprocessing of diverse unstructured data formats, including PDFs, HTML, Word documents, and images. It provides modular functions for partitioning, cleaning, and staging data, mainly to prepare documents for Large Language Model (LLM) workflows. The library is actively maintained with frequent releases; the current version is 0.22.18.

Indexed Thu Apr 09 · updated Sat Jul 11


Resources

GitHub: github.com/Unstructured-IO/unstructured

Package: pypi.org/project/unstructured/

Homepage: unstructured.io

API endpoints

full doc /v1/registry/unstructured

install /v1/registry/unstructured/install

compatibility /v1/registry/unstructured/compatibility