Unstructured Python Client SDK

0.43.2 · active · verified Thu Apr 09

The `unstructured-client` library provides a Python SDK to interact with the Unstructured API, enabling users to programmatically partition, clean, and extract structured data from various document types (PDFs, images, HTML, Word, etc.) using Unstructured's cloud services. It is actively maintained with frequent updates, often on a weekly or bi-weekly cadence, reflecting ongoing API developments. The current version is 0.43.2.

Warnings

Install

Imports

Quickstart

This quickstart initializes `UnstructuredClient` and partitions a local PDF file through the `general.partition` endpoint. The upload is wrapped in `Files`, `PartitionParameters`, and `PartitionRequest` objects, and the API key is read from an environment variable rather than hard-coded.

import os
from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared

# --- IMPORTANT: Set your API key environment variable ---
# export UNSTRUCTURED_API_KEY="YOUR_API_KEY"
# Get your key from: https://unstructured.io/api-key

client = UnstructuredClient(
    api_key_auth=os.environ.get("UNSTRUCTURED_API_KEY", "")
)

# Place a real PDF named example.pdf in the current directory before running.
# (Writing plain text into a .pdf file does not produce a valid PDF.)

# Example: Partitioning a local file
try:
    with open("example.pdf", "rb") as f:
        # Wrap the upload in a single Files object (not a list)
        request = operations.PartitionRequest(
            partition_parameters=shared.PartitionParameters(
                files=shared.Files(
                    content=f.read(),
                    file_name="example.pdf",
                ),
                strategy=shared.Strategy.AUTO,  # also FAST, HI_RES, OCR_ONLY
                coordinates=True,  # Include bounding box coordinates
                output_format=shared.OutputFormat.APPLICATION_JSON,  # or TEXT_CSV
            )
        )

    # Call the partition endpoint with the prepared request
    resp = client.general.partition(request=request)

    # Elements are returned as plain dicts, so read fields by key
    print("Successfully partitioned document.")
    for element in resp.elements or []:
        print(f"Type: {element.get('type')}, Text: {element.get('text', '')[:70]}...")

except FileNotFoundError:
    print("Please ensure 'example.pdf' exists in the current directory for this example.")
except Exception as e:
    print(f"An error occurred during partitioning: {e}")
    if "API Key" in str(e):
        print("HINT: Ensure your UNSTRUCTURED_API_KEY environment variable is set correctly.")
    elif "401" in str(e) or "403" in str(e):
        print("HINT: Check your API key for correctness and permissions.")
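
Once partitioning succeeds, the returned elements usually need some post-processing. The following is a minimal offline sketch, assuming each element can be treated as a dict with `type` and `text` keys (an assumption about the response shape). The helpers `summarize_elements` and `join_text` and the sample data are hypothetical illustrations, not part of the SDK.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def summarize_elements(elements: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count elements per type; elements without a type count as 'Unknown'."""
    return dict(Counter(el.get("type") or "Unknown" for el in elements))


def join_text(elements: Iterable[Dict[str, Any]], skip_types: Iterable[str] = ()) -> str:
    """Join non-empty element text with blank lines, skipping unwanted types."""
    skipped = set(skip_types)
    parts: List[str] = []
    for el in elements:
        text = (el.get("text") or "").strip()
        if text and el.get("type") not in skipped:
            parts.append(text)
    return "\n\n".join(parts)


# Hypothetical sample standing in for the elements of a partition response
sample = [
    {"type": "Title", "text": "Quarterly Report"},
    {"type": "NarrativeText", "text": "Revenue grew in the third quarter."},
    {"type": "NarrativeText", "text": "  "},  # whitespace-only text is dropped
    {"type": "Footer", "text": "Page 1"},
]

print(summarize_elements(sample))
print(join_text(sample, skip_types=["Footer"]))
```

Keeping these helpers independent of the client makes them easy to test without an API key or network access; in real use, the list of elements from the partition response would be passed in place of `sample`.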
