Unstructured Python Client SDK
The `unstructured-client` library provides a Python SDK to interact with the Unstructured API, enabling users to programmatically partition, clean, and extract structured data from various document types (PDFs, images, HTML, Word, etc.) using Unstructured's cloud services. It is actively maintained with frequent updates, often on a weekly or bi-weekly cadence, reflecting ongoing API developments. The current version is 0.43.2.
Warnings
- gotcha The `unstructured-client` library is distinct from the `unstructured` library. `unstructured-client` interacts with the Unstructured Cloud API, while the `unstructured` library performs local document processing. Do not confuse their imports or functionalities.
- breaking Request parameters and request body structures can change between Unstructured API versions, and the client library mirrors those changes. Upgrading `unstructured-client` can therefore break existing code, so pin the version you depend on and review the release notes before upgrading.
- gotcha Authentication via API Key is mandatory for most Unstructured API endpoints. Failing to set the `UNSTRUCTURED_API_KEY` environment variable or providing an invalid key will result in `401 Unauthorized` or `403 Forbidden` errors.
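The authentication warning above can be turned into an early, readable failure instead of a `401` or `403` from the server. The following is a minimal sketch using only the standard library; `require_api_key` and `MissingAPIKeyError` are hypothetical helper names, not part of `unstructured-client`.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Hypothetical error type: raised when no API key is configured."""


def require_api_key(env=None, var_name="UNSTRUCTURED_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise MissingAPIKeyError(
            f"{var_name} is not set; export it before creating the client."
        )
    return key


# Intended use (requires unstructured-client to be installed):
# client = UnstructuredClient(api_key_auth=require_api_key())
```

This only checks that a key is present. An invalid or revoked key still has to be detected from the API response.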
Install
pip install unstructured-client
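Because upgrades can change request structures, it can help to confirm which version is actually installed before running a pipeline. The sketch below uses the standard-library `importlib.metadata`; the helper names `installed_version` and `matches_pin` are illustrative assumptions, and the pin `0.43.2` is simply the version quoted on this page.

```python
from importlib import metadata


def installed_version(dist_name="unstructured-client"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only when a version is installed and equals the pinned version."""
    return installed is not None and installed == pinned


found = installed_version()
if not matches_pin(found, "0.43.2"):
    print(f"unstructured-client version is {found!r}, expected '0.43.2'")
```

An exact-match check is deliberately strict; a project that tolerates patch releases could compare only the leading version components instead.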
Imports
- UnstructuredClient
from unstructured_client import UnstructuredClient
- PartitionParameters
from unstructured_client.models.shared import PartitionParameters
- Files
from unstructured_client.models.shared import Files
- PartitionRequest
from unstructured_client.models.operations import PartitionRequest
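Since `unstructured-client` (imported as `unstructured_client`) and `unstructured` are separate packages, a quick check of which one is importable can prevent confusing import errors. This is a small sketch built on the standard-library `importlib.util.find_spec`; the function name `detect_unstructured_packages` is hypothetical.

```python
from importlib.util import find_spec


def detect_unstructured_packages():
    """Report which of the two similarly named packages can be imported.

    `unstructured_client` is the API client SDK described here;
    `unstructured` is the separate local-processing library.
    """
    return {
        "unstructured_client": find_spec("unstructured_client") is not None,
        "unstructured": find_spec("unstructured") is not None,
    }


status = detect_unstructured_packages()
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'not installed'}")
```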
Quickstart
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import PartitionRequest
from unstructured_client.models.shared import Files, PartitionParameters, Strategy

# --- IMPORTANT: Set your API key environment variable ---
# export UNSTRUCTURED_API_KEY="YOUR_API_KEY"
# Get your key from: https://unstructured.io/api-key
client = UnstructuredClient(
    api_key_auth=os.environ.get("UNSTRUCTURED_API_KEY", "")
)

# Create a small plain-text file for demonstration if none exists yet.
# (Writing plain text into a ".pdf" file would not produce a valid PDF.)
filename = "example.txt"
if not os.path.exists(filename):
    with open(filename, "w", encoding="utf-8") as f:
        f.write("This is a test document with some text.")

# Example: Partitioning a local file
try:
    with open(filename, "rb") as f:
        # The API takes a single Files object, not a list
        files = Files(content=f.read(), file_name=filename)

    # Wrap the parameters in a PartitionRequest and call the partition endpoint
    req = PartitionRequest(
        partition_parameters=PartitionParameters(
            files=files,
            strategy=Strategy.AUTO,  # also FAST, HI_RES, OCR_ONLY
            coordinates=True,  # Include bounding box coordinates
        )
    )
    resp = client.general.partition(request=req)

    # Print the extracted elements (each element is a dict)
    print("Successfully partitioned document.")
    for element in resp.elements or []:
        print(f"Type: {element['type']}, Text: {element['text'][:70]}...")
except FileNotFoundError:
    print(f"Please ensure '{filename}' exists in the current directory for this example.")
except Exception as e:
    print(f"An error occurred during partitioning: {e}")
    if "API Key" in str(e):
        print("HINT: Ensure your UNSTRUCTURED_API_KEY environment variable is set correctly.")
    elif "401" in str(e) or "403" in str(e):
        print("HINT: Check your API key for correctness and permissions.")
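Once a document has been partitioned, a common next step is to organize the returned elements. The sketch below groups element text by element type, assuming each element is a dict with `type` and `text` keys; `group_text_by_type` is a hypothetical helper and `sample_elements` is made-up data standing in for a real response, so it runs without calling the API.

```python
from collections import defaultdict


def group_text_by_type(elements):
    """Group non-empty element text by element type.

    Assumes each element is a dict with "type" and "text" keys;
    elements with missing or blank text are skipped.
    """
    grouped = defaultdict(list)
    for element in elements:
        text = (element.get("text") or "").strip()
        if text:
            grouped[element.get("type", "Unknown")].append(text)
    return dict(grouped)


# Made-up sample data in place of a real API response.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report"},
    {"type": "NarrativeText", "text": "Revenue grew in the third quarter."},
    {"type": "NarrativeText", "text": "  "},
    {"type": "ListItem", "text": "Item one"},
]

grouped = group_text_by_type(sample_elements)
for element_type, texts in grouped.items():
    print(f"{element_type}: {len(texts)} element(s)")
```

With a real response, the same helper could be applied to the list of elements returned by the partition call.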