Chunkr AI Python Client

0.3.7 · active · verified Wed Apr 15

Chunkr AI provides a Python client for its open-source document intelligence platform, which exposes API services for document layout analysis, OCR, and semantic chunking. It converts complex documents such as PDFs, PowerPoint decks, Word files, and images into structured data ready for RAG and LLM pipelines, with the goal of improving the output quality of downstream AI applications. The current version is 0.3.7, and the project is under active development, with regular updates and blog posts covering new features and models.

Quickstart

This quickstart initializes the Chunkr client and submits a document for processing. It assumes your API key is set in the CHUNKR_API_KEY environment variable. After submission, you can monitor the task status or retrieve the output through the Chunkr AI dashboard or further API calls.

import os
import sys

from chunkr_ai import Chunkr

# Read the API key from the CHUNKR_API_KEY environment variable.
api_key = os.environ.get("CHUNKR_API_KEY", "")
if not api_key:
    # Stop early: a request sent without a key cannot authenticate.
    sys.exit("CHUNKR_API_KEY environment variable not set.")

chunkr = Chunkr(api_key=api_key)

# Submit a document for processing (replace the URL with your own document's URL).
# No configuration is passed, so the default chunking settings apply.
try:
    task = chunkr.parse_document(file_url="https://example.com/document.pdf")
    print(f"Document processing task submitted with ID: {task.task_id}")

    # You can poll for the task status or set up webhooks
    # For a simple quickstart, we'll just acknowledge submission.
    print("Check Chunkr AI dashboard or use get_task_output for results.")

except Exception as e:
    print(f"An error occurred: {e}")
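
The quickstart mentions polling for the task status without showing it. The following is a minimal sketch of a polling loop with exponential backoff. It deliberately does not call any Chunkr method: `wait_for_task` and `fetch_status` are hypothetical names, `fetch_status` is a callable you would supply yourself (wrapping whatever status call your installed client version provides), and the status strings "Succeeded" and "Failed" are assumptions to check against the API's actual values.

```python
import time
from typing import Callable, FrozenSet


def wait_for_task(
    fetch_status: Callable[[], str],
    done_states: FrozenSet[str] = frozenset({"Succeeded", "Failed"}),  # assumed names
    interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still in state {status!r} after {timeout}s")
        sleep(interval)
        # Double the wait after each attempt, capped at max_interval.
        interval = min(interval * 2, max_interval)


# Demonstration with a fake status source instead of a real API call.
fake_states = iter(["Starting", "Processing", "Succeeded"])
final = wait_for_task(lambda: next(fake_states), sleep=lambda _seconds: None)
print(final)
```

Passing the status source as a callable keeps the loop independent of any particular client release, and the injectable `sleep` lets the loop be exercised without real delays.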
