{"id":6563,"library":"chunkr-ai","title":"Chunkr AI Python Client","description":"Chunkr AI provides a Python client for its open-source document intelligence platform, offering API services for document layout analysis, OCR, and semantic chunking. It transforms complex documents like PDFs, PPTs, Word files, and images into structured, RAG/LLM-ready data, aiming for high-quality output and improved AI application performance. The current version is 0.3.7, and the project shows active development with regular updates and blog posts on new features and models.","status":"active","version":"0.3.7","language":"en","source_language":"en","source_url":"https://github.com/lumina-ai-inc/chunkr","tags":["document intelligence","OCR","RAG","LLM","chunking","AI"],"install":[{"cmd":"pip install chunkr-ai --pre","lang":"bash","label":"Install pre-release version"}],"dependencies":[{"reason":"Requires Python 3.10 or newer.","package":"python","optional":false}],"imports":[{"symbol":"Chunkr","correct":"from chunkr_ai import Chunkr"},{"symbol":"ChunkProcessing","correct":"from chunkr_ai.models import ChunkProcessing"},{"symbol":"Configuration","correct":"from chunkr_ai.models import Configuration"},{"symbol":"Tokenizer","correct":"from chunkr_ai.models import Tokenizer"}],"quickstart":{"code":"import os\nfrom chunkr_ai import Chunkr\nfrom chunkr_ai.models import ChunkProcessing, Configuration, Tokenizer\n\n# Ensure your Chunkr API key is set as an environment variable CHUNKR_API_KEY\napi_key = os.environ.get('CHUNKR_API_KEY', '')\nif not api_key:\n    print(\"Warning: CHUNKR_API_KEY environment variable not set. 
The API call will likely fail.\")\n\nchunkr = Chunkr(api_key=api_key)\n\n# Example of processing a document (replace with your document URL or file path)\n# This example uses default chunking strategies.\ntry:\n    task = chunkr.parse_document(file_url=\"https://example.com/document.pdf\")\n    print(f\"Document processing task submitted with ID: {task.task_id}\")\n\n    # You can poll for the task status or set up webhooks\n    # For a simple quickstart, we'll just acknowledge submission.\n    print(\"Check Chunkr AI dashboard or use get_task_output for results.\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n","lang":"python","description":"This quickstart demonstrates how to initialize the Chunkr client and submit a document for processing. It assumes you have an API key set as an environment variable. After submission, you can monitor the task status or retrieve the output through the Chunkr AI dashboard or further API calls."},"warnings":[{"fix":"Always install with `pip install chunkr-ai --pre`. Consult the official documentation for the latest installation instructions and API stability updates.","message":"The Python SDK is currently in alpha and requires the `--pre` flag for installation. This indicates that the API might be subject to changes before a stable release.","severity":"gotcha","affected_versions":"0.3.7 and earlier pre-release versions"},{"fix":"Carefully consider your use case. For production workloads and higher performance, the Cloud API is recommended. Ensure you are using the correct client and configuration for the chosen platform. If self-hosting, be aware of the capabilities and limitations of the open-source models.","message":"There are two distinct versions: an open-source AGPL self-hosted version and a fully managed Cloud API. They use different underlying models (community/open-source vs. 
proprietary in-house), leading to differences in accuracy, speed, and available features (e.g., Excel support is exclusive to the Cloud API).","severity":"breaking","affected_versions":"All versions"},{"fix":"Obtain an API key from your Chunkr AI dashboard after creating an account. Set it as an environment variable (e.g., `CHUNKR_API_KEY`) and pass it to the `Chunkr` client upon initialization.","message":"An API key is required for authentication with the Chunkr AI Cloud API. Failing to provide a valid key will result in authentication errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review Chunkr's documentation on custom chunking strategies, VLM processing, and configuration options. Experiment with `ChunkProcessing`, `Configuration`, and `Tokenizer` parameters to optimize chunk size and content for your specific use case and LLM.","message":"Suboptimal chunking strategies can lead to increased AI costs, reduced retrieval accuracy, and inconsistent LLM responses. While Chunkr aims for intelligent chunking, users should be aware of how different strategies impact their RAG systems.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}