{"id":5875,"library":"chonkie","title":"Chonkie","description":"Chonkie is a no-nonsense Python library for text chunking, offering various strategies including recursive, semantic, and AI-powered chunkers. It also supports advanced features like HTML table processing and visualization. Chonkie is actively maintained with frequent minor releases and bug fixes, with the current version being 1.6.2.","status":"active","version":"1.6.2","language":"en","source_language":"en","source_url":"https://github.com/chonkie-inc/chonkie","tags":["chunking","text processing","NLP","RAG","LLM"],"install":[{"cmd":"pip install chonkie","lang":"bash","label":"Core library"},{"cmd":"pip install 'chonkie[all]'","lang":"bash","label":"With all optional dependencies (e.g., LLMs, embedding)"},{"cmd":"pip install 'chonkie[llm]'","lang":"bash","label":"With LLM-related dependencies"}],"dependencies":[{"reason":"Used for HTTP requests, especially by some AI-powered chunkers.","package":"httpx","optional":true},{"reason":"Used for faster JSON processing.","package":"orjson","optional":true},{"reason":"Required for OpenAI-based chunkers.","package":"openai","optional":true},{"reason":"Required for LangChain-based chunkers.","package":"langchain","optional":true}],"imports":[{"symbol":"RecursiveChunker","correct":"from chonkie import RecursiveChunker"},{"note":"New in v1.6.2, requires `teraflopai` dependency (part of `[llm]` or `[all]` extras) and an API key.","symbol":"TeraflopAIChunker","correct":"from chonkie import TeraflopAIChunker"},{"note":"Allows visualizing chunking results.","symbol":"Visualizer","correct":"from chonkie import Visualizer"},{"symbol":"FastChunker","correct":"from chonkie import FastChunker"},{"symbol":"LateChunker","correct":"from chonkie import LateChunker"}],"quickstart":{"code":"import os\nfrom chonkie import RecursiveChunker\n\n# Instantiate a chunker. 
RecursiveChunker is a common choice.\nchunker = RecursiveChunker(chunk_size=500)\n\ntext = (\n    \"Chonkie is a highly efficient and flexible text chunking library in Python. \"\n    \"It provides various strategies for breaking down long documents into smaller, \"\n    \"manageable chunks, which is crucial for many NLP applications like RAG. \"\n    \"The library supports different chunking methods, including recursive, semantic, \"\n    \"and AI-driven approaches, and can handle various input formats like raw text and HTML. \"\n    \"Recent versions have introduced features like HTML table support and CLI tools.\"\n)\n\n# Chunk the text\nchunks = chunker.chunk(text)\n\nprint(f\"Original text length: {len(text)} characters\")\nprint(f\"Number of chunks: {len(chunks)}\")\nfor i, chunk in enumerate(chunks):\n    print(f\"Chunk {i+1} (length {len(chunk.text)}): {chunk.text[:100]}...\")\n\n# Example with TeraflopAIChunker (requires API key and 'llm' extra)\n# from chonkie import TeraflopAIChunker\n# teraflop_api_key = os.environ.get('TERAFLOPAI_API_KEY', 'YOUR_TERAFLOPAI_API_KEY')\n# if teraflop_api_key != 'YOUR_TERAFLOPAI_API_KEY':\n#     try:\n#         ai_chunker = TeraflopAIChunker(api_key=teraflop_api_key)\n#         ai_chunks = ai_chunker.chunk(text)\n#         print(f\"\\nAI Chunker chunks: {len(ai_chunks)}\")\n#     except Exception as e:\n#         print(f\"Could not use TeraflopAIChunker: {e}\")\n","lang":"python","description":"This quickstart uses `RecursiveChunker`, a common and flexible chunking strategy, to split a sample text into smaller chunks and print each chunk's length and first 100 characters. A commented-out example for `TeraflopAIChunker` is also included; it requires an API key and the optional `llm` dependencies."},"warnings":[{"fix":"Install Python 3.10 or newer (for example with your OS package manager, pyenv, or the python.org installer) and recreate your virtual environment with it. Python itself cannot be upgraded with `pip`.","message":"Chonkie v1.5.0 dropped support for Python 3.9. 
Users on Python 3.9 must upgrade their Python version to 3.10 or newer to use Chonkie v1.5.0 or later.","severity":"breaking","affected_versions":">=1.5.0"},{"fix":"Install Chonkie with the relevant extras, for example `pip install 'chonkie[llm]'` for LLM-related features, or `pip install 'chonkie[all]'` for all extras.","message":"Many advanced chunkers (e.g., OpenAI, TeraflopAI, LangChain-based) and visualization tools have optional dependencies that are not installed with `pip install chonkie`. Attempting to use these features without the correct dependencies will result in `ModuleNotFoundError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to Chonkie v1.5.5 or newer, which makes these imports lazy, or install `openai` if you must stay on an older version.","message":"Before v1.5.5, importing the `chonkie` library could fail with a `ModuleNotFoundError` if `openai` was not installed, even if you did not intend to use OpenAI-specific features. This was due to non-lazy imports.","severity":"gotcha","affected_versions":"<1.5.5"},{"fix":"Obtain an API key from TeraflopAI and provide it during chunker initialization (e.g., `TeraflopAIChunker(api_key=\"your_key\")`).","message":"The `TeraflopAIChunker` (introduced in v1.6.2) requires an API key for its service. Without a valid API key, initialization or chunking attempts will fail.","severity":"gotcha","affected_versions":">=1.6.2"},{"fix":"If you build from source, make sure a Rust toolchain is installed. For most users installing via `pip`, pre-built wheels should handle this transparently.","message":"Chonkie migrated its performance-critical components from Cython to Rust in v1.5.4. 
While this is largely an internal change, it might affect build environments or specific performance characteristics for advanced users compiling from source.","severity":"gotcha","affected_versions":">=1.5.4"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}