{"id":9578,"library":"chonkie-core","title":"Chonkie Core","description":"Chonkie Core is a high-performance Python library for semantic text chunking, powered by a Rust backend for speed. It offers various strategies including delimiter-based, size-constrained, and Savitzky-Golay filter-based semantic splitting. The library is actively developed, with frequent releases adding new features and optimizations.","status":"active","version":"0.10.1","language":"en","source_language":"en","source_url":"https://github.com/chonkie-inc/chunk","tags":["text-chunking","nlp","rust-bindings","performance","semantic-chunking","tokenization"],"install":[{"cmd":"pip install chonkie-core","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"Package renamed from 'memchunk' to 'chonkie-core' in v0.5.0","wrong":"from memchunk import Chunker","symbol":"Chunker","correct":"from chonkie_core import Chunker"},{"note":"Package renamed from 'memchunk' to 'chonkie-core' in v0.5.0","wrong":"from memchunk import chunk","symbol":"chunk","correct":"from chonkie_core import chunk"},{"note":"Package renamed from 'memchunk' to 'chonkie-core' in v0.5.0","wrong":"from memchunk import chunk_offsets","symbol":"chunk_offsets","correct":"from chonkie_core import chunk_offsets"},{"note":"Package renamed from 'memchunk' to 'chonkie-core' in v0.5.0. Introduced in v0.8.0.","wrong":"from memchunk import merge_splits","symbol":"merge_splits","correct":"from chonkie_core import merge_splits"},{"note":"Package renamed from 'memchunk' to 'chonkie-core' in v0.5.0. Introduced in v0.6.0.","wrong":"from memchunk import split_at_delimiters","symbol":"split_at_delimiters","correct":"from chonkie_core import split_at_delimiters"},{"note":"Package renamed from 'memchunk' to 'chonkie-core' in v0.5.0. 
Savitzky-Golay module introduced in v0.9.0.","wrong":"from memchunk import savgol_filter","symbol":"savgol_filter","correct":"from chonkie_core import savgol_filter"}],"quickstart":{"code":"from chonkie_core import Chunker, chunk, chunk_offsets\n\ntext = \"This is the first sentence. This is the second sentence! And this is the third sentence, with a comma. Finally, the last one. Here is some Japanese: これは日本語のテキストです。句読点も含まれます。\"\n\n# Using Chunker class with delimiters and patterns\nprint(\"--- Using Chunker ---\")\nchunks_obj = list(Chunker(text, delimiters=\"\\n.?!\", patterns=[\"。\", \"，\", \"！\"]))\nfor c in chunks_obj:\n    print(f\"'{c.text}' (len: {len(c.text)})\\nOffset range: {c.offset_range}\")\n\n# Using convenience function `chunk`\nprint(\"\\n--- Using chunk function ---\")\nfor c in chunk(text, delimiters=\".\", patterns=[\"。\"]):  # The 'chunk' function returns Chunk objects\n    print(f\"'{c.text}' (len: {len(c.text)})\\nOffset range: {c.offset_range}\")\n\n# Getting offsets directly\nprint(\"\\n--- Using chunk_offsets function ---\")\noffsets = chunk_offsets(text, delimiters=\".\", patterns=[\"。\"])\nprint(f\"Offsets: {offsets}\")\n","lang":"python","description":"Demonstrates basic text chunking with the `Chunker` class for customizable splitting, the `chunk` convenience function, and offset retrieval with `chunk_offsets`. The examples use both ASCII delimiters and multi-byte patterns."},"warnings":[{"fix":"Update all import statements from `memchunk` to `chonkie_core` (e.g., `from memchunk import Chunker` becomes `from chonkie_core import Chunker`).","message":"The package was renamed from `memchunk` to `chonkie-core` (imported as `chonkie_core`) in version `0.5.0`. Existing `import memchunk` statements will fail with `ModuleNotFoundError`.","severity":"breaking","affected_versions":">=0.5.0"},{"fix":"Upgrade to `chonkie-core>=0.10.1` to use the `.patterns()` API. 
For earlier versions, only ASCII delimiters via `.delimiters()` are supported.","message":"The `.patterns()` API for multi-byte delimiter support was introduced for Python in `v0.10.1`. Using it on earlier versions raises an `AttributeError`.","severity":"breaking","affected_versions":"<0.10.1"},{"fix":"Ensure the `text` argument passed to `Chunker` or `chunk` is a `str`. If you have byte data, decode it first (e.g., `my_bytes.decode('utf-8')`).","message":"The `chunk` function and the `Chunker` class expect string (`str`) input. Bytes are sometimes accepted, but passing any other type directly can raise a `TypeError` or produce unexpected behavior.","severity":"gotcha","affected_versions":"all"},{"fix":"Upgrade to `chonkie-core>=0.9.0` to access the Savitzky-Golay filter module. Although NumPy is not a direct dependency of `chonkie-core` itself, it is recommended for best performance when using these functions.","message":"The Savitzky-Golay filter module, including `savgol_filter` and related functions for semantic chunking, was introduced in `v0.9.0`. These functions internally leverage NumPy for efficient array operations, providing zero-copy performance.","severity":"gotcha","affected_versions":"<0.9.0"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Replace all `import memchunk` statements with `import chonkie_core` (or `from chonkie_core import ...`). 
Ensure you have `chonkie-core` installed via `pip install chonkie-core`.","cause":"The package name was changed from `memchunk` to `chonkie-core` in version `0.5.0`.","error":"ModuleNotFoundError: No module named 'memchunk'"},{"fix":"Upgrade your `chonkie-core` installation to `0.10.1` or newer: `pip install --upgrade chonkie-core`.","cause":"The `.patterns()` method for multi-byte delimiters was added to the Python bindings in `chonkie-core` version `0.10.1`.","error":"AttributeError: 'Chunker' object has no attribute 'patterns'"},{"fix":"Ensure the `text` argument passed to `Chunker` or `chunk` is a string (`str`) type. For example, `chunk(str(my_int_var))`.","cause":"You are passing an integer or another non-string/non-bytes type as the primary text input to a chunking function.","error":"TypeError: argument 'text': 'int' object cannot be interpreted as a string, expected str"}]}