Oniguruma CFFI
Onigurumacffi provides Python cffi bindings for the Oniguruma regex engine. Currently at version 1.5.0, it wraps the Oniguruma C library and exposes a multibyte-aware regex engine whose syntax and behavior differ from Python's built-in `re` module. Oniguruma itself works on encoded byte buffers, but the Python API accepts and returns `str` and handles UTF-8 encoding internally. The library bundles the Oniguruma C source, simplifying installation, and requires Python 3.10 or newer. Releases are infrequent and tend to follow updates to the underlying Oniguruma library or CFFI.
Common errors
- ModuleNotFoundError: No module named 'onigurumacffi'
  cause: The `onigurumacffi` package is not installed in the current Python environment.
  fix: Run `pip install onigurumacffi` in the same environment that runs your code.
- onigurumacffi.OnigError: ...
  cause: The pattern is not valid under Oniguruma's syntax rules, so `compile()` rejects it.
  fix: Correct the pattern, consulting Oniguruma's syntax documentation. Common culprits are unescaped special characters and unclosed groups or character classes, such as `[invalid`.
- Errors from passing `bytes` instead of `str`
  cause: onigurumacffi's `compile()`, `match()`, and `search()` take Python `str` objects and encode them to UTF-8 themselves; `bytes` arguments are not accepted.
  fix: Decode byte data before passing it in, for example `data.decode('utf-8')`, and write patterns as raw string literals such as `r'\s+'` rather than `b'\\s+'`.
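To turn the `ModuleNotFoundError` into an actionable message instead of a traceback, a script can check for the package up front. This is a minimal sketch using only the standard library's `importlib.util.find_spec`; the helper name `module_available` is hypothetical, not part of onigurumacffi.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported.

    find_spec() locates the module without executing it, so this check
    does not import onigurumacffi or load its compiled extension.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("onigurumacffi"):
        print("onigurumacffi is installed")
    else:
        # Same fix as for ModuleNotFoundError: install into this environment.
        print("onigurumacffi is missing; run: pip install onigurumacffi")
```

Only pass top-level names: for a dotted name whose parent package is missing, `find_spec` raises `ModuleNotFoundError` instead of returning `None`.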
Warnings
- gotcha Onigurumacffi uses the Oniguruma regex engine, which has different syntax and behavior from Python's built-in `re` module. Supported escapes, character-class semantics, anchor behavior, and performance can all differ, so patterns are not always portable between the two.
- gotcha The `onigurumacffi` library statically links against a specific version of the Oniguruma C library (e.g., 6.9.8). If you need features or bug fixes from a newer Oniguruma C version, you might need to wait for a new `onigurumacffi` release or compile it yourself.
- gotcha Oniguruma itself matches against encoded byte buffers, but onigurumacffi's Python API takes and returns `str` and handles UTF-8 encoding internally. Decode `bytes` input (for example, data read in binary mode) before passing it to `compile()`, `match()`, or `search()`.
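Both the syntax gap and the byte/character distinction can be checked with the standard library alone. The sketch below (helper names are hypothetical) tests whether a pattern compiles under Python's `re`, using `\h`, which Oniguruma's syntax treats as a hexadecimal-digit class, as an example of an escape `re` rejects. It also converts between UTF-8 byte offsets and `str` indices, which diverge as soon as the text contains a multibyte character such as `é`; that matters when comparing byte positions (as Oniguruma uses at the C level) with Python string indices.

```python
import re


def re_accepts(pattern: str) -> bool:
    """Return True if Python's built-in `re` can compile `pattern`."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Map a str index to the matching offset in text.encode('utf-8')."""
    return len(text[:char_offset].encode("utf-8"))


def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Map an offset into text.encode('utf-8') back to a str index.

    byte_offset must fall on a character boundary; otherwise decode()
    raises UnicodeDecodeError.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


if __name__ == "__main__":
    # re rejects \h ("bad escape"); the portable spelling is a plain class.
    print(re_accepts(r"\h+"))           # False
    print(re_accepts(r"[0-9a-fA-F]+"))  # True
    # 'é' is two bytes in UTF-8, so offsets after it differ by one.
    s = "café!"
    print(char_to_byte_offset(s, 4))    # 5
    print(byte_to_char_offset(s, 5))    # 4
```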
Install
-
pip install onigurumacffi
Imports
- onigurumacffi (module; provides compile() and compile_regset())
import onigurumacffi
- OnigError
from onigurumacffi import OnigError
Quickstart
import onigurumacffi

# Compile a pattern. onigurumacffi takes str patterns and encodes them as
# UTF-8 internally, so raw string literals (r'...') are the natural form.
regex = onigurumacffi.compile(r'^(hello|hi)\s+(world|there)!$')

# match() only succeeds if the pattern matches at the start of the string.
text = 'hello world!'
match = regex.match(text)
if match:
    print('Match found!')
    print(f'Full match: {match.group(0)}')  # group() returns str
    print(f'Group 1: {match.group(1)}')
    print(f'Group 2: {match.group(2)}')
else:
    print(f'No match for {text!r}')

# search() finds the first match anywhere in the string. The anchored
# pattern above cannot match mid-text, so use an unanchored one here.
search_regex = onigurumacffi.compile(r'(hello|hi)\s+(world|there)!')
search_text = 'some hello world! text'
search_match = search_regex.search(search_text)
if search_match:
    print('\nSearch found!')
    print(f'Search match: {search_match.group(0)}')

# Invalid patterns raise onigurumacffi.OnigError at compile time.
try:
    onigurumacffi.compile('[invalid')
except onigurumacffi.OnigError as e:
    print(f'\nCaught expected error: {e}')