Oniguruma CFFI

1.5.0 · active · verified Fri Apr 17

`onigurumacffi` provides Python cffi bindings for the Oniguruma regex engine. Version 1.5.0 wraps the Oniguruma C library, offering a performant, multibyte-aware regex engine that is distinct from Python's built-in `re` module. The library bundles the Oniguruma C source, which simplifies installation, and requires Python 3.10 or newer. Releases are infrequent and tied to updates in the underlying Oniguruma library or CFFI.

Quickstart

This quickstart compiles a pattern with `onigurumacffi.compile`, runs `match` and `search` on strings, and reads the captured groups. It also shows basic error handling for invalid patterns, which raise `onigurumacffi.OnigError`.

import onigurumacffi

# Compile a pattern. Patterns and subjects are Python str values;
# a raw string keeps the backslashes readable.
regex = onigurumacffi.compile(r'^(hello|hi)\s+(world|there)!$')

# match() only succeeds if the pattern matches at the start position
text = 'hello world!'
match = regex.match(text)

if match:
    print("Match found!")
    print(f"Full match: {match.group(0)}")
    print(f"Group 1: {match.group(1)}")
    print(f"Group 2: {match.group(2)}")
else:
    print(f"No match for '{text}'")

# search() scans forward for the first match. The anchored pattern above
# could never match inside a longer string, so use an unanchored one here.
unanchored = onigurumacffi.compile(r'(hello|hi)\s+(world|there)!')
search_text = 'some hello world! text'
search_match = unanchored.search(search_text, start=0)
if search_match:
    print("\nSearch found!")
    print(f"Search match: {search_match.group(0)}")

# Error handling: an invalid pattern raises OnigError at compile time
try:
    onigurumacffi.compile('[invalid')
except onigurumacffi.OnigError as e:
    print(f"\nCaught expected error: {e}")
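
Because the underlying engine works on encoded text, positions at the C level are byte offsets, which differ from `str` indices as soon as the subject contains multibyte characters. The following is a minimal stdlib-only sketch of that conversion, assuming UTF-8 and an offset that falls on a character boundary. `byte_to_char_offset` is a hypothetical helper written for illustration; it is not part of onigurumacffi and does not call it.

```python
def byte_to_char_offset(text: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Map a byte offset into the encoded form of `text` to a str index.

    Hypothetical helper for illustration; not part of onigurumacffi.
    Assumes `byte_offset` falls on a character boundary.
    """
    encoded = text.encode(encoding)
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError(f"byte_offset {byte_offset} out of range")
    # Decoding the prefix raises UnicodeDecodeError if the offset
    # lands in the middle of a multibyte character.
    return len(encoded[:byte_offset].decode(encoding))


text = "héllo wörld"
# Locate the second word in the encoded bytes, then map back to a str index.
byte_pos = text.encode("utf-8").find("wörld".encode("utf-8"))
char_pos = byte_to_char_offset(text, byte_pos)
print(byte_pos, char_pos)   # the two differ because "é" takes two bytes
print(text[char_pos:])
```

Slicing the original `str` with the converted index recovers the expected substring, whereas slicing it with the raw byte offset would be off by one for every two-byte character that precedes the match.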
