Aho-Corasick Rust Bindings for Python

1.0.3 · active · verified Thu Apr 16

ahocorasick-rs is a Python library for efficient multi-pattern string searching. It is a high-performance wrapper around the Rust `aho-corasick` library, and it offers a significantly faster alternative to pure-Python approaches and to the C-backed `pyahocorasick` when searching for many substrings at once. The library is actively maintained: the latest version is 1.0.3, and new releases are published as needed for performance improvements or support for new Python versions.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to initialize an AhoCorasick object with a list of patterns and then use it to find occurrences within a haystack string, returning either the pattern indices and positions or the matched strings themselves. It also includes an example for byte string matching.

import ahocorasick_rs

patterns = ["hello", "world", "fish"]
haystack = "this is my first hello world. hello!"

# Create an AhoCorasick automaton
ac = ahocorasick_rs.AhoCorasick(patterns)

# Find matches and their indexes (pattern_index, start_index, end_index)
matches_by_index = ac.find_matches_as_indexes(haystack)
print(f"Matches by index: {matches_by_index}")
# Expected: [(0, 17, 22), (1, 23, 28), (0, 30, 35)]

# Find matches and return the actual strings
matches_as_strings = ac.find_matches_as_strings(haystack)
print(f"Matches as strings: {matches_as_strings}")
# Expected: ['hello', 'world', 'hello']

# For byte strings
byte_patterns = [b"foo", b"bar"]
byte_haystack = b"this is foo and bar"
byte_ac = ahocorasick_rs.BytesAhoCorasick(byte_patterns)
byte_matches = byte_ac.find_matches_as_indexes(byte_haystack)
print(f"Byte matches: {byte_matches}")
# Expected: [(0, 8, 11), (1, 16, 19)]
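
To make the (pattern_index, start_index, end_index) tuples concrete, the following is a minimal pure-Python sketch that reproduces the expected output above without the library. The helper `naive_find_matches_as_indexes` is hypothetical and is not part of ahocorasick-rs. It scans one position at a time rather than using the Aho-Corasick algorithm, and it takes the first listed pattern that matches at each position, which may differ from the library's behavior on other inputs.

```python
def naive_find_matches_as_indexes(patterns, haystack):
    """Hypothetical reference helper (not the library's algorithm).

    Scans left to right and reports non-overlapping matches as
    (pattern_index, start_index, end_index), with end_index exclusive.
    """
    matches = []
    pos = 0
    while pos < len(haystack):
        for index, pattern in enumerate(patterns):
            # Skip empty patterns so the scan always advances.
            if pattern and haystack.startswith(pattern, pos):
                matches.append((index, pos, pos + len(pattern)))
                pos += len(pattern)  # continue after the match
                break
        else:
            pos += 1  # no pattern starts here
    return matches


patterns = ["hello", "world", "fish"]
haystack = "this is my first hello world. hello!"

matches = naive_find_matches_as_indexes(patterns, haystack)
print(matches)
# [(0, 17, 22), (1, 23, 28), (0, 30, 35)]

# Slicing the haystack with the offsets recovers the matched text.
print([haystack[start:end] for _, start, end in matches])
# ['hello', 'world', 'hello']
```

Because `start_index` and `end_index` form a half-open range, `haystack[start:end]` returns the matched text directly. The same helper also accepts `bytes` patterns with a `bytes` haystack.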
