{"id":6959,"library":"ahocorasick-rs","title":"Aho-Corasick Rust Bindings for Python","description":"ahocorasick-rs is a Python library for fast multi-pattern string searching. It wraps the Rust `aho-corasick` crate and is designed as a faster alternative to pure-Python implementations and to the C-based `pyahocorasick` when searching for many substrings at once. The library is actively maintained; its latest version is 1.0.3, and updates are released as needed for performance improvements or support for new Python versions.","status":"active","version":"1.0.3","language":"en","source_language":"en","source_url":"https://github.com/G-Research/ahocorasick_rs","tags":["string-matching","aho-corasick","performance","rust-bindings","multi-pattern","text-processing"],"install":[{"cmd":"pip install ahocorasick-rs","lang":"bash","label":"Install latest stable version"}],"dependencies":[],"imports":[{"symbol":"AhoCorasick","correct":"from ahocorasick_rs import AhoCorasick"},{"note":"For searching `bytes` (and other buffer-protocol objects) instead of `str`.","symbol":"BytesAhoCorasick","correct":"from ahocorasick_rs import BytesAhoCorasick"}],"quickstart":{"code":"import ahocorasick_rs\n\npatterns = [\"hello\", \"world\", \"fish\"]\nhaystack = \"this is my first hello world. hello!\"\n\n# Create an AhoCorasick automaton\nac = ahocorasick_rs.AhoCorasick(patterns)\n\n# Find matches as (pattern_index, start_index, end_index) tuples; end_index is exclusive\nmatches_by_index = ac.find_matches_as_indexes(haystack)\nprint(f\"Matches by index: {matches_by_index}\")\n# Expected: [(0, 17, 22), (1, 23, 28), (0, 30, 35)]\n\n# Find matches and return the matched patterns as strings\nmatches_as_strings = ac.find_matches_as_strings(haystack)\nprint(f\"Matches as strings: {matches_as_strings}\")\n# Expected: ['hello', 'world', 'hello']\n\n# For byte strings (BytesAhoCorasick has no find_matches_as_strings(); use indexes)\nbyte_patterns = [b\"foo\", b\"bar\"]\nbyte_haystack = b\"this is foo and bar\"\nbyte_ac = ahocorasick_rs.BytesAhoCorasick(byte_patterns)\nbyte_matches = byte_ac.find_matches_as_indexes(byte_haystack)\nprint(f\"Byte matches: {byte_matches}\")\n# Expected: [(0, 8, 11), (1, 16, 19)]","lang":"python","description":"This quickstart shows how to build an AhoCorasick object from a list of patterns and search a haystack string, returning either (pattern index, start, end) tuples or the matched strings themselves. It also includes an example of byte-string matching with BytesAhoCorasick."},"warnings":[{"fix":"Upgrade to version 1.0.0 or later for a stable API. If migrating from a pre-1.0.0 version, review the changelog for specific changes.","message":"Before version 1.0.0, the ahocorasick-rs API was not guaranteed to be stable, and minor or patch releases could include breaking changes. Version 1.0.0 introduced API stability. Users on older versions should consult the release notes for migration paths.","severity":"breaking","affected_versions":"<1.0.0"},{"fix":"Benchmark your specific use case. For very small-scale, infrequent searches, consider simpler string methods such as `in` or `str.find()`.","message":"Although the library is highly optimized, for very small haystacks or only a few patterns (e.g., 1-3), the overhead of constructing the Aho-Corasick automaton can make plain `in`/`str.find()` checks or regular expression searches slightly faster due to constant factors. The benefits of Aho-Corasick grow with the number of patterns and the size of the haystack.","severity":"gotcha","affected_versions":"All"},{"fix":"Pass an explicit match kind via the `matchkind` keyword when constructing `AhoCorasick` or `BytesAhoCorasick` (e.g., `matchkind=MatchKind.LeftmostLongest`) if you need specific overlap handling. Consult the documentation for the available `MatchKind` options.","message":"The underlying Aho-Corasick implementation supports several match kinds (`MatchKind.Standard`, `MatchKind.LeftmostFirst`, `MatchKind.LeftmostLongest`) that determine which match is reported when patterns overlap. Using the wrong one can lead to unexpected results. The default, `MatchKind.Standard`, reports each match as soon as it is found and returns non-overlapping matches, so with patterns `disco` and `discotheque` the haystack `discotheque` yields only `disco`. To get all overlapping matches, pass `overlapping=True` to the find methods; this is supported only with `MatchKind.Standard`.","severity":"gotcha","affected_versions":"All"},{"fix":"For memory-constrained environments or very large pattern sets, try `Implementation.ContiguousNFA` or `Implementation.NoncontiguousNFA` to reduce memory usage and build time, at a potential cost in search speed. Profile your application to find the right balance.","message":"Building a deterministic finite automaton (DFA) gives the fastest searches but can be slow and memory-intensive, especially with a very large number of patterns. By default the library chooses an implementation heuristically, but you can select one explicitly with the `implementation` keyword (`Implementation.DFA`, `Implementation.ContiguousNFA`, or `Implementation.NoncontiguousNFA`), trading off build time, memory usage, and search speed.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Use `ahocorasick_rs.AhoCorasick` (or `BytesAhoCorasick` for byte strings) instead of `ahocorasick_rs.Automaton`. Note that the API differs: patterns are passed to the constructor rather than added with `add_word()` and built with `make_automaton()`.","cause":"Users migrating from `pyahocorasick` often assume its `Automaton` class name also exists in `ahocorasick_rs`.","error":"AttributeError: module 'ahocorasick_rs' has no attribute 'Automaton'"},{"fix":"Pass an iterable of patterns (e.g., `['pattern1', 'pattern2']`) whose elements are all `str` for `AhoCorasick`, or all `bytes` for `BytesAhoCorasick`. Avoid passing a bare string: it is itself an iterable, so each character would be treated as a separate pattern.","cause":"The `AhoCorasick` constructor expects an iterable (such as a list, tuple, or generator) of `str` patterns, while `BytesAhoCorasick` expects `bytes`-like patterns; mixing the two types, or passing the wrong type to either class, is rejected.","error":"TypeError: patterns must be an iterable of strings or bytes"},{"fix":"Install the library with `pip install ahocorasick-rs`. If you use a virtual environment, make sure it is activated. Note that the package is installed as `ahocorasick-rs` but imported as `ahocorasick_rs`.","cause":"The `ahocorasick-rs` package is not installed in the current Python environment, or the intended environment is not activated.","error":"ModuleNotFoundError: No module named 'ahocorasick_rs'"}]}