interegular
Interegular is a Python library designed to check a subset of Python regular expressions for intersections. Currently at version 0.3.3, it focuses on speed and compatibility with Python's `re` module syntax, differentiating itself from libraries like `greenery` by prioritizing performance over regex reconstruction from Finite State Machines (FSMs). The project appears to have an active development status, with updates released on an irregular cadence. [1, 2, 3]
Common errors
-
interegular.patterns.Unsupported: Escape \b is not implemented
cause The `interegular` library does not support all features of Python's `re` module because of its FSM-based backend. Unsupported constructs include certain escape sequences (such as `\b` for word boundaries), backreferences, conditional matching, and some lookaheads/lookbehinds.
fix Simplify the regular expression by removing or rewriting unsupported constructs. Consult the `interegular` documentation for supported regex syntax.
-
interegular parse_pattern().to_fsm() very slow / application freezing
cause Complex or very long regular expressions can generate extremely large Finite State Machines (FSMs), causing `interegular.parse_pattern().to_fsm()` and `FSM.reduce()` to run very slowly and consume significant memory, potentially making the application unresponsive.
fix Simplify the regular expression, break complex patterns into smaller, manageable parts, or redesign the logic to avoid regexes that generate excessively large FSMs.
-
AssertionError in make_byte_level_fsm (or similar related to case-insensitive regex)
cause When using case-insensitive regular expressions (`(?i:...)`) with certain Unicode characters (e.g., `ß`, `İ`), `interegular` can raise an `AssertionError`. The internal logic assumes that `str.upper()` or `str.lower()` on a single character always returns a single-character string, which is not true for all Unicode characters.
fix Avoid case-insensitive flags with problematic Unicode characters, or handle case sensitivity for such characters manually within the regex if possible. A specific fix for this issue might require a patched version of the library, as indicated in some discussions.
-
interegular.utils.simple_parser.NoMatch: Can not match at index X. Got '...'
cause This error occurs when `interegular`'s internal parser cannot parse the provided regular expression string at a specific index, indicating a syntax error, a malformed regex, or an unsupported pattern that the parser fails to recognize.
fix Carefully review the regular expression string for syntax errors, unclosed groups, or unsupported features. Ensure the regex adheres to the subset of Python `re` syntax that `interegular` is designed to handle.
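The case-mapping problem behind the `AssertionError` entry above can be checked with the standard library alone. The sketch below is not part of `interegular`; the helper names `case_mapping_is_single_char` and `multi_char_case_offenders` are hypothetical. It flags characters whose upper- or lower-case form is longer than one character, which are the characters that could be worth rewriting by hand before a case-insensitive pattern is used.

```python
from typing import List


def case_mapping_is_single_char(ch: str) -> bool:
    """Return True if both upper() and lower() of ch stay one character long."""
    return len(ch.upper()) == 1 and len(ch.lower()) == 1


def multi_char_case_offenders(pattern: str) -> List[str]:
    """List, in order of first appearance, characters whose case mapping expands."""
    offenders: List[str] = []
    for ch in pattern:
        if not case_mapping_is_single_char(ch) and ch not in offenders:
            offenders.append(ch)
    return offenders


# 'ß'.upper() is 'SS' and 'İ'.lower() is 'i' followed by a combining dot,
# so both characters are reported.
print(multi_char_case_offenders("straße|İstanbul"))  # ['ß', 'İ']
```

This scans every character of the pattern string, including regex metacharacters, which is harmless here because ASCII punctuation never expands under case mapping.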
Warnings
- gotcha The library does not support all Python `re` features due to its FSM backend. Specifically, it lacks support for backreferences (e.g., `\1`, `(?P=name)`) and conditional matching (e.g., `(?(1)a|b)`). Some complex lookaheads/lookbehinds may also not work correctly: they can parse successfully yet yield incorrect results. [1, 2, 3]
- gotcha Not all `re` flags are currently implemented. The documentation reports flag progress as `ims` out of `aiLmsux`, which reads as only the `i`, `m`, and `s` flags being covered so far. If your regexes rely on other flags, verify their support. [1, 2, 3]
- gotcha `interegular` is designed to work with the `lark` parser, but it currently only applies when `lark`'s lexer is set to `basic` or `contextual`. Other lexer types might lead to unexpected behavior or errors. [8]
- gotcha Lazy quantifiers (e.g., `*?`, `+?`, `??`) are currently treated the same way as their greedy counterparts (`*`, `+`, `?`). This can lead to surprising behavior and incorrect intersection results if your regex logic relies on non-greedy matching. [18]
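One way to catch the constructs listed in these warnings before handing a pattern to `interegular` is to inspect it with CPython's own regex parser. The sketch below is an assumption-laden illustration, not an `interegular` feature: it relies on the private `re._parser` module (`sre_parse` on Python versions before 3.11), whose internals may change between releases, and the function name `find_risky_constructs` and its labels are hypothetical.

```python
try:
    from re import _parser as sre_parser  # Python 3.11+ (private module)
except ImportError:
    import sre_parse as sre_parser  # older Python versions


def _walk(node):
    """Yield every (opcode, argument) pair in a parsed pattern, recursively."""
    if isinstance(node, sre_parser.SubPattern):
        for op, av in node.data:
            yield op, av
            yield from _walk(av)
    elif isinstance(node, (list, tuple)):
        # Arguments of groups, repeats, and branches nest further sub-patterns.
        for item in node:
            yield from _walk(item)


def find_risky_constructs(pattern):
    """Return sorted labels for constructs the warnings above call out."""
    found = set()
    for op, av in _walk(sre_parser.parse(pattern)):
        name = getattr(op, "name", str(op))
        if name == "GROUPREF":
            found.add("backreference")
        elif name == "GROUPREF_EXISTS":
            found.add("conditional")
        elif name == "MIN_REPEAT":
            found.add("lazy quantifier")
        elif name in ("ASSERT", "ASSERT_NOT"):
            found.add("lookaround")
        elif name == "AT" and getattr(av, "name", str(av)) in (
            "AT_BOUNDARY",
            "AT_NON_BOUNDARY",
        ):
            found.add("word boundary")
    return sorted(found)


print(find_risky_constructs(r"\b(\w+) \1\b"))
```

An empty result only means that none of these specific constructs were found; it does not guarantee that `interegular` accepts the pattern.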
Install
-
pip install interegular
Imports
- compare_regexes
from interegular import compare_regexes
- parse_pattern
from interegular import parse_pattern
- Pattern
from interegular import Pattern
- REFlags
from interegular import REFlags
- FSM
from interegular import FSM
- Comparator
from interegular import Comparator
Quickstart
from interegular import compare_regexes
# Example 1: Simple intersection (both patterns match "ab")
regexes1 = [r"a+b", r"a*b"]
intersections1 = list(compare_regexes(*regexes1))
print(f"Intersections for {regexes1}: {intersections1}")
# Example 2: No intersection
regexes2 = [r"[0-9]+", r"[a-zA-Z]+"]
intersections2 = list(compare_regexes(*regexes2))
print(f"Intersections for {regexes2}: {intersections2}")
# Example 3: More complex patterns
regexes3 = [r"foo(bar|baz)+", r"foobar+"]
intersections3 = list(compare_regexes(*regexes3))
print(f"Intersections for {regexes3}: {intersections3}")
# You can also work with Pattern objects directly
from interegular import parse_pattern, compare_patterns
pattern_a = parse_pattern(r"A.*Z")
pattern_b = parse_pattern(r"A[0-9]+Z")
pattern_intersections = list(compare_patterns(pattern_a, pattern_b))
print(f"Pattern intersections: {pattern_intersections}")
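For intuition about what an FSM-based intersection check involves, the following standard-library sketch walks two hand-written deterministic automata in lockstep (the classic product construction) and returns a shortest string that both accept. It is a conceptual illustration only and makes no claim about how `interegular` implements its check; the dictionary layout and the name `dfas_intersect` are assumptions made for this example.

```python
from collections import deque


def dfas_intersect(dfa_a, dfa_b):
    """Return a shortest string accepted by both DFAs, or None if disjoint."""
    start = (dfa_a["start"], dfa_b["start"])
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (sa, sb), text = queue.popleft()
        if sa in dfa_a["accept"] and sb in dfa_b["accept"]:
            return text
        # Only symbols that both machines can read keep the state pair alive.
        for sym, na in dfa_a["delta"].get(sa, {}).items():
            nb = dfa_b["delta"].get(sb, {}).get(sym)
            if nb is not None and (na, nb) not in seen:
                seen.add((na, nb))
                queue.append(((na, nb), text + sym))
    return None


# Hand-built automata; a missing transition means the input is rejected.
A_PLUS = {"start": 0, "accept": {1}, "delta": {0: {"a": 1}, 1: {"a": 1}}}
A_STAR_B = {"start": 0, "accept": {1}, "delta": {0: {"a": 0, "b": 1}}}
AB_PLUS = {
    "start": 0,
    "accept": {1},
    "delta": {0: {"a": 1, "b": 1}, 1: {"a": 1, "b": 1}},
}

print(dfas_intersect(A_PLUS, A_STAR_B))  # None: a+ never ends in b
print(dfas_intersect(A_PLUS, AB_PLUS))   # a
print(dfas_intersect(A_STAR_B, AB_PLUS))  # b
```

The number of reachable state pairs can approach the product of the two machines' sizes, which is one reason large FSMs make intersection checks expensive.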