interegular
Interegular is a Python library that checks whether regular expressions, written in a supported subset of Python's `re` syntax, intersect (that is, whether some string is matched by more than one of them). Currently at version 0.3.3, it focuses on speed and compatibility with `re` syntax. Unlike libraries such as `greenery`, it prioritizes performance over reconstructing regexes from finite state machines (FSMs). The project appears to be actively developed, with releases on an irregular cadence. [1, 2, 3]
Warnings
- gotcha The library does not support all Python `re` features because of its FSM backend. Specifically, it lacks support for backreferences (e.g., `\1`, `(?P=name)`) and conditional matching (e.g., `(?(1)a|b)`). Some complex lookaheads and lookbehinds may also not work correctly: they may parse but yield incorrect results. [1, 2, 3]
- gotcha Not all `re` flags are currently implemented. Of the full `aiLmsux` set, the documentation mentions only `i`, `m`, and `s` as being in progress. If your regexes rely on specific flags, verify that they are supported. [1, 2, 3]
- gotcha `interegular` is designed to work with the `lark` parser, but this integration currently applies only when the `lark` lexer is set to `basic` or `contextual`. Other lexer types might lead to unexpected behavior or errors. [8]
- gotcha Lazy quantifiers (e.g., `*?`, `+?`, `??`) are currently treated the same way as their greedy counterparts (`*`, `+`, `?`). This can lead to surprising behavior and incorrect intersection results if your regex logic relies on non-greedy matching. [18]
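To make concrete what is lost when lazy and greedy quantifiers are treated alike, the following minimal sketch uses only the standard `re` module (not `interegular`). The sample text and patterns are illustrative assumptions, not taken from the library's documentation.

```python
import re

# Illustrative input: two bracketed tokens back to back.
text = "<a><b>"

# Greedy: `.+` consumes as much as possible, so the match runs to the last '>'.
greedy = re.match(r"<.+>", text).group()

# Lazy: `.+?` consumes as little as possible, so the match stops at the first '>'.
lazy = re.match(r"<.+?>", text).group()

print(greedy)  # <a><b>
print(lazy)    # <a>
```

If your logic depends on the shorter, lazy match, keep in mind that this distinction is not modeled once both forms are treated as greedy.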
Install
pip install interegular
Imports
- compare_regexes
from interegular import compare_regexes
- parse_pattern
from interegular import parse_pattern
- Pattern
from interegular import Pattern
- REFlags
from interegular import REFlags
- FSM
from interegular import FSM
- Comparator
from interegular import Comparator
Quickstart
from interegular import compare_regexes
# Example 1: Simple intersection (both patterns match "ab")
regexes1 = [r"a+b", r"a*b"]
intersections1 = list(compare_regexes(*regexes1))
print(f"Intersections for {regexes1}: {intersections1}")
# Example 2: No intersection
regexes2 = [r"[0-9]+", r"[a-zA-Z]+"]
intersections2 = list(compare_regexes(*regexes2))
print(f"Intersections for {regexes2}: {intersections2}")
# Example 3: More complex patterns
regexes3 = [r"foo(bar|baz)+", r"foobar+"]
intersections3 = list(compare_regexes(*regexes3))
print(f"Intersections for {regexes3}: {intersections3}")
# You can also work with Pattern objects directly
from interegular import parse_pattern, compare_patterns
pattern_a = parse_pattern(r"A.*Z")
pattern_b = parse_pattern(r"A[0-9]+Z")
pattern_intersections = list(compare_patterns(pattern_a, pattern_b))
print(f"Pattern intersections: {pattern_intersections}")