Greenery
Greenery is a Python library for manipulating regular expressions by converting them into finite state machines (FSMs). This makes it possible to find the strings a pattern matches and to compute the union, intersection, and difference of regular expressions. The current version is 4.2.2, and the project releases several updates per year.
Common errors
-
TypeError: intersection() missing 1 required positional argument: 'language'
cause Calling FSM or Pattern methods such as `intersection()`, `union()`, `difference()`, or `reduce()` in `greenery` 4.0.0 or later without the mandatory `language` argument.
fix Update the method calls to pass `language` explicitly, as a `frozenset` of characters. Example: `(pattern1 & pattern2).reduce(language=pattern1.alphabet | pattern2.alphabet)`.
-
AttributeError: 'Pattern' object has no attribute 'matches_any'
cause Calling an API method (such as `matches_any` or `is_null`) that was deprecated, renamed, or removed in a major version upgrade, likely 4.x, in which the API changed significantly.
fix Consult the official `greenery` documentation or GitHub repository for the modern equivalent. For `matches_any`, use `Pattern.matches()`, which tests an individual string.
-
ValueError: not a subset
cause This error can occur during FSM operations when the alphabets of the FSMs involved are incompatible, typically because one FSM's alphabet is not a subset of the other's. It is most likely when the FSMs were created with different or implied `language` arguments.
fix Ensure that all FSMs involved in an operation (such as union or intersection) share one `language` (alphabet) broad enough to cover every character used by any of them. Provide this unified `language` explicitly when creating the FSMs or performing the operation.
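To illustrate why a shared alphabet matters, here is a minimal pure-Python sketch of intersecting two deterministic automata by product construction. It is not greenery's implementation and uses none of its API: the dict-based DFA layout and the helper names (`alphabet_of`, `complete`, `intersect`, `accepts`) are assumptions made up for this example.

```python
from itertools import product


def alphabet_of(dfa):
    """Collect every symbol that appears in the DFA's transition table."""
    return frozenset(s for moves in dfa["table"].values() for s in moves)


def complete(dfa, alphabet, dead="dead"):
    """Copy `dfa` so that every state has a move for every symbol in `alphabet`.

    Missing transitions are routed to a non-accepting dead state.
    """
    table = {state: dict(moves) for state, moves in dfa["table"].items()}
    table[dead] = {}
    for moves in table.values():
        for symbol in alphabet:
            moves.setdefault(symbol, dead)
    return {"start": dfa["start"], "finals": set(dfa["finals"]), "table": table}


def intersect(left, right):
    """Product construction: run both DFAs in lockstep over one shared alphabet."""
    alphabet = alphabet_of(left) | alphabet_of(right)
    left, right = complete(left, alphabet), complete(right, alphabet)
    table = {
        (p, q): {s: (left["table"][p][s], right["table"][q][s]) for s in alphabet}
        for p, q in product(left["table"], right["table"])
    }
    finals = {(p, q) for p in left["finals"] for q in right["finals"]}
    return {"start": (left["start"], right["start"]), "finals": finals, "table": table}


def accepts(dfa, text):
    """Return True if the DFA ends in an accepting state after reading `text`."""
    state = dfa["start"]
    for ch in text:
        moves = dfa["table"][state]
        if ch not in moves:  # symbol outside the alphabet: reject
            return False
        state = moves[ch]
    return state in dfa["finals"]


# "a+" knows only the symbol 'a'; "exactly one of a or b" knows 'a' and 'b'.
a_plus = {"start": 0, "finals": {1}, "table": {0: {"a": 1}, 1: {"a": 1}}}
one_char = {"start": 0, "finals": {1}, "table": {0: {"a": 1, "b": 1}, 1: {}}}

both = intersect(a_plus, one_char)
print(accepts(both, "a"), accepts(both, "aa"), accepts(both, "b"))  # True False False
```

The key step is `complete()`: without widening both machines to the union of their alphabets, the lockstep lookup in `intersect()` would hit a missing symbol, which is the kind of mismatch the error above describes.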
Warnings
- breaking Version 4.0.0 introduced a mandatory `language` parameter for many core FSM methods (e.g., `FSM.intersection`, `FSM.union`, `Pattern.strings`, `Pattern.matches`). Code written for versions < 4.0.0 will raise `TypeError` if these methods are called without the `language` argument.
- gotcha The `language` parameter fundamentally affects the behavior of many operations. If not explicitly specified, it defaults to a `frozenset()` of characters seen so far, which can lead to unexpected results, especially when dealing with character classes (e.g., `[0-9]`) or when expecting an infinite language. Always consider the full character set your regex is intended to operate within.
- gotcha `Pattern.strings()` yields strings indefinitely if the regular expression matches an infinite language (e.g., `a*`). Iterating over it without a limit, or without first checking `Pattern.is_finite()`, results in an infinite loop that can exhaust system resources.
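A common way to guard against the infinite-generator case is to cap the iteration with `itertools.islice`. The sketch below uses a stand-in generator (`strings_of_a_star`, a hypothetical name) rather than greenery itself, so it runs with only the standard library. The same `islice` wrapper should work on any Python iterator.

```python
from itertools import count, islice


def strings_of_a_star():
    """Stand-in for an infinite string generator: yields '', 'a', 'aa', ..."""
    for n in count():
        yield "a" * n


# islice stops after five items, so the infinite generator is never exhausted.
first_five = list(islice(strings_of_a_star(), 5))
print(first_five)  # ['', 'a', 'aa', 'aaa', 'aaaa']
```

Unlike repeated `next()` calls, `islice` also ends cleanly when the underlying generator turns out to be finite and shorter than the requested limit.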
Install
-
pip install greenery
Imports
- parse
from greenery import parse
- Fsm
from greenery import Fsm  # the class is named Fsm, not FSM
- Pattern
from greenery import Pattern  # greenery.lego is the pre-4.0 module path
Quickstart
from greenery import parse
# Parse a regular expression string into a Pattern object
re_pattern = parse('a(b|c)*d')
# Check if a string matches the pattern
assert re_pattern.matches('ad')
assert re_pattern.matches('abd')
assert re_pattern.matches('abbcd')
assert not re_pattern.matches('aed')
# Enumerate strings matched by the pattern. 'a(b|c)*d' matches infinitely many
# strings, so the generator never ends on its own; take only a bounded number of items.
# A 'language' (alphabet) must often be supplied for operations like union/intersection in v4.0.0+.
# It is passed explicitly here as good practice, although .strings() does not strictly need it.
alphabet = frozenset({'a', 'b', 'c', 'd'})
strings_generator = re_pattern.strings(language=alphabet)
# Take the first five strings; the generator is infinite for this pattern
some_strings = [next(strings_generator) for _ in range(5)]
print(f"Some strings matched by 'a(b|c)*d': {some_strings}")
# Demonstrate intersection of two patterns
pattern1 = parse('a.*b')
pattern2 = parse('axb')
# When working with FSMs, ensure a consistent language/alphabet
alph1 = pattern1.alphabet
alph2 = pattern2.alphabet
common_alphabet = alph1 | alph2
intersection_pattern = (pattern1 & pattern2).reduce(language=common_alphabet)
print(f"Intersection of 'a.*b' and 'axb': {intersection_pattern}")
assert intersection_pattern.matches('axb')
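An intersection like the one above can be sanity-checked without greenery by brute force: enumerate every string up to a small length over a small alphabet and keep those that both expressions fully match. The sketch below uses only the standard library `re` module. The helper name `bounded_language`, the three-letter alphabet, and the length bound of 4 are assumptions chosen for illustration, and Python's `re` syntax is not guaranteed to agree with greenery's parser for every pattern.

```python
import re
from itertools import product


def bounded_language(regex, alphabet, max_len):
    """Return every string over `alphabet`, up to `max_len` long, that `regex` fully matches."""
    compiled = re.compile(regex)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(sorted(alphabet), repeat=n)
        if compiled.fullmatch("".join(chars))
    }


alphabet = {"a", "b", "x"}
both = bounded_language("a.*b", alphabet, 4) & bounded_language("axb", alphabet, 4)
print(both)  # {'axb'}
```

This only covers strings within the chosen bound, so it is a spot check rather than a proof of equivalence.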