Parsimonious
Parsimonious is a pure-Python library for creating parsers based on Parsing Expression Grammars (PEGs). It aims for speed and usability, allowing users to define grammars using a simplified EBNF notation. It is designed for applications requiring efficient parsing of structured text, such as configuration files or domain-specific languages. The current version is 0.11.0.
Warnings
- breaking Parsimonious is pre-1.0, and its API has changed between releases before (e.g., in 0.5). No breaking changes are explicitly documented between 0.10.x and 0.11.0, but pinning an exact version (for example, `parsimonious==0.11.0`) guards against behavior changes in future minor releases.
- gotcha Regex terminals are written in the grammar with a dedicated `~"pattern"` syntax and are compiled with the third-party `regex` library, not Python's built-in `re`. The author has also indicated that explicit regexes may eventually be deprecated in favor of dynamically built primitives.
- gotcha Parsimonious parses a complete in-memory string rather than a stream, so the entire input (e.g., a file's content) must be loaded before parsing begins. For extremely large inputs, this can lead to high memory consumption.
- gotcha While `Node` objects can be manipulated directly, the recommended and most robust way to process the parse tree is to subclass `NodeVisitor`.
Install
pip install parsimonious
Imports
- Grammar
from parsimonious.grammar import Grammar
- NodeVisitor
from parsimonious.nodes import NodeVisitor
Quickstart
from parsimonious.exceptions import ParseError
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

# 1. Define your grammar. Operators get their own named rules, and every
#    token consumes trailing whitespace so that "(10 + 20) * 3" parses.
grammar = Grammar(
    r"""
    expression    = term (add_op term)*
    term          = factor (mul_op factor)*
    factor        = parenthesized / number
    parenthesized = "(" ws expression ")" ws
    add_op        = ("+" / "-") ws
    mul_op        = ("*" / "/") ws
    number        = ~"[0-9]+" ws
    ws            = ~r"\s*"
    """
)


# 2. Define a NodeVisitor that evaluates the parse tree
class CalculatorVisitor(NodeVisitor):
    def visit_number(self, node, visited_children):
        return int(node.text)  # int() ignores the trailing whitespace

    def visit_parenthesized(self, node, visited_children):
        _, _, value, _, _ = visited_children  # "(" ws expression ")" ws
        return value

    def visit_factor(self, node, visited_children):
        return visited_children[0]  # the alternative that matched

    def visit_add_op(self, node, visited_children):
        return node.text.strip()

    def visit_mul_op(self, node, visited_children):
        return node.text.strip()

    def visit_term(self, node, visited_children):
        result, rest = visited_children
        for op, value in rest:  # each item is an [operator, factor] pair
            if op == "*":
                result *= value
            elif op == "/":
                result /= value
        return result

    def visit_expression(self, node, visited_children):
        result, rest = visited_children
        for op, value in rest:  # each item is an [operator, term] pair
            if op == "+":
                result += value
            elif op == "-":
                result -= value
        return result

    def generic_visit(self, node, visited_children):
        # Unnamed nodes: pass child results up, or the node itself if it is a leaf
        return visited_children or node


# 3. Parse an input string and evaluate it
input_string = "(10 + 20) * 3"
try:
    tree = grammar.parse(input_string)
    print(f"Successfully parsed: {input_string}")
    # print(tree.prettily())
    result = CalculatorVisitor().visit(tree)
    print(f"Result: {result}")
except ParseError as e:
    print(f"Error parsing: {e}")