Arpeggio Parser
Arpeggio is a Python library that provides a Packrat parser interpreter: a recursive descent parser with memoization, based on PEG grammars. You define a grammar either as Python functions or in a textual PEG notation, and Arpeggio interprets it directly to parse input text, with no code-generation step. The current version is 2.0.3, with releases focusing on bug fixes, performance, and modern Python compatibility.
Warnings
- breaking Arpeggio 2.0.0 and later dropped support for Python 2.x and Python 3.x up to 3.5. The lowest supported Python version is now 3.6.
- gotcha Accessing a non-existent rule name as an attribute on a parse tree node (e.g., `parse_tree.non_existent_rule`) will raise an `AttributeError`.
- gotcha Error reporting for `NoMatch` exceptions was enhanced in 2.0.0 with the `eval_attrs` call, providing more detailed information on parse failures. Older versions might have less informative error messages.
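- gotcha Arpeggio offers two primary ways to define grammars: Python functions with `ParserPython`, or textual PEG notation with `ParserPEG` (from `arpeggio.peg` or `arpeggio.cleanpeg`). Each parser class accepts only its own grammar format. For example, if a function passed to `ParserPython` returns a grammar string, that string is treated as a single literal to match, not as a grammar.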
Install
-
pip install arpeggio
Imports
- ParserPython
from arpeggio import ParserPython
- ParserPEG
from arpeggio.cleanpeg import ParserPEG
- PTNodeVisitor
from arpeggio import PTNodeVisitor
- visit_parse_tree
from arpeggio import visit_parse_tree
Quickstart
from arpeggio import EOF, ParserPython, PTNodeVisitor, ZeroOrMore, visit_parse_tree
from arpeggio import RegExMatch

# 1. Define the grammar as Python functions, one function per rule
def number():
    return RegExMatch(r'\d+')

def calc():
    return number, ZeroOrMore(["+", "-"], number), EOF

# 2. Create a parser instance from the root rule.
# ParserPython takes a grammar function; textual PEG grammars
# go through ParserPEG (arpeggio.peg or arpeggio.cleanpeg) instead.
parser = ParserPython(calc)

# 3. Parse input text (whitespace is skipped by default)
input_expr = "10 + 20 - 5"
parse_tree = parser.parse(input_expr)

# 4. (Optional) Process the parse tree using a visitor.
# Each visit_<rule> method receives the node and the results
# already computed for its children.
class CalculatorVisitor(PTNodeVisitor):
    def visit_number(self, node, children):
        return int(node.value)

    def visit_calc(self, node, children):
        # children alternates numbers (ints) and operators (strings)
        res = children[0]  # first number
        for i in range(1, len(children), 2):
            op = children[i]
            num = children[i + 1]
            if op == '+':
                res += num
            elif op == '-':
                res -= num
        return res

result = visit_parse_tree(parse_tree, CalculatorVisitor())
assert result == 25
# print(f"Input: '{input_expr}', Result: {result}")