RPly: Python Lex/Yacc Parser Generator
RPly is a pure-Python parser generator with a modern API that is also compatible with RPython. It is a re-implementation of David Beazley's PLY library and simplifies building lexers (tokenizers) and parsers (syntax analyzers) for domain-specific languages or custom syntaxes. The current version is 0.7.8; releases are infrequent.
Warnings
- gotcha When targeting RPython, AST nodes passed between parser productions *must* inherit from `rply.token.BaseBox`. This ensures type compatibility in the RPython type inference system. For pure Python usage, this inheritance is not strictly necessary but is good practice if RPython compatibility might be a future goal.
- gotcha An error handler registered with `@pg.error` on a `ParserGenerator` must raise an exception. RPly does not support error recovery: if the handler returns normally, the parser does not resume and instead fails with an `AssertionError`. `LexerGenerator` has no error-handler hook; input that matches no rule raises `LexingError` directly.
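- gotcha Omitting or misordering precedence rules in `ParserGenerator` leaves the grammar ambiguous: RPly reports shift/reduce conflicts as warnings at build time, and the resulting parse trees can be unexpected, especially for operators like multiplication/division and addition/subtraction. List `precedence` entries from lowest to highest binding strength.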
Install
pip install rply
Imports
- LexerGenerator
from rply import LexerGenerator
- ParserGenerator
from rply import ParserGenerator
- BaseBox
from rply.token import BaseBox
- LexingError
from rply.lexer import LexingError
- ParsingError
from rply import ParsingError
Quickstart
from rply import LexerGenerator, ParserGenerator, ParsingError
from rply.token import BaseBox


# 1. Define the Abstract Syntax Tree (AST) nodes
class Number(BaseBox):
    def __init__(self, value):
        self.value = value

    def eval(self):
        return self.value


class BinaryOp(BaseBox):
    def __init__(self, left, right):
        self.left = left
        self.right = right


class Add(BinaryOp):
    def eval(self):
        return self.left.eval() + self.right.eval()


class Sub(BinaryOp):
    def eval(self):
        return self.left.eval() - self.right.eval()


class Mul(BinaryOp):
    def eval(self):
        return self.left.eval() * self.right.eval()


class Div(BinaryOp):
    def eval(self):
        return self.left.eval() / self.right.eval()


# 2. Build the Lexer
lg = LexerGenerator()
lg.add('NUMBER', r'\d+')
lg.add('PLUS', r'\+')
lg.add('MINUS', r'-')
lg.add('MUL', r'\*')
lg.add('DIV', r'/')
lg.add('OPEN_PAREN', r'\(')
lg.add('CLOSE_PAREN', r'\)')
lg.ignore(r'\s+')
lexer = lg.build()

# 3. Build the Parser
pg = ParserGenerator(
    ['NUMBER', 'PLUS', 'MINUS', 'MUL', 'DIV', 'OPEN_PAREN', 'CLOSE_PAREN'],
    precedence=[('left', ['PLUS', 'MINUS']), ('left', ['MUL', 'DIV'])]
)


@pg.production('expression : NUMBER')
def expression_number(p):
    return Number(int(p[0].getstr()))


@pg.production('expression : OPEN_PAREN expression CLOSE_PAREN')
def expression_paren(p):
    return p[1]


@pg.production('expression : expression PLUS expression')
def expression_plus(p):
    return Add(p[0], p[2])


@pg.production('expression : expression MINUS expression')
def expression_minus(p):
    return Sub(p[0], p[2])


@pg.production('expression : expression MUL expression')
def expression_mul(p):
    return Mul(p[0], p[2])


@pg.production('expression : expression DIV expression')
def expression_div(p):
    return Div(p[0], p[2])


@pg.error
def error_handler(token):
    raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype())


parser = pg.build()

# 4. Use the Lexer and Parser
text = "(10 + 5) * 2 / 3 - 1"
tokens = lexer.lex(text)
try:
    result = parser.parse(tokens).eval()
    print(f"Result of '{text}': {result}")
except ParsingError as e:
    print(f"Parsing error at position {e.getsourcepos()}: {e}")
except ValueError as e:
    print(f"Error: {e}")