Python Lex & Yacc
PLY (Python Lex-Yacc) is a pure-Python implementation of the lex and yacc tools commonly used to write parsers and compilers. It implements the LALR(1) parsing algorithm and offers extensive error reporting and diagnostic information. While version 3.11 is the latest stable release, the project's author announced its abandonment on December 21, 2025, with no further maintenance expected.
Warnings
- breaking The author officially announced the abandonment of the PLY project on December 21, 2025. No further maintenance is expected. Users are advised to consider other parsing libraries or vendor PLY into their projects.
- breaking The current development version of PLY (4.0, on GitHub) requires Python 3.6 or greater and drops Python 2 support entirely. PLY 3.11, the release published on PyPI, still supports both Python 2 and Python 3.
- gotcha PLY reads the regular expressions for lexer rules and the grammar rules for the parser from Python function docstrings. When Python runs with docstrings stripped (`python -OO`; plain `-O` keeps them), those docstrings are `None` and PLY's introspection-based rule discovery breaks unless the tables were generated beforehand in normal mode.
- gotcha The author strongly recommends against `pip install ply` and suggests vendoring instead: copy the `ply` directory directly into your project. The stated reasons are the project's specialized nature, its zero-dependency design, and the author's wish not to be a link in a software supply chain, where occasional breaking changes (and now the abandonment) would affect downstream users.
- deprecated Using module-level functions like `lex.input()` and `lex.token()` directly (without first creating a lexer object) is discouraged and may be removed in future versions. These functions operate on the 'last' lexer created, which can lead to unexpected behavior in applications with multiple lexers or complex control flow.
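The docstring gotcha above can be checked without PLY installed. The following is a minimal sketch using only the standard library: it runs a small snippet in a child interpreter and reports what `t_NUMBER.__doc__` looks like under different optimization flags. The names `SNIPPET` and `docstring_seen` are hypothetical helpers introduced for this illustration. In CPython, docstrings are discarded at `-OO`, while `-O` alone keeps them.

```python
import subprocess
import sys

# A PLY-style token rule whose regex lives in the docstring.
SNIPPET = (
    "def t_NUMBER(t):\n"
    "    r'\\d+'\n"
    "    return t\n"
    "print(repr(t_NUMBER.__doc__))\n"
)

def docstring_seen(*flags):
    """Run SNIPPET in a child interpreter with the given flags and
    return the printed repr of the rule's docstring."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

print("default:", docstring_seen())
print("-O:     ", docstring_seen("-O"))
print("-OO:    ", docstring_seen("-OO"))  # the docstring is reported as None here
```

If the last line reports `None`, any tool that discovers rules through `__doc__` has nothing to read in that mode, which is why the tables have to exist before the program is run that way.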
Install
-
pip install ply
Imports
- lex
from ply import lex
- yacc
from ply import yacc
- Lexer
import ply.lex as lex; lexer = lex.lex(...)
- Parser
import ply.yacc as yacc; parser = yacc.yacc(...)
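As a minimal sketch of the `Lexer` import pattern above, the example below builds an explicit lexer object and calls `input()` and `token()` on it instead of the module-level `lex.input()` and `lex.token()`. The two-token grammar and the `tokenize` helper are assumptions made for this illustration and are not part of PLY.

```python
import ply.lex as lex

# Hypothetical two-token language, just enough to exercise the lexer.
tokens = ('NUMBER', 'PLUS')

t_PLUS = r'\+'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    # Skip characters that match no rule.
    t.lexer.skip(1)

# Keep a reference to the lexer object rather than relying on module state.
lexer = lex.lex()

def tokenize(text):
    """Return (type, value) pairs for text using the explicit lexer object."""
    lexer.input(text)
    pairs = []
    while True:
        tok = lexer.token()
        if tok is None:  # token() returns None when the input is exhausted
            break
        pairs.append((tok.type, tok.value))
    return pairs

print(tokenize("1 + 22"))
```

Holding the lexer in a variable makes it unambiguous which lexer is being fed when an application builds more than one.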
Quickstart
import ply.lex as lex
import ply.yacc as yacc
# --- LEXER ---
# List of token names.
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)
# Regular expression rules for simple tokens
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
# A regular expression rule with some action code
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore = ' \t'

# Error handling rule
def t_error(t):
    print(f"Illegal character '{t.value[0]}' at line {t.lexer.lineno}")
    t.lexer.skip(1)
# Build the lexer.
# PLY reads the token regexes from docstrings, which Python discards under -OO.
# If you need to run that way, run once in normal mode with optimize=1 so the
# table module is written, then reuse it on later runs. Note that lextab is a
# module name, not a file name:
# lexer = lex.lex(optimize=1, lextab='lextab', outputdir='.')
lexer = lex.lex()
# --- PARSER ---
# Precedence rules for the arithmetic operators
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
)
# Grammar rules for expressions
def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        if p[3] == 0:
            print("Division by zero!")
            p[0] = None  # Or raise an error
        else:
            p[0] = p[1] / p[3]

def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_error(p):
    if p:
        print(f"Syntax error at token '{p.type}' value '{p.value}' line {p.lineno}")
    else:
        print("Syntax error at EOF")
# Build the parser
parser = yacc.yacc()
# Test it out
while True:
    try:
        s = input('calc > ')
    except EOFError:
        break
    if not s:
        continue
    result = parser.parse(s)
    if result is not None:
        print(result)
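The loop above needs an interactive terminal. For scripts or tests, a non-interactive variant can be sketched as follows. This is a reduced grammar (addition and multiplication only) written for illustration, not the full calculator. The `evaluate` helper is a hypothetical name, errors are raised as `SyntaxError` instead of printed, and the `write_tables=False` keyword is assumed to be available, as it is in PLY 3.11.

```python
import ply.lex as lex
import ply.yacc as yacc

# Reduced token set for the sketch.
tokens = ('NUMBER', 'PLUS', 'TIMES')

t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    raise SyntaxError(f"Illegal character {t.value[0]!r}")

# TIMES binds tighter than PLUS because it is listed later.
precedence = (
    ('left', 'PLUS'),
    ('left', 'TIMES'),
)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression TIMES expression'''
    p[0] = p[1] + p[3] if p[2] == '+' else p[1] * p[3]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_error(p):
    # p is None when the input ends unexpectedly.
    if p is None:
        raise SyntaxError("Syntax error at EOF")
    raise SyntaxError(f"Syntax error at {p.value!r}")

lexer = lex.lex()
# debug=False skips parser.out; write_tables=False (PLY 3.11) skips parsetab.py.
parser = yacc.yacc(debug=False, write_tables=False)

def evaluate(text):
    """Parse and evaluate one expression, passing the lexer explicitly."""
    return parser.parse(text, lexer=lexer)

print(evaluate("2 + 3 * 4"))
```

Raising from `p_error` and `t_error` is one possible design choice: it lets callers handle bad input with ordinary exception handling, at the cost of giving up PLY's built-in error recovery.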