Python Lex & Yacc

3.11 · abandoned · verified Sat Mar 28

PLY (Python Lex-Yacc) is a pure-Python implementation of the lex and yacc tools commonly used to write parsers and compilers. It implements the LALR(1) parsing algorithm and offers extensive error reporting and diagnostic information. Version 3.11 is the latest stable release; the project's author announced on December 21, 2025 that the project is abandoned, and no further maintenance is expected.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates a simple calculator using PLY's lexer and parser. It defines tokens, regular expression rules for lexical analysis, and grammar rules for parsing. The lexer (`lex.lex()`) and parser (`yacc.yacc()`) are then built and used to process input.

import ply.lex as lex
import ply.yacc as yacc

# --- LEXER --- 

# List of token names.
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN'
)

# Regular expression rules for simple tokens
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'

# A regular expression rule with some action code
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore  = ' \t'

# Error handling rule
def t_error(t):
    print(f"Illegal character '{t.value[0]}' at line {t.lexer.lineno}")
    t.lexer.skip(1)

# Build the lexer.
# PLY reads each token's regular expression from the rule function's docstring.
# Python's -OO flag strips docstrings, so a lexer built this way will not work there.
# This quickstart assumes normal (non-optimized) mode.
# To support optimized mode, run once in normal mode with optimize=1 so that
# PLY writes the table module (lextab.py), then reuse it on later runs:
# lexer = lex.lex(optimize=1, lextab='lextab', outputdir='.')
lexer = lex.lex()

# --- PARSER --- 

# Precedence rules for the arithmetic operators
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
)

# Grammar rules for expressions
def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
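    # Propagate None from an earlier division by zero instead of crashing
    if p[1] is None or p[3] is None:
        p[0] = None
        return
    if p[2] == '+':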
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        if p[3] == 0:
            print("Division by zero!")
            p[0] = None # Or raise an error
        else:
            p[0] = p[1] / p[3]

def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_error(p):
    if p:
        print(f"Syntax error at token '{p.type}' value '{p.value}' line {p.lineno}")
    else:
        print("Syntax error at EOF")

# Build the parser
parser = yacc.yacc()

# Test it out
while True:
    try:
        s = input('calc > ')
    except EOFError:
        break
    if not s: continue
    result = parser.parse(s)
    if result is not None:
        print(result)
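
The lexer can also be exercised on its own, which helps when debugging token rules before any grammar exists. The following is a minimal sketch, assuming PLY 3.11 is installed. It repeats the quickstart's token rules so that it runs standalone, and the helper name `tokenize` is hypothetical (it is not part of PLY). Unlike the quickstart, its `t_error` raises an exception rather than skipping the character, which is only one possible design choice.

```python
import ply.lex as lex

# Same token set as the quickstart calculator.
tokens = ('NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'LPAREN', 'RPAREN')

t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'


def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)  # convert the matched text to an int
    return t


def t_error(t):
    # Assumption for this sketch: fail loudly instead of skipping input.
    raise ValueError(f"Illegal character {t.value[0]!r} at position {t.lexpos}")


lexer = lex.lex()


def tokenize(text):
    """Hypothetical helper: return (type, value) pairs for `text`."""
    lexer.input(text)                 # hand the input string to the lexer
    return [(tok.type, tok.value) for tok in lexer]  # a lexer is iterable
```

Calling `tokenize("3 + 4")` yields `(type, value)` pairs, with `NUMBER` values already converted to `int` by the `t_NUMBER` action. The same lexer object can be passed to a parser with `parser.parse(text, lexer=lexer)`.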
