pygmars
Version 1.0.0 (verified Fri May 01) · auth: no · Python
A library for building small, regex-based lexers and parsers. Parsers are built from grammars, and the output of Pygments lexers can be used as input. Derived from NLTK. Current version 1.0.0, released May 2024. Maintained by aboutcode-org; releases are irregular.
pip install pygmars
Common errors
error RecursionError: maximum recursion depth exceeded
cause Left-recursive grammar productions cause infinite recursion in the parser.
fix
Rewrite the grammar to eliminate left recursion. For example, replace E -> E + T | T with the equivalent non-left-recursive grammar E -> T E', E' -> + T E' | ε, where ε is an explicit empty production.
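The rewrite above is the standard transformation for immediate left recursion: A -> A α | β becomes A -> β A', A' -> α A' | ε. The sketch below applies it mechanically in plain Python; it does not use pygmars. The helper name `remove_immediate_left_recursion`, the dict-of-tuples grammar representation, and the use of `()` for ε are illustrative assumptions, and it assumes the primed names (such as `E'`) are not already used in the grammar.

```python
def remove_immediate_left_recursion(grammar):
    """Return a new grammar with immediate left recursion removed.

    `grammar` maps each nonterminal to a list of productions. Each production
    is a tuple of symbols; the empty tuple () stands for epsilon.
    """
    result = {}
    for head, productions in grammar.items():
        # A -> A alpha: keep only the alpha part.
        recursive = [p[1:] for p in productions if p and p[0] == head]
        # A -> beta: productions that do not start with A itself.
        others = [p for p in productions if not p or p[0] != head]
        if not recursive:
            result[head] = list(productions)
            continue
        tail = head + "'"  # fresh nonterminal, assumed unused (e.g. E')
        # A -> beta A'
        result[head] = [beta + (tail,) for beta in others]
        # A' -> alpha A' | epsilon
        result[tail] = [alpha + (tail,) for alpha in recursive] + [()]
    return result


grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("NUM",)],
}
new_grammar = remove_immediate_left_recursion(grammar)

for head, productions in new_grammar.items():
    print(head, "->", " | ".join(" ".join(p) or "ε" for p in productions))
# E -> T E'
# E' -> + T E' | ε
# T -> F T'
# T' -> * F T' | ε
# F -> ( E ) | NUM
```

This handles only immediate left recursion (a production that starts with its own head). Indirect cycles such as A -> B x, B -> A y need the nonterminals substituted into each other first.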
error TypeError: 'str' object is not callable
cause Calling add_ignore() or add_token() after the method name has been rebound to a string, for example by writing lexer.add_ignore = ' ' and then calling lexer.add_ignore(...).
fix
Call the method instead of assigning to it: lexer.add_ignore(' '), not lexer.add_ignore = ' '. Pass the pattern as a string argument.
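The error comes from ordinary Python attribute lookup: assigning to `lexer.add_ignore` creates an instance attribute that shadows the class method. The sketch below reproduces this with a hypothetical `ToyLexer` class (a stand-in, not the pygmars `Lexer`) and shows that deleting the shadowing attribute restores the method.

```python
class ToyLexer:
    """Hypothetical stand-in for a lexer; not the pygmars class."""

    def __init__(self):
        self.ignored = []

    def add_ignore(self, pattern):
        # Record a pattern whose matches should be skipped.
        self.ignored.append(pattern)


lexer = ToyLexer()
lexer.add_ignore(" ")  # correct: calls the method

lexer.add_ignore = " "  # mistake: this instance attribute now shadows the method
try:
    lexer.add_ignore(" ")  # looks up the string " " and tries to call it
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'str' object is not callable

del lexer.add_ignore  # remove the shadowing attribute; the method is visible again
lexer.add_ignore("\t")
print(lexer.ignored)  # [' ', '\t']
```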
error ValueError: Token pattern '...' is not a valid regex
cause Invalid regex pattern passed to add_token or add_ignore.
fix
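Check the regex syntax, and write patterns as raw strings, e.g. r'\d+' rather than '\d+', so that Python does not interpret backslash escapes before the regex engine sees them. For example, '\b' is a backspace character, while r'\b' is a word boundary.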
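Before handing patterns to a lexer, it can help to compile them yourself with the standard `re` module, which reports the exact syntax problem. The helper name `check_pattern` is hypothetical; the second half of the sketch shows why raw strings matter for escapes such as `\b`.

```python
import re


def check_pattern(pattern):
    """Return None if `pattern` compiles, otherwise the re.error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


print(check_pattern(r"\d+"))   # None: valid pattern
print(check_pattern(r"(\d+"))  # error message: the group is never closed

text = "a cat here"
# In a normal string, "\b" is a backspace (\x08), so this looks for literal backspaces.
cooked = re.search("\bcat\b", text)
# In a raw string, \b reaches the regex engine and means "word boundary".
raw = re.search(r"\bcat\b", text)
print(cooked)       # None
print(raw.group())  # cat
```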
Warnings
breaking In version 1.0.0, ParseString.__str__ no longer returns formatted output, which breaks code that relied on it (PR #14).
fix If you used str(ParseString(...)) to get formatted output, call .format() or read the attributes directly instead.
deprecated The library is derived from NLTK but has since diverged from it. Do not mix imports from the two; use pygmars exclusively for lexing and parsing.
fix Ensure you import only from pygmars, not from nltk.
gotcha Lexer token patterns are applied in the order they were added, so when patterns overlap, an earlier general pattern can match before a later, more specific one and produce unexpected tokens. Add more specific tokens first.
fix Order token additions from most specific to least specific to avoid regex precedence issues.
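The effect of ordering can be seen with a tiny first-match-wins tokenizer. This is a self-contained sketch in plain Python, not pygmars code: the function `first_match_lex` and the `INT`/`FLOAT`/`DOT` rules are hypothetical, chosen so that one pattern is a prefix of another.

```python
import re


def first_match_lex(text, rules):
    """Tokenize `text` by trying (label, pattern) rules in order; first match wins."""
    compiled = [(label, re.compile(pattern)) for label, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # skip whitespace between tokens
            pos += 1
            continue
        for label, regex in compiled:
            match = regex.match(text, pos)
            if match and match.end() > pos:  # ignore empty matches
                tokens.append((label, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
    return tokens


general_first = [("INT", r"\d+"), ("FLOAT", r"\d+\.\d+"), ("DOT", r"\.")]
specific_first = [("FLOAT", r"\d+\.\d+"), ("INT", r"\d+"), ("DOT", r"\.")]

print(first_match_lex("3.14", general_first))
# [('INT', '3'), ('DOT', '.'), ('INT', '14')]
print(first_match_lex("3.14", specific_first))
# [('FLOAT', '3.14')]
```

With the general `INT` rule first, `FLOAT` can never match, because `INT` always claims the leading digits.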
gotcha The library does not support left-recursive grammars directly. Defining a left-recursive production such as E -> E + T causes a RecursionError.
fix Restructure grammars to be right-recursive or use a different parsing strategy.
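Why left recursion fails in a top-down parser, and why the right-recursive form works, can be shown with a minimal hand-written recursive-descent parser. This is an illustrative sketch in plain Python, not how pygmars is implemented; the function names and the one-rule `T -> NUM` are assumptions made to keep it short.

```python
def parse_T(tokens, pos):
    """T -> NUM (kept minimal for this sketch)."""
    token = tokens[pos]
    if not token.isdigit():
        raise ValueError(f"expected a number at position {pos}, got {token!r}")
    return int(token), pos + 1


def parse_E_left(tokens, pos):
    """E -> E '+' T, implemented literally: it recurses before consuming any input."""
    left, pos = parse_E_left(tokens, pos)  # same position every time: never terminates
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_T(tokens, pos + 1)
        return ("+", left, right), pos
    return left, pos


def parse_E_right(tokens, pos):
    """E -> T '+' E | T: consumes a T before recursing, so it always terminates."""
    left, pos = parse_T(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_E_right(tokens, pos + 1)
        return ("+", left, right), pos
    return left, pos


tree, end = parse_E_right(["1", "+", "2", "+", "3"], 0)
print(tree)  # ('+', 1, ('+', 2, 3))

try:
    parse_E_left(["1", "+", "2"], 0)
    left_failed = False
except RecursionError:
    left_failed = True
print(left_failed)  # True
```

The right-recursive form groups 1 + 2 + 3 as 1 + (2 + 3). That is harmless for + and *, but for - or / the resulting tree has the wrong associativity and must be re-associated after parsing.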
Imports
- Lexer: from pygmars import Lexer
- Grammar: from pygmars import Grammar
- Parser: from pygmars import Parser
- ParseString: from pygmars import ParseString
- Tree: from pygmars import Tree
Quickstart
from pygmars import Lexer, Grammar, Parser
# Lexer: one token type per terminal symbol; spaces are skipped
lexer = Lexer()
lexer.add_token('NUM', r'\d+')
lexer.add_token('PLUS', r'\+')
lexer.add_token('MINUS', r'-')
lexer.add_token('TIMES', r'\*')
lexer.add_token('DIVIDE', r'/')
lexer.add_token('LPAREN', r'\(')
lexer.add_token('RPAREN', r'\)')
lexer.add_ignore(' ')
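# Grammar for arithmetic expressions. The productions are right-recursive
# (E -> T PLUS E | T) because left-recursive ones such as E -> E PLUS T
# raise RecursionError in this parser.
grammar = Grammar()
grammar.add_production('E', ['T', 'PLUS', 'E'])
grammar.add_production('E', ['T'])
grammar.add_production('T', ['F', 'TIMES', 'T'])
grammar.add_production('T', ['F'])
grammar.add_production('F', ['LPAREN', 'E', 'RPAREN'])
grammar.add_production('F', ['NUM'])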
parser = Parser(grammar)
tokens = lexer.tokenize('2+3*4')
parse_tree = parser.parse(tokens)
print(parse_tree.pformat())