Lark Parser

1.3.1 verified Tue May 12 auth: no python install: verified quickstart: verified

Lark is a modern, general-purpose parsing library for Python. It allows users to parse any context-free grammar efficiently with minimal code, supporting algorithms like Earley, LALR(1), and CYK. Lark automatically builds a parse-tree (AST) based on grammar structure and features a fast Unicode lexer. The library is actively maintained with frequent releases, currently at version 1.3.1.

pip install lark
error ModuleNotFoundError: No module named 'lark'
cause The 'lark' package has not been installed in the Python environment being used.
fix
Install the lark library using pip: pip install lark
error lark.exceptions.UnexpectedToken: Unexpected token
cause The input text provided to the parser does not conform to the defined grammar at the point where the unexpected token was encountered.
fix
Review your grammar definition and the input string to ensure they match, paying attention to the reported line and column number. Often, it's a mismatch between expected terminals/rules and the actual input. Debugging with Lark(grammar, debug=True) can provide more detailed conflict information.
error AttributeError: type object 'Lark' has no attribute 'pretty'
cause The `.pretty()` method is intended to be called on a `Tree` object (the result of parsing), not directly on the `Lark` parser class itself.
fix
First, parse the input string to get a Tree object, then call .pretty() on that Tree object: tree = parser.parse(text); print(tree.pretty())
error AttributeError: 'Token' object has no attribute 'children'
cause This error typically occurs within a `Transformer` or `Visitor` when a callback method expects a `Tree` object (which has `children`) but instead receives a `Token` object directly, often because a rule was inlined or simplified, making a token the direct child of another node.
fix
Adjust the Transformer or Visitor method to handle Token objects directly, or modify the grammar to ensure the expected structure is always a Tree where children are anticipated. Using v_args(inline=True) or v_args(meta=True) decorators on transformer methods can help manage arguments.
error lark.exceptions.LexError: Lexer does not allow zero-width tokens.
cause A regular expression defined for a terminal in the grammar is able to match an empty string, which is not permitted by Lark's lexer.
fix
Modify the regular expression for the specified terminal to ensure it always matches at least one character, for example, by changing * (zero or more) to + (one or more) if appropriate, or by making sure regex components are not optional in a way that allows an empty match.
breaking Lark dropped official support for Python versions lower than 3.8 starting with version 1.2.1. Users on older Python environments will need to upgrade Python or use an older Lark version.
fix Upgrade Python to 3.8 or higher. If unable to upgrade, pin `lark<1.2.1`.
breaking The PyPI package name changed from `lark-parser` to `lark` around version 1.0.0. The `lark-parser` package is significantly outdated (0.12.0) and incompatible with recent GitHub releases.
fix Use `pip install lark` for the current version. If you were using `lark-parser`, uninstall it and install `lark` instead; both packages provide the same `lark` import name, so having both installed can cause conflicts.
gotcha Changes to the Earley parser (e.g., in 1.2.2 and 1.3.0) related to ambiguity resolution might subtly change the parse tree output for certain ambiguous grammars. While often bug fixes, they can alter behavior if you relied on previous implicit resolution.
fix Thoroughly test grammars after upgrading. If specific ambiguity resolution is critical, explicitly handle it in your grammar or code, or review the `ambiguity` parameter options for `Lark`.
gotcha Lark's cache hashing mechanism changed from MD5 to SHA256 in version 1.1.6. If you rely on cached parsers, you might need to clear your cache or recompile parsers after upgrading to avoid unexpected behavior.
fix Clear any stored Lark parser cache files or re-instantiate `Lark` objects to force recompilation.
gotcha The `Lark.save()` method now explicitly raises an error if the parser type is not LALR (`parser!='lalr'`). Attempting to save Earley or CYK parsers will fail.
fix Only use `Lark.save()` for LALR parsers. For other parser types, consider regenerating the parser from the grammar each time or exploring custom serialization if absolutely necessary.
gotcha Regex quantifiers in terminals are greedy by default, as in Python's `re` module. A terminal such as `/".*"/` can therefore swallow more input than intended, which leads to unexpected parse results for free-form text or ambiguous patterns where a shorter match is desired.
fix Design regex terminals carefully. Use non-greedy quantifiers (`*?`, `+?`) or explicit negative lookaheads/lookbehinds where a shorter match is wanted. Passing `strict=True` to `Lark` can help surface terminal collisions while debugging. For highly ambiguous or free-form text, consider whether Lark is the right tool, or restructure the grammar significantly.
python os / libc status wheel install import disk
3.10 alpine (musl) - - 0.12s 18.6M
3.10 slim (glibc) - - 0.10s 19M
3.11 alpine (musl) - - 0.21s 20.7M
3.11 slim (glibc) - - 0.12s 21M
3.12 alpine (musl) - - 0.13s 12.5M
3.12 slim (glibc) - - 0.18s 13M
3.13 alpine (musl) - - 0.12s 12.1M
3.13 slim (glibc) - - 0.12s 13M
3.9 alpine (musl) - - 0.08s 18.1M
3.9 slim (glibc) - - 0.12s 19M

This quickstart demonstrates how to define a simple arithmetic grammar, parse an expression, and then use a Transformer to evaluate the resulting parse tree. It showcases the core `Lark` parser and `Transformer` classes.

from lark import Lark, Transformer, v_args

# Define the grammar as a string. The operators are named terminals
# (ADD_OP, MUL_OP) so they are kept in the tree; anonymous string
# literals such as "(" and ")" are filtered out by default.
grammar = """
    ?start: expression

    ?expression: term (ADD_OP term)*
    ?term: factor (MUL_OP factor)*
    ?factor: INT -> number
           | "(" expression ")"

    ADD_OP: "+" | "-"
    MUL_OP: "*"

    %import common.INT
    %import common.WS
    %ignore WS
"""

# Create the parser (the start rule defaults to 'start')
arith_parser = Lark(grammar)

# Define a transformer to evaluate the expression.
# inline=True passes a node's children as positional arguments.
@v_args(inline=True)
class CalculateString(Transformer):
    number = int  # convert each INT token to a Python int

    def expression(self, first, *rest):
        # rest alternates: operator, operand, operator, operand, ...
        for op, num in zip(rest[::2], rest[1::2]):
            if op == '+':
                first = first + num
            elif op == '-':
                first = first - num
        return first

    def term(self, first, *rest):
        for op, num in zip(rest[::2], rest[1::2]):
            if op == '*':
                first = first * num
        return first

# Example usage
text_to_parse = "(1 + 2) * 3"
tree = arith_parser.parse(text_to_parse)
result = CalculateString().transform(tree)

print(f"Parsed expression: {text_to_parse}")
print(f"Result: {result}")  # Result: 9