Textparser

0.24.0 · active · verified Sat Apr 11

Textparser is a Python library for fast text parsing. Users define token specifications with regular expressions and build a grammar that parses text into a structured parse tree. The project prioritizes parsing speed, which its benchmarks highlight. The current version is 0.24.0, released on April 16, 2022; releases are infrequent.

Warnings

Install

Imports

Quickstart

This 'Hello World' example defines a custom parser by subclassing `textparser.Parser`. `token_specs` lists the token types as regular expressions, and `grammar` uses `textparser.Sequence` to describe the expected order of tokens. Parsing the string 'Hello, World!' then yields a parse tree.

import textparser
from textparser import Sequence

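class MyParser(textparser.Parser):
    def token_specs(self):
        # Each entry is (kind, regex) or (kind, name, regex); the optional
        # name is what the grammar uses to refer to the token.
        return [
            ('SKIP', r'[ \r\n\t]+'),   # whitespace, dropped by the tokenizer
            ('WORD', r'\w+'),
            ('EMARK', '!', r'!'),
            ('COMMA', ',', r','),
            ('MISMATCH', r'.')         # any other character is an error
        ]

    def grammar(self):
        # A word, a comma, a word, and an exclamation mark, in that order.
        return Sequence('WORD', ',', 'WORD', '!')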

tree = MyParser().parse('Hello, World!')
print('Tree:', tree)
# Expected output: Tree: ['Hello', ',', 'World', '!']
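
To make the role of `token_specs` more concrete, here is a minimal, stdlib-only sketch of how a list of named regular expressions can be turned into a token stream. It is not textparser's actual implementation. The names `TOKEN_SPECS` and `tokenize` are hypothetical and exist only for this illustration, and the handling of `SKIP`, `MISMATCH`, and three-element entries reflects an assumption about how such specs are usually interpreted.

```python
import re

# Hypothetical copy of the quickstart's token specs. Three-element entries
# are assumed to mean (group name, kind seen by the grammar, regex).
TOKEN_SPECS = [
    ('SKIP', r'[ \r\n\t]+'),
    ('WORD', r'\w+'),
    ('EMARK', '!', r'!'),
    ('COMMA', ',', r','),
    ('MISMATCH', r'.'),
]


def tokenize(text):
    """Return a list of (kind, value) pairs, dropping whitespace."""
    names = {}   # group name -> kind exposed to the grammar
    parts = []   # one named regex group per spec
    for spec in TOKEN_SPECS:
        group, pattern = spec[0], spec[-1]
        if len(spec) == 3:
            names[group] = spec[1]
        parts.append('(?P<{}>{})'.format(group, pattern))

    # One combined pattern; the first alternative that matches wins.
    master = re.compile('|'.join(parts), re.DOTALL)

    tokens = []
    for match in master.finditer(text):
        group = match.lastgroup
        if group == 'SKIP':
            continue
        if group == 'MISMATCH':
            raise ValueError(
                'unexpected character at offset {}'.format(match.start()))
        tokens.append((names.get(group, group), match.group()))
    return tokens


print(tokenize('Hello, World!'))
# [('WORD', 'Hello'), (',', ','), ('WORD', 'World'), ('!', '!')]
```

The kinds in the resulting pairs (`'WORD'`, `','`, `'!'`) are the same strings that the quickstart's `Sequence('WORD', ',', 'WORD', '!')` refers to, which is why the grammar can name punctuation tokens by their literal character.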
