Textparser
Textparser is a Python library designed for fast text parsing. It allows users to define token specifications using regular expressions and construct grammars to parse text into a structured parse tree. The project prioritizes parsing speed, as highlighted in its benchmarks. The current version is 0.24.0, released on April 16, 2022, with an infrequent release cadence.
Warnings
- gotcha When a token specification uses the three-element form `(kind, name, re)`, the grammar must refer to that token by its `name`, not its `kind`. Referring to the `kind` when a `name` is provided leads to parsing errors.
- gotcha The structure of parse trees returned by `textparser` can vary, and additional post-processing may be required to fit specific application needs. Its primary goal is speed, not necessarily a universally consistent parse tree format across different grammars.
- gotcha The library's last release was April 2022, indicating a slower development pace. While generally stable, users should be aware of potential future compatibility challenges with newer Python versions or lack of updates for new parsing paradigms.
Install
-
pip install textparser
Imports
- Parser
from textparser import Parser
- Sequence
from textparser import Sequence
Quickstart
import textparser
from textparser import Sequence
class MyParser(textparser.Parser):
    def token_specs(self):
        return [
            ('SKIP', r'[ \r\n\t]+'),
            ('WORD', r'\w+'),
            ('EMARK', '!', r'!'),
            ('COMMA', ',', r','),
            ('MISMATCH', r'.')
        ]
    def grammar(self):
        return Sequence('WORD', ',', 'WORD', '!')
tree = MyParser().parse('Hello, World!')
print('Tree:', tree)
# Expected output: Tree: ['Hello', ',', 'World', '!']
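As the warnings note, the returned tree may need post-processing before an application can use it. The following is a minimal sketch for the flat list shown in the Quickstart output. The helper `to_greeting` and the dictionary keys are hypothetical and are not part of textparser.

```python
def to_greeting(tree):
    """Convert the flat Quickstart tree into a small dictionary.

    Hypothetical helper: assumes the tree has exactly four items in the
    order word, comma, word, exclamation mark.
    """
    greeting, _comma, subject, _emark = tree
    return {'greeting': greeting, 'subject': subject}


# The tree literal below is copied from the Quickstart's expected output.
tree = ['Hello', ',', 'World', '!']
record = to_greeting(tree)
print(record)
```

Keeping this step in a separate function leaves the grammar focused on matching, and only the helper needs to change if the grammar later produces a differently shaped tree.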