Tree-sitter Regex

0.25.0 · active · verified Sun Apr 12

tree-sitter-regex provides Python bindings for the Tree-sitter regex grammar, which parses regular expressions into concrete syntax trees. This lets developers analyze, transform, and inspect regex patterns programmatically. The current version is 0.25.0; the project is actively maintained, though releases are irregular.

Warnings

Install

Imports

Quickstart

This quickstart shows how to initialize a Tree-sitter parser with the `tree-sitter-regex` grammar, parse a regular expression, and inspect the resulting syntax tree. The steps are: obtain the `Language` object, attach it to the parser, parse the pattern as bytes, and read each node's type, text, and position.

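from tree_sitter import Language, Parser
import tree_sitter_regex

# Wrap the grammar's pre-compiled language pointer in a Language object
REGEX_LANGUAGE = Language(tree_sitter_regex.language())

# Create a parser bound to the regex grammar. Current py-tree-sitter releases
# take the language in the constructor; Parser.set_language() is no longer available.
parser = Parser(REGEX_LANGUAGE)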

# Define a regex string to parse (must be bytes)
regex_string = r"^([a-zA-Z0-9_\-]+)\s*=\s*(.+)$"
encoded_regex = bytes(regex_string, "utf8")

# Parse the regex string
tree = parser.parse(encoded_regex)

# Print the S-expression representation of the syntax tree
print("--- S-expression Tree ---")
# (str(node) yields the S-expression; Node.sexp() is gone in current py-tree-sitter)
print(str(tree.root_node))

# Traverse and print some nodes
print("\n--- Node Details ---")
root = tree.root_node
for child in root.children:
    print(f"Type: {child.type}, Text: {child.text.decode('utf8')}, Start: {child.start_point}, End: {child.end_point}")

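# Example: find every capturing group by walking the whole tree.
# Node type names come from the grammar; `anonymous_capturing_group` is assumed
# here, so check the S-expression output above if nothing is printed.
print("\n--- Capturing Groups Found ---")
stack = [root]
while stack:
    node = stack.pop()
    if node.type == 'anonymous_capturing_group':
        print(f"Found group: {node.text.decode('utf8')}")
    # Push children in reverse so nodes are visited in source order
    stack.extend(reversed(node.children))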
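Inspecting more than the root's direct children requires walking the whole tree. The following is a minimal sketch of two helpers for that, not part of `tree-sitter-regex` itself: `iter_nodes` and `count_named_types` are hypothetical names introduced here for illustration. They rely only on the `type`, `is_named`, and `children` attributes that py-tree-sitter nodes expose.

```python
from collections import Counter


def iter_nodes(node):
    """Yield `node` and all of its descendants in depth-first pre-order.

    `node` is assumed to be a py-tree-sitter Node (or any object that
    exposes a `children` list); this helper is illustrative only.
    """
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children reversed so the leftmost child is visited first
        stack.extend(reversed(current.children))


def count_named_types(root):
    """Count how often each named node type occurs at or below `root`."""
    return Counter(n.type for n in iter_nodes(root) if n.is_named)
```

With the `tree` from the quickstart, `count_named_types(tree.root_node)` gives a quick summary of which grammar constructs a pattern uses, and `iter_nodes(tree.root_node)` can be filtered on any node type that appears in the S-expression output.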
