{"id":4814,"library":"tree-sitter-regex","title":"Tree-sitter Regex","description":"tree-sitter-regex provides Python bindings for the Tree-sitter regex grammar, enabling high-performance parsing of regular expressions into concrete syntax trees. It allows developers to analyze, transform, and understand regex patterns programmatically. The current version is 0.25.0; releases are active but irregular.","status":"active","version":"0.25.0","language":"en","source_language":"en","source_url":"https://github.com/tree-sitter/tree-sitter-regex","tags":["parsing","regex","tree-sitter","grammar","syntax-tree","ast"],"install":[{"cmd":"pip install tree-sitter-regex","lang":"bash","label":"Install tree-sitter-regex"}],"dependencies":[{"reason":"Required to use the grammar for parsing, as tree-sitter-regex only provides the grammar bindings, not the core parser library.","package":"tree-sitter","optional":false}],"imports":[{"note":"The `tree-sitter-regex` package ships a pre-compiled grammar; its `language()` function returns a handle that is wrapped with `tree_sitter.Language(...)`, simplifying setup. 
Manual compilation or dynamic loading of a shared library is not needed.","wrong":"from tree_sitter import Language; language = Language.build_library('build/my-regex-language.so', ['path/to/tree-sitter-regex'])","symbol":"language","correct":"from tree_sitter_regex import language"}],"quickstart":{"code":"from tree_sitter import Language, Parser\nfrom tree_sitter_regex import language\n\n# Wrap the grammar handle returned by language() in a tree_sitter.Language\n# (assumes tree-sitter >= 0.22; earlier releases used a different setup API)\nREGEX_LANGUAGE = Language(language())\n\n# Create a parser bound to the regex grammar\nparser = Parser(REGEX_LANGUAGE)\n\n# Define a regex string to parse (must be bytes)\nregex_string = r\"^([a-zA-Z0-9_\\-]+)\\s*=\\s*(.+)$\"\nencoded_regex = regex_string.encode(\"utf8\")\n\n# Parse the regex string\ntree = parser.parse(encoded_regex)\n\n# Print the S-expression representation of the syntax tree\nprint(\"--- S-expression Tree ---\")\nprint(str(tree.root_node))\n\n# Print details for the root node's direct children\nprint(\"\\n--- Node Details ---\")\nroot = tree.root_node\nfor child in root.children:\n    print(f\"Type: {child.type}, Text: {child.text.decode('utf8')}, Start: {child.start_point}, End: {child.end_point}\")\n\n# Example: walk the whole tree and list every named node\ndef walk(node):\n    yield node\n    for child in node.children:\n        yield from walk(child)\n\nprint(\"\\n--- Named Nodes Found ---\")\nfor node in walk(root):\n    if node.is_named:\n        print(f\"{node.type}: {node.text.decode('utf8')}\")","lang":"python","description":"This quickstart demonstrates how to initialize the Tree-sitter parser with the `tree-sitter-regex` grammar, parse a regular expression string, and inspect the resulting syntax tree. 
It shows how to build the `Language` object, pass it to the parser, and retrieve node information."},"warnings":[{"fix":"Import `language` from `tree_sitter_regex` and wrap its result to get the grammar object: `regex_language = Language(language())`.","message":"The `tree-sitter-regex` package ships a pre-compiled grammar, and its `language()` function returns a handle that `tree_sitter.Language(...)` wraps. Do not attempt to compile the grammar manually or load a shared library dynamically (for example with `tree_sitter.Language.build_library`, which `tree-sitter` 0.22 and later no longer provide), as this can lead to compilation or loading errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Encode the input string to bytes, typically as UTF-8, before passing it to `parse()`: `parser.parse(your_string.encode('utf8'))` or `parser.parse(bytes(your_string, 'utf8'))`.","message":"The `tree_sitter.Parser.parse()` method expects a `bytes` object as input, not a Python `str`. Passing a `str` directly raises a `TypeError`.","severity":"gotcha","affected_versions":"All `tree-sitter` and `tree-sitter-regex` versions"},{"fix":"After retrieving `node.text`, decode it with the encoding used for parsing, usually UTF-8: `node.text.decode('utf8')`.","message":"Node text (e.g., `node.text`) is returned as a `bytes` object, not a `str`. For human-readable output or string manipulation, this `bytes` object must be decoded.","severity":"gotcha","affected_versions":"All `tree-sitter` and `tree-sitter-regex` versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}