Tree-sitter Python Bindings
The `tree-sitter` package provides Python bindings to the core Tree-sitter parsing library, enabling fast, incremental parsing and the construction of concrete syntax trees for many programming languages. It is actively maintained, with minor and patch releases typically every few weeks to months to keep pace with the underlying C library. The current version is `0.25.2`.
Warnings
- breaking The `Language` constructor no longer accepts a raw pointer (`int`); it expects a capsule object, such as the one returned by a grammar package's `language()` function. If you load a `.so` file dynamically, you may need `ctypes.CDLL` to extract the language function, or a different loading mechanism. `Parser.parse()` no longer accepts a `keep_text` argument. The range arguments of `Query.captures()` and `Query.matches()` have been removed; use `set_byte_range()` or `set_point_range()` instead. In releases that run queries through `QueryCursor`, these setters are on the cursor rather than on the query.
- breaking Python 3.9 is no longer supported; the library requires Python 3.10 or newer. The `Language(ptr: int)` constructor is deprecated in favor of language binding packages, and `Node.child_containing_descendant` is deprecated as well.
- gotcha Incompatibilities can arise between the `tree-sitter` Python package and specific `tree-sitter-<language>` grammar packages if their underlying Tree-sitter ABI versions differ. This can lead to runtime errors when loading languages.
- gotcha High memory usage has been reported in certain scenarios, particularly when parsing very large files or using complex grammars, potentially indicating memory leaks in specific contexts or substantial resource requirements of the core library.
- gotcha While many languages have `tree-sitter-<language>` packages, for less common languages or custom grammars, you might need to manually compile the grammar into a shared library (`.so` or `.dll`) and then load it. The `tree-sitter-languages` package, which simplifies loading, is currently unmaintained and may have compatibility issues with newer `tree-sitter` versions.
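The ABI gotcha above can be checked before parsing anything. The following is a minimal sketch, not an official diagnostic: `abi_in_range`, `runtime_abi_range`, and `language_is_supported` are hypothetical helper names, and the sketch assumes that the installed bindings export `MIN_COMPATIBLE_LANGUAGE_VERSION` and `LANGUAGE_VERSION` and that a `Language` object exposes its ABI number as `abi_version`.

```python
def abi_in_range(grammar_abi: int, min_abi: int, max_abi: int) -> bool:
    """Return True when a grammar's ABI number lies in the supported range."""
    return min_abi <= grammar_abi <= max_abi


def runtime_abi_range() -> tuple[int, int]:
    """Return the (oldest, newest) grammar ABI the installed bindings accept."""
    # Imported lazily so abi_in_range() stays usable without the package.
    import tree_sitter

    return (
        tree_sitter.MIN_COMPATIBLE_LANGUAGE_VERSION,
        tree_sitter.LANGUAGE_VERSION,
    )


def language_is_supported(language) -> bool:
    """Check an already constructed Language against the runtime's range.

    Assumption: the Language object exposes its ABI number as `abi_version`.
    """
    min_abi, max_abi = runtime_abi_range()
    return abi_in_range(language.abi_version, min_abi, max_abi)
```

Constructing `Language` may itself fail for a mismatched grammar, in which case `runtime_abi_range()` alone still shows which grammar ABIs the installed `tree-sitter` release accepts.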
Install
- pip install tree-sitter
- pip install tree-sitter-python
Imports
- Language
from tree_sitter import Language
- Parser
from tree_sitter import Parser
- Query
from tree_sitter import Query
- QueryCursor
from tree_sitter import QueryCursor
- language
import tree_sitter_python as tspython; PY_LANGUAGE = Language(tspython.language())
Quickstart
from tree_sitter import Language, Parser, Query, QueryCursor
# NOTE: For this to work, you need 'tree-sitter-python' installed
# pip install tree-sitter-python
import tree_sitter_python as tspython
# Load the Python language grammar
PY_LANGUAGE = Language(tspython.language())
# Create a parser for that language (current releases take the language
# in the constructor; Parser.set_language() is no longer available)
parser = Parser(PY_LANGUAGE)
# Source code to parse
code = b"""
def greet(name):  # type: (str) -> None
    print(f"Hello, {name}!")
greet("World")
"""
# Parse the code
tree = parser.parse(code)
# Get the root node and print its type
root_node = tree.root_node
print(f"Root node type: {root_node.type}")
# Query the tree (example: find function definitions)
query_string = """
(function_definition
  name: (identifier) @function.name)
"""
query = Query(PY_LANGUAGE, query_string)
# QueryCursor.captures() returns a dict that maps each capture name
# to the list of nodes captured under that name
captures = QueryCursor(query).captures(root_node)
for name, nodes in captures.items():
    for node in nodes:
        print(f"Found function: {node.text.decode('utf8')} (capture: {name})")
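Function definitions can also be found without a query by walking the tree directly, which avoids depending on the query API of a particular release. The following is a minimal sketch: `iter_nodes` and `function_names` are hypothetical helper names, and the code relies only on the `type`, `children`, `text`, and `child_by_field_name()` members of a node. It assumes the tree was parsed from `bytes`, so that `text` is not `None`.

```python
from collections.abc import Iterator


def iter_nodes(root) -> Iterator:
    """Yield `root` and every descendant in depth-first document order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))


def function_names(root) -> list[str]:
    """Collect the names of all `function_definition` nodes under `root`."""
    names = []
    for node in iter_nodes(root):
        if node.type == "function_definition":
            name_node = node.child_by_field_name("name")
            if name_node is not None:
                names.append(name_node.text.decode("utf8"))
    return names
```

With the quickstart's `root_node`, calling `function_names(root_node)` would collect the names of the functions defined in the parsed source. The explicit stack keeps the walk iterative, so deeply nested code does not hit Python's recursion limit.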