C++ Grammar for Tree-sitter
tree-sitter-cpp provides the C++ grammar for the Tree-sitter parsing library. Its Python package lets applications parse C++ code, build abstract syntax trees (ASTs), and perform code analysis tasks such as syntax highlighting, linting, and refactoring. The grammar is actively maintained, with frequent minor releases in the v0.23.x series, and it must be used together with the core `tree-sitter` Python package, which supplies the parser itself.
Warnings
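- breaking The `Language.build_library` static method was deprecated in `py-tree-sitter` 0.21 and removed in 0.22, so grammars can no longer be compiled from source at runtime. Load the prebuilt grammar with `Language(tree_sitter_cpp.language())` instead.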
- gotcha Installing `tree-sitter-cpp` alone is not sufficient to parse C++ code in Python. The core `tree-sitter` Python package, which provides the `Language` and `Parser` classes, must also be installed and imported.
- gotcha The `tree-sitter` parsing functions expect source code as a `bytes` object (UTF-8 or UTF-16 encoded), not a standard Python `str`.
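The `bytes` requirement also affects how positions are read back: Tree-sitter reports node positions as byte offsets, which differ from `str` indices once the source contains non-ASCII characters. The following is a minimal, standard-library-only sketch of the difference. The byte range is computed by hand here as a stand-in for a node's `start_byte` and `end_byte`, and the sample source line is made up for illustration.

```python
# Encode once, then slice the bytes (not the str) with byte offsets.
source_text = 'auto s = "héllo";  // non-ASCII literal\n'
source_bytes = source_text.encode("utf8")  # the form a parser expects

# The two lengths diverge as soon as a multi-byte character appears.
char_count = len(source_text)
byte_count = len(source_bytes)

# Hypothetical byte range, standing in for node.start_byte / node.end_byte.
start_byte = source_bytes.index(b'"')
end_byte = source_bytes.rindex(b'"') + 1

# Slice the encoded bytes, then decode only the piece that is needed.
literal = source_bytes[start_byte:end_byte].decode("utf8")
print(char_count, byte_count, literal)
```

Keeping the encoded `bytes` object around for the lifetime of the tree avoids re-encoding and keeps every offset valid.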
Install
pip install tree-sitter-cpp tree-sitter
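Because both distributions are required, it can help to confirm that they are present before debugging import errors. This is a small sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("tree-sitter", "tree-sitter-cpp")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```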
Imports
- Language, Parser
from tree_sitter import Language, Parser
- language()
import tree_sitter_cpp as ts_cpp
cpp_language = Language(ts_cpp.language())
Quickstart
import tree_sitter_cpp as ts_cpp
from tree_sitter import Language, Parser
# Load the C++ grammar
CPP_LANGUAGE = Language(ts_cpp.language())
# Create a parser for the C++ grammar
# (Parser.set_language was removed in py-tree-sitter 0.22; pass the language to the constructor)
parser = Parser(CPP_LANGUAGE)
# C++ code to parse
cpp_code = b"""#include <iostream>
int main() {
    std::cout << "Hello, Tree-sitter!" << std::endl;
    return 0;
}
"""
# Parse the code
tree = parser.parse(cpp_code)
# Get the root node of the syntax tree
root_node = tree.root_node
# Print the root node type and its first child (for demonstration)
print(f"Root node type: {root_node.type}")
if root_node.children:
    print(f"First child type: {root_node.children[0].type}")
    print(f"First child text: {root_node.children[0].text.decode('utf8')}")
# Example of traversing the tree (optional, for deeper analysis)
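def traverse(node, depth=0):
    # Decode outside the f-string: reusing the same quote character inside
    # an f-string is a syntax error before Python 3.12.
    text = node.text.decode('utf8') if node.text else ''
    print('  ' * depth + f'- {node.type} [ {node.start_point} - {node.end_point} ] {text}')
    for child in node.children:
        traverse(child, depth + 1)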
# print("\nFull AST:")
# traverse(root_node)
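Beyond printing, the same traversal idea can collect nodes of one type. The sketch below is an illustration rather than part of the library: `find_nodes` and `FakeNode` are hypothetical names, and `FakeNode` is a stand-in exposing only `type` and `children` so that the example runs without the grammar installed. A real `tree_sitter` node exposes those same two attributes, so `find_nodes(root_node, "function_definition")` should work unchanged on the quickstart's tree.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a tree_sitter node (only `type` and `children`)."""
    type: str
    children: list = field(default_factory=list)


def find_nodes(node, wanted_type):
    """Yield every node of `wanted_type` in depth-first, pre-order."""
    stack = [node]
    while stack:
        current = stack.pop()
        if current.type == wanted_type:
            yield current
        # Push children reversed so they are visited left to right.
        stack.extend(reversed(current.children))


# Hand-built tree, shaped roughly like the quickstart's parse result.
root = FakeNode("translation_unit", [
    FakeNode("preproc_include"),
    FakeNode("function_definition", [
        FakeNode("primitive_type"),
        FakeNode("function_declarator"),
        FakeNode("compound_statement", [FakeNode("return_statement")]),
    ]),
])

matches = list(find_nodes(root, "function_definition"))
print(len(matches))
```

An explicit stack is used instead of recursion so that deeply nested C++ code cannot hit Python's recursion limit.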