YAML Grammar for Tree-sitter
tree-sitter-yaml packages a pre-compiled Tree-sitter grammar for YAML so that Python applications can parse YAML files into a concrete syntax tree. The tree supports detailed syntax analysis and source-level manipulation of YAML structures. The current version is 0.7.2. The library is actively maintained, and its releases often track upstream Tree-sitter changes.
Warnings
- breaking: Tree-sitter grammars are compiled against a specific ABI version of the core Tree-sitter library. Updating the `tree-sitter` Python package (or the underlying C library) without a matching `tree-sitter-yaml` update can cause 'Incompatible language version' errors when the grammar is loaded.
- gotcha: Older guides to the `tree-sitter` library compile grammars by hand. `tree-sitter-yaml` (like `tree-sitter-languages`) ships pre-compiled binary wheels, so manual compilation with `Language.build_library()` is unnecessary and can cause errors or confusion. Recent `tree-sitter` releases no longer provide that method at all.
- gotcha: Some users have reported memory leaks or unexpected highlighting issues when reusing Tree-sitter YAML parsers, especially with very large files or particular editing patterns in integrated environments (e.g., Neovim). Symptoms include broken highlighting and growing memory consumption.
- breaking: Downstream tools that rely on Tree-sitter grammars (e.g., `nvim-treesitter` for syntax highlighting) may break when the capture names used in their query files are renamed. Client configurations, queries, and color schemes then need matching updates.
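Because the ABI warning above comes down to which versions of the two packages are installed side by side, it can help to log both before loading the grammar. The following is a minimal sketch that uses only the standard library. `installed_versions` is a hypothetical helper name, not part of either package.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Report the runtime and the grammar together so a mismatch is easy to spot.
    for name, version in installed_versions(["tree-sitter", "tree-sitter-yaml"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Recording this output next to any 'Incompatible language version' error makes it easier to decide which of the two packages needs to be upgraded or pinned.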
Install
pip install tree-sitter-yaml
Imports
- language
import tree_sitter_yaml
from tree_sitter import Language, Parser
YAML_LANGUAGE = Language(tree_sitter_yaml.language())
- get_language, get_parser
from tree_sitter_languages import get_language, get_parser
# Alternative route: load the YAML grammar from the tree_sitter_languages bundle, if that package is installed
language = get_language('yaml')
parser = get_parser('yaml')
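The two import routes above can be combined into a single loader that tries the standalone wheel first and the bundle second. This is only a sketch: the helper names (`get_yaml_parser`, `_from_standalone_wheel`, `_from_language_bundle`) are hypothetical, and the `Parser(language)` constructor form is assumed to match the installed `tree-sitter` release.

```python
def _from_standalone_wheel():
    # Route 1: the tree-sitter-yaml wheel plus the tree-sitter bindings.
    import tree_sitter_yaml
    from tree_sitter import Language, Parser

    # Assumption: the installed bindings accept the language in the constructor.
    return Parser(Language(tree_sitter_yaml.language()))


def _from_language_bundle():
    # Route 2: the tree_sitter_languages bundle, if it is installed.
    from tree_sitter_languages import get_parser

    return get_parser("yaml")


def get_yaml_parser():
    """Return (parser, errors); parser is None when every route failed."""
    errors = []
    for loader in (_from_standalone_wheel, _from_language_bundle):
        try:
            return loader(), errors
        except Exception as exc:  # ImportError, or a version mismatch at load time
            errors.append(f"{loader.__name__}: {exc!r}")
    return None, errors


if __name__ == "__main__":
    parser, errors = get_yaml_parser()
    print("parser available:", parser is not None)
    for message in errors:
        print("  skipped", message)
```

Collecting the errors instead of discarding them keeps the reason for each failed route visible, which matters when the cause is a version mismatch and not a missing package.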
Quickstart
import tree_sitter_yaml
from tree_sitter import Language, Parser

# Load the YAML language grammar
YAML_LANGUAGE = Language(tree_sitter_yaml.language())

# Initialize the parser with the YAML language.
# Recent py-tree-sitter releases take the language in the constructor;
# the older parser.set_language() call is no longer available there.
parser = Parser(YAML_LANGUAGE)

# Example YAML content
yaml_code = b"""
name: John Doe
age: 30
cities:
  - New York
  - London
"""

# Parse the YAML code
tree = parser.parse(yaml_code)

# Get the root node of the syntax tree
root_node = tree.root_node

# Print the tree structure (simplified for quickstart)
def print_node(node, indent=0):
    print('  ' * indent + f"Type: {node.type}, Text: {node.text.decode('utf8')}")
    for child in node.children:
        print_node(child, indent + 1)

print_node(root_node)
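
# Example: find every key-value pair in block mappings.
# The grammar names these nodes 'block_mapping_pair', with 'key' and 'value' fields
# (as a query pattern: (block_mapping_pair key: (_) @key value: (_) @value)).
# Walking the tree with child_by_field_name() avoids the query API,
# whose return types differ between py-tree-sitter releases.
def iter_pairs(node):
    if node.type == 'block_mapping_pair':
        yield node
    for child in node.children:
        yield from iter_pairs(child)

print("\nKey-Value Pairs:")
for pair in iter_pairs(root_node):
    key = pair.child_by_field_name('key')
    value = pair.child_by_field_name('value')
    key_text = key.text.decode('utf8') if key else ''
    value_text = value.text.decode('utf8') if value else ''
    print(f"  Key: {key_text}, Value: {value_text}")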
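
The recursive `print_node` helper in the quickstart can hit Python's recursion limit on very deeply nested documents. The sketch below shows an iterative depth-first walk that relies only on the `.type` and `.children` attributes, so it should work on Tree-sitter nodes as well. `FakeNode`, `walk`, and `count_types` are hypothetical names, and `FakeNode` is a stand-in used here so the example runs without the grammar installed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing the two attributes the walker needs."""
    type: str
    children: list = field(default_factory=list)


def walk(root):
    """Yield (depth, node) pairs in document order without recursion."""
    stack = [(0, root)]
    while stack:
        depth, node = stack.pop()
        yield depth, node
        # Push children reversed so the leftmost child is popped first.
        for child in reversed(node.children):
            stack.append((depth + 1, child))


def count_types(root):
    """Count how often each node type occurs in the tree."""
    counts = {}
    for _, node in walk(root):
        counts[node.type] = counts.get(node.type, 0) + 1
    return counts


# A tiny hand-built tree shaped like a parsed two-key mapping.
sample = FakeNode("stream", [
    FakeNode("document", [
        FakeNode("block_mapping_pair"),
        FakeNode("block_mapping_pair"),
    ]),
])

if __name__ == "__main__":
    for depth, node in walk(sample):
        print("  " * depth + node.type)
    print(count_types(sample))
```

With a real parse, passing `tree.root_node` in place of `sample` should yield the same kind of listing, since the walker never touches anything beyond `.type` and `.children`.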