Tree-sitter HTML Grammar
tree-sitter-html provides the HTML grammar for the Tree-sitter parsing system. It parses HTML source into a concrete syntax tree for use in static analysis, syntax highlighting, and code transformation. The library is actively developed; the current version at the time of writing is 0.23.2.
Warnings
- gotcha The `tree-sitter` core library is a mandatory dependency and must be installed alongside `tree-sitter-html` for the Python bindings to function.
- gotcha Grammar updates can change the shape of the generated syntax tree (node types and nesting), which may break existing queries or code analysis logic. Common trouble spots are the handling of whitespace, attribute values, and void elements.
- gotcha When `tree-sitter-html` is used on documents that mix HTML with other languages (e.g., HTML inside PHP templates, JavaScript in `script` tags, CSS in `style` tags), ensure that the corresponding `tree-sitter` grammars (e.g., `tree-sitter-javascript`, `tree-sitter-css`) are also installed. The HTML grammar alone does not parse the contents of injected languages, so they are needed for complete and accurate highlighting and parsing.
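- deprecated Older examples might show `tree_sitter.Language.build_library()` for compiling and loading grammars. This method is deprecated and is not needed for pre-compiled Python wheels (like `tree-sitter-html`); load the grammar with `tree_sitter.Language(tree_sitter_html.language())` instead.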
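As a rough illustration of the embedded-language warning above, the sketch below collects the raw contents of `script` and `style` elements so they can be handed to a separate JavaScript or CSS parser. The node type names (`script_element`, `style_element`, `raw_text`) are assumed from the tree-sitter-html grammar and may differ between grammar versions. The helper names `embedded_regions` and `parse_html` are hypothetical, not part of either library.

```python
# Assumed node types from the tree-sitter-html grammar, mapped to the
# language that would be needed to parse their contents.
EMBEDDED = {"script_element": "javascript", "style_element": "css"}


def embedded_regions(node, source):
    """Return (language, raw_bytes) pairs for script/style content under `node`."""
    regions = []
    language = EMBEDDED.get(node.type)
    if language is not None:
        for child in node.children:
            if child.type == "raw_text":
                regions.append((language, source[child.start_byte:child.end_byte]))
    # Recurse so that nested elements are also inspected.
    for child in node.children:
        regions.extend(embedded_regions(child, source))
    return regions


def parse_html(source):
    """Hypothetical convenience wrapper; imports lazily so the helper above
    can be used and tested without the bindings installed."""
    import tree_sitter
    import tree_sitter_html

    language = tree_sitter.Language(tree_sitter_html.language())
    return tree_sitter.Parser(language).parse(source)
```

With both packages installed, `embedded_regions(parse_html(src).root_node, src)` would return the byte ranges to pass on to `tree-sitter-javascript` or `tree-sitter-css`. The HTML grammar itself leaves that content unparsed.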
Install
pip install tree-sitter tree-sitter-html
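Because both distributions must be present, a quick check after installation can be useful. The following is a minimal sketch using only the standard library. The function name `installed_versions` is hypothetical, and the distribution names are the ones used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names=("tree-sitter", "tree-sitter-html")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# A value of None means the distribution is not installed.
print(installed_versions())
```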
Imports
- Language, Parser
from tree_sitter import Language, Parser
import tree_sitter_html
Quickstart
import tree_sitter
import tree_sitter_html

# Load the HTML language grammar
HTML_LANGUAGE = tree_sitter.Language(tree_sitter_html.language())

# Initialize the parser with the language
# (recent py-tree-sitter releases take the language in the constructor
# instead of Parser.set_language())
parser = tree_sitter.Parser(HTML_LANGUAGE)

# Sample HTML code (must be bytes for tree-sitter)
html_code = b"<!DOCTYPE html>\n<html><body><h1>Hello, Tree-sitter!</h1></body></html>"

# Parse the code
tree = parser.parse(html_code)

# Get the root node of the syntax tree
root_node = tree.root_node

# Print an S-expression representation of the tree
# (str(node) replaces the older node.sexp())
print(str(root_node))
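Beyond printing the whole tree, most uses involve walking it. The sketch below collects the names of opening tags by recursing over `node.children` and slicing the source bytes with `start_byte`/`end_byte`. The node type names (`start_tag`, `self_closing_tag`, `tag_name`) are assumed from the tree-sitter-html grammar, and `tag_names` and `demo` are hypothetical helper names.

```python
# Assumed node types for tags that open an element.
OPENING_TAGS = {"start_tag", "self_closing_tag"}


def tag_names(node, source):
    """Return the names of all opening tags under `node`, in document order."""
    names = []
    if node.type in OPENING_TAGS:
        for child in node.children:
            if child.type == "tag_name":
                text = source[child.start_byte:child.end_byte]
                names.append(text.decode("utf-8"))
    # Visit children in order so the result follows the document.
    for child in node.children:
        names.extend(tag_names(child, source))
    return names


def demo():
    """Parse the quickstart sample and print its tag names.

    Imports are done lazily so tag_names() stays usable without the bindings.
    """
    import tree_sitter
    import tree_sitter_html

    language = tree_sitter.Language(tree_sitter_html.language())
    parser = tree_sitter.Parser(language)
    source = b"<!DOCTYPE html>\n<html><body><h1>Hello, Tree-sitter!</h1></body></html>"
    tree = parser.parse(source)
    print(tag_names(tree.root_node, source))
```

Calling `demo()` with both packages installed prints the tag names found in the sample. Since node types can change between grammar versions, a small check like this can also serve as a smoke test after upgrading.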