Tree-sitter Markdown Grammar
tree-sitter-markdown provides a Markdown grammar for the Tree-sitter parsing library. It parses Markdown quickly and incrementally into a concrete syntax tree that tools can use for syntax highlighting, refactoring, and static analysis. The library is actively maintained, with frequent minor releases for grammar improvements and compatibility updates.
Warnings
- breaking Version 0.5.2 updated the Rust bindings to require `tree-sitter >=0.26.3`. In that core `tree-sitter` library, the `parse_with` API method is replaced by `parse_with_options`, so users on older `tree-sitter` versions, or those calling the Rust bindings directly, will need to upgrade and adjust their code.
- breaking Version 0.5.0 regenerated the parsers with Tree-sitter ABI 15. Because the Application Binary Interface (ABI) version changed, users with custom-built `tree-sitter` libraries or older pre-compiled grammars may need to recompile them or update their `tree-sitter` installation to stay compatible.
- gotcha `tree-sitter-markdown` ships pre-compiled wheels for common platforms, but building `tree-sitter` or any grammar from source (for example, on a platform without a matching wheel) requires a C/C++ compiler toolchain such as `gcc` or `clang`.
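The Tree-sitter runtime only loads a grammar whose ABI version lies between the runtime's minimum compatible version and its current language version, which is why the ABI 15 change above can break older installs. The sketch below checks that range; it assumes py-tree-sitter exports the constants `LANGUAGE_VERSION` and `MIN_COMPATIBLE_LANGUAGE_VERSION` (as recent releases do), and `abi_is_supported` / `runtime_abi_range` are hypothetical helper names.

```python
from typing import Tuple


def abi_is_supported(grammar_abi: int, runtime_min: int, runtime_max: int) -> bool:
    """True if a grammar built with `grammar_abi` falls inside the runtime's supported range."""
    return runtime_min <= grammar_abi <= runtime_max


def runtime_abi_range() -> Tuple[int, int]:
    """Return (min, max) ABI versions supported by the installed py-tree-sitter.

    Assumption: py-tree-sitter exposes these two module-level constants.
    """
    # Imported lazily so abi_is_supported() works without tree-sitter installed
    from tree_sitter import LANGUAGE_VERSION, MIN_COMPATIBLE_LANGUAGE_VERSION

    return MIN_COMPATIBLE_LANGUAGE_VERSION, LANGUAGE_VERSION
```

For example, `abi_is_supported(15, *runtime_abi_range())` evaluates to `False` on a runtime whose newest supported ABI is 14, which is the mismatch the 0.5.0 note describes.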
Install
pip install tree-sitter tree-sitter-markdown
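To confirm that both distributions are installed, and to compare them against the version notes above, a small sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Distribution names as published on PyPI
DEFAULT_PACKAGES = ("tree-sitter", "tree-sitter-markdown")


def installed_versions(packages: Sequence[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```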
Imports
- language
from tree_sitter_markdown import language
Quickstart
from tree_sitter import Language, Parser
from tree_sitter_markdown import language
# Load the Tree-sitter Markdown (block-level) language.
# `language()` returns a pre-compiled grammar handle that py-tree-sitter
# (0.22 and newer) wraps in a `Language` object.
MARKDOWN_LANGUAGE = Language(language())
# Initialize a parser bound to the Markdown language
parser = Parser(MARKDOWN_LANGUAGE)
# Parse a Markdown string (the parser consumes UTF-8 bytes)
markdown_text = "## Hello Tree-sitter\n\nThis is a *paragraph* with **bold** text."
tree = parser.parse(markdown_text.encode("utf8"))
# Print the S-expression representation of the syntax tree
# (`str(node)` replaces the deprecated `Node.sexp()`)
print("Parsed S-expression:")
print(str(tree.root_node))
# Example: accessing a specific node
root_node = tree.root_node
# The block grammar wraps each heading and the content under it in a `section`
# node, so the first child is expected to be a section rather than the heading
if root_node.child_count > 0:
    first_node = root_node.children[0]
    print(f"\nFirst node type: {first_node.type}, text: {first_node.text.decode('utf8')}")