Tree-sitter XML & DTD Grammars

0.7.0 · active · verified Sun Apr 12

tree-sitter-xml provides pre-compiled Tree-sitter grammars for XML and DTD. It enables fast, robust parsing of XML and DTD documents within Python applications by integrating with the `tree-sitter` library. The current version is 0.7.0, with updates typically coinciding with upstream Tree-sitter grammar improvements or core library changes.

Warnings

Install

Imports
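
A minimal sketch of the imports, assuming py-tree-sitter 0.22 or later is installed alongside this package. The Python bindings are understood to expose one loader per grammar, `language_xml()` and `language_dtd()`, and each result is wrapped in `tree_sitter.Language` before use. `build_parsers` and `pick_grammar` are hypothetical helper names, and selecting a grammar by file extension is only an illustrative convention, not something the package does itself.

```python
def build_parsers() -> dict:
    """Return one parser per grammar shipped in the package."""
    # Local imports so this module can be loaded even where the
    # packages are missing; assumes tree-sitter >= 0.22.
    from tree_sitter import Language, Parser
    import tree_sitter_xml

    return {
        "xml": Parser(Language(tree_sitter_xml.language_xml())),
        "dtd": Parser(Language(tree_sitter_xml.language_dtd())),
    }


def pick_grammar(filename: str) -> str:
    """Choose 'dtd' for .dtd files and 'xml' otherwise (hypothetical convention)."""
    return "dtd" if filename.lower().endswith(".dtd") else "xml"
```

With both packages installed, `build_parsers()[pick_grammar("schema.dtd")]` would give a parser configured for the DTD grammar.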

Quickstart

This quickstart loads the XML grammar with `tree_sitter_xml.language_xml()`, wraps it in a `tree_sitter.Language`, and parses a small XML string with `tree_sitter.Parser`. It then prints the S-expression of the parse tree and collects the `item` elements. It targets py-tree-sitter 0.22 or later; older releases used `parser.set_language()` and `node.sexp()` instead.

import tree_sitter_xml
from tree_sitter import Language, Parser

# Load the XML grammar (language_dtd() loads the DTD grammar instead)
XML_LANGUAGE = Language(tree_sitter_xml.language_xml())

# Initialize the parser with the grammar
parser = Parser(XML_LANGUAGE)

# Sample XML string
xml_code = """
<root>
  <item id="1">Value 1</item>
  <item id="2">Value 2</item>
</root>
"""

# Parse the XML (the parser expects bytes, not str)
tree = parser.parse(xml_code.encode('utf8'))

# Print the S-expression (a common way to inspect the parse tree)
print(f"Parsed XML Tree S-expression:\n{tree.root_node}")

# Walk the whole tree: the 'item' elements are nested inside the root
# element, so checking only the direct children of root_node misses them.
def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)

item_nodes = [node for node in walk(tree.root_node)
              if node.type == 'element' and node.text.decode('utf8').startswith('<item')]

print(f"\nFound {len(item_nodes)} 'item' elements.")
if item_nodes:
    print(f"First item's text: {item_nodes[0].text.decode('utf8')}")
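
Tree-sitter returns a tree even for malformed input, marking the damaged regions rather than raising an exception. The sketch below shows one way to surface those regions, assuming py-tree-sitter 0.22 or later. The helper names `iter_problem_nodes`, `describe`, and `check_xml` are hypothetical and not part of either library; the node attributes used (`type`, `children`, `is_missing`, `has_error`, `start_point`) come from py-tree-sitter's `Node`.

```python
from typing import Iterator


def iter_problem_nodes(node) -> Iterator:
    """Yield ERROR nodes and nodes the parser inserted while recovering."""
    if node.type == "ERROR" or getattr(node, "is_missing", False):
        yield node
    for child in node.children:
        yield from iter_problem_nodes(child)


def describe(node) -> str:
    """Format a node as 'type at row:column' (both 0-based)."""
    row, column = node.start_point
    return f"{node.type} at {row}:{column}"


def check_xml(source: bytes) -> list[str]:
    """Parse `source` with the XML grammar and describe each problem node."""
    # Local imports: assumes tree-sitter >= 0.22 and tree-sitter-xml are installed.
    from tree_sitter import Language, Parser
    import tree_sitter_xml

    parser = Parser(Language(tree_sitter_xml.language_xml()))
    tree = parser.parse(source)
    if not tree.root_node.has_error:
        return []
    return [describe(n) for n in iter_problem_nodes(tree.root_node)]
```

Calling `check_xml(b"<root><item></root>")` would be expected to return a non-empty list, since the `item` element is never closed; the exact node types reported depend on how the grammar recovers.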
