untangle
untangle is a small Python library that parses XML documents into Python objects. Child elements are reached with dot notation, and XML attributes with dictionary-style access. The current version is 1.2.1, released in July 2022; releases are infrequent.
Warnings
- breaking Versions 1.2.0 and earlier are vulnerable to XML External Entity (XXE) injection and to denial of service (DoS) through recursive entity references. A remote, unauthenticated attacker who can supply XML to the parser could read local files or cause a DoS condition on the server. Upgrade to 1.2.1 or later.
- breaking Starting with version 1.2.1, `untangle` dropped official support for Python versions 3.4-3.6 and PyPy.
- gotcha XML element names containing hyphens (`-`), periods (`.`), or colons (`:`) have those characters replaced with underscores (`_`) so that the element can be reached as a Python attribute. For example, the XML tag `<foo-bar/>` is accessed as `obj.foo_bar`. XML attributes are read with dictionary-style access instead, under the name used in the document.
- gotcha The text content of an XML tag (its character data) is not returned as the element's value. Read it from the `.cdata` attribute of the element object (e.g., `element.name.cdata`).
- gotcha If an XML document contains multiple sibling elements with the same tag name, `untangle` automatically groups them into a Python list. If there's only one, it's accessed directly as an object.
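The renaming rule and the single-versus-list behaviour described above can be sketched in plain Python. Both helpers below, `python_name` and `as_list`, are hypothetical names introduced for illustration and are not part of `untangle`; the first only mirrors the three character replacements listed in the gotcha, and the second is one possible way to handle a result that may or may not be a list.

```python
def python_name(xml_name):
    """Mirror the renaming rule described above: '-', '.' and ':' become '_'.

    Hypothetical helper for illustration only; untangle performs its own
    conversion internally while parsing.
    """
    for char in "-.:":
        xml_name = xml_name.replace(char, "_")
    return xml_name


def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in a list.

    Hypothetical helper: normalizes the result of attribute access such as
    `doc.data.user`, which is a single object for one matching child and a
    list for several.
    """
    return value if isinstance(value, list) else [value]


print(python_name("foo-bar"))    # foo_bar
print(python_name("soap:Body"))  # soap_Body
print(as_list("only-child"))     # ['only-child']
print(as_list(["a", "b"]))       # ['a', 'b']
```

With such a helper, a loop written as `for user in as_list(doc.data.user):` behaves the same whether the document holds one `<user>` element or many.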
Install
pip install untangle
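Since versions 1.2.0 and earlier carry the XXE issue noted under Warnings, it may be worth confirming which version actually got installed. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names `version_tuple`, `is_patched`, and `installed_untangle_is_patched` are assumptions made for this example, and the version parsing is deliberately simple.

```python
import re
from importlib import metadata

# First release that is not affected, per the warning about 1.2.0 and earlier.
FIRST_PATCHED = (1, 2, 1)


def version_tuple(version):
    """Turn a version string such as '1.2.1' into the tuple (1, 2, 1).

    Only the leading major.minor[.patch] numbers are read; any suffix
    (for example 'rc1') is ignored. A missing patch number counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_patched(version):
    """Return True if the given version is 1.2.1 or later."""
    return version_tuple(version) >= FIRST_PATCHED


def installed_untangle_is_patched():
    """Check the installed untangle; return None if it is not installed."""
    try:
        return is_patched(metadata.version("untangle"))
    except metadata.PackageNotFoundError:
        return None


print(is_patched("1.2.0"))  # False
print(is_patched("1.2.1"))  # True
```

Pinning the dependency (for example `untangle>=1.2.1` in a requirements file) achieves the same goal at install time.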
Imports
- parse
import untangle
obj = untangle.parse(...)
Quickstart
import untangle
xml_data = """<?xml version="1.0"?>
<data>
<user id="123" status="active">
<name>Alice</name>
<email>alice@example.com</email>
</user>
<user id="456" status="inactive">
<name>Bob</name>
<email>bob@example.com</email>
</user>
</data>"""
# Parse the XML string
doc = untangle.parse(xml_data)
# Accessing the root element (data); the tag name is kept in the internal `_name` attribute
print(f"Root element tag: {doc.data._name}")
# Accessing the first user element and its attributes/cdata
first_user = doc.data.user[0]
print(f"First user ID: {first_user['id']}")
print(f"First user Status: {first_user['status']}")
print(f"First user Name: {first_user.name.cdata}")
print(f"First user Email: {first_user.email.cdata}")
# Iterating through users
for user in doc.data.user:
    print(f"User (ID: {user['id']}): {user.name.cdata} <{user.email.cdata}>")