XML Schema Validator and Decoder
xmlschema is a Python library for validating XML documents against XML Schema Definition (XSD) files and for decoding XML data into Python dictionaries. It supports XPath 1.0/2.0+ for schema processing and data extraction. The library is actively maintained with regular major and minor releases, typically several per year.
Warnings
- breaking: Python 3.8 support was dropped in `xmlschema` v4.0.0, and Python 3.9 support was dropped in `xmlschema` v4.3.0. Users on older Python versions must use `xmlschema` v3.x or upgrade their Python interpreter.
- breaking: Version 4.0.0 introduced significant internal API changes, replacing generators with normal functions for decoding/encoding and standardizing on `DecodeContext` and `EncodeContext` instead of keyword arguments. Code that interacts directly with these lower-level APIs or internal methods will break.
- gotcha: Version 4.2.0 introduced package limits for schema sources (`MAX_SCHEMA_SOURCES`) and XML elements (`MAX_XML_ELEMENTS`), and reduced `MAX_XML_DEPTH`. These limits are designed to prevent resource exhaustion, but very large or deeply nested schemas/documents can raise `XMLSchemaResourceError` exceptions if the limits are not configured.
- gotcha: `xmlschema` can use `lxml` for improved parsing performance, but `lxml` is an optional dependency. If it is not installed, the library falls back to Python's built-in `xml.etree.ElementTree`. Users expecting `lxml`'s speed or specific features (like `iterparse`'s `lxml` argument) must install it explicitly.
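The Python version cut-offs in the first warning can be turned into a small helper that picks a pip requirement for a given interpreter. This is a minimal sketch: the function name `xmlschema_requirement` is hypothetical, the version bounds come only from the warning above, and very old interpreters may need an even older release than the 3.x series.

```python
import sys


def xmlschema_requirement(version_info=sys.version_info):
    """Return a pip requirement string for xmlschema on the given interpreter.

    Hypothetical helper: the bounds mirror the warning above and nothing else.
    """
    major_minor = tuple(version_info[:2])
    if major_minor <= (3, 8):
        # v4.0.0 dropped Python 3.8, so stay on the 3.x series.
        return "xmlschema<4.0.0"
    if major_minor == (3, 9):
        # v4.3.0 dropped Python 3.9, so stay below it.
        return "xmlschema<4.3.0"
    # Newer interpreters can take the latest release.
    return "xmlschema"


print(xmlschema_requirement())
```

The returned string can be passed to `pip install` or written into a requirements file.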
Install
- pip install xmlschema
- pip install "xmlschema[lxml]"
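Because `lxml` is optional, application code that wants to use it when present can apply the same try-then-fall-back pattern. The sketch below only illustrates that pattern in user code; the helper name `pick_etree` is hypothetical and it is not a description of how `xmlschema` selects its parser internally.

```python
import xml.etree.ElementTree as ElementTree


def pick_etree():
    """Return (module, backend_name): lxml.etree if importable, else the stdlib."""
    try:
        from lxml import etree  # optional third-party dependency
        return etree, "lxml"
    except ImportError:
        return ElementTree, "xml.etree.ElementTree"


etree, backend = pick_etree()

# Both backends expose a compatible fromstring() for simple documents.
root = etree.fromstring("<data><item>First</item></data>")
print(backend, root.tag, root[0].text)
```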
Imports
- XMLSchema
from xmlschema import XMLSchema
- XMLResource
from xmlschema import XMLResource
- XMLSchemaValidationError
from xmlschema import XMLSchemaValidationError
Quickstart
import xmlschema
# Define a simple XML Schema (XSD) and an XML document
xsd_content = '''
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="data">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
'''
xml_content = '''
<data>
  <item>First</item>
  <item>Second</item>
</data>
'''
# 1. Load the XML Schema
try:
    schema = xmlschema.XMLSchema(xsd_content)
    print("Schema loaded successfully.")
except xmlschema.XMLSchemaException as e:
    print(f"Schema error: {e}")
    raise SystemExit(1)
# 2. Validate an XML document against the schema
try:
    schema.validate(xml_content)
    print("XML document is valid.")
except xmlschema.XMLSchemaValidationError as e:
    print(f"XML validation error: {e}")
# 3. Decode the XML document into a Python dictionary
try:
    data_dict = schema.to_dict(xml_content)
    print("\nDecoded XML to dictionary:")
    print(data_dict)
except xmlschema.XMLSchemaValidationError as e:
    print(f"Decoding error: {e}")
# Example of invalid XML
invalid_xml_content = '''
<data>
  <extra_item>This should not be here</extra_item>
  <item>Valid Item</item>
</data>
'''
try:
    schema.validate(invalid_xml_content)
    print("Invalid XML document is valid (ERROR!).")
except xmlschema.XMLSchemaValidationError as e:
    print(f"Invalid XML correctly failed validation: {e.reason.splitlines()[0]}...")