XML Diff Utility
xmldiff is a Python library and command-line utility for computing semantic differences between XML files. Unlike traditional line-by-line diff tools, it identifies structural and content changes in the XML tree and can present them in human-readable form. The current version is 2.7.0, and the library is under active development; its documentation explicitly warns that output formats and edit scripts may change between versions as the algorithm improves.
Warnings
- breaking xmldiff 2.0 was a complete, ground-up rewrite of the library, with a new API and different output formats; the early 2.x releases were also significantly slower than 0.x/1.x. Code written for 0.x/1.x is incompatible with 2.x and later.
- gotcha The `xmldiff` library expects `lxml` elements when using functions like `diff_trees()`. Passing standard `xml.etree.ElementTree` objects will result in errors or unexpected behavior, so convert them to `lxml` elements first.
- gotcha The output (edit script or formatted XML) generated by `xmldiff` can change between minor versions. The library explicitly states that there are 'no guarantees' the output will be the same across versions, as it's under 'rapid development'. This means automated tests relying on exact output matches may break.
- gotcha Prior to version 2.6, `xmldiff` had limited or buggy handling of XML namespaces, potentially leading to 'Unknown namespace prefix' errors. While improved in 2.6, changing the URI of an existing namespace prefix is still not supported and will raise an error.
- gotcha The `--ratio-mode` option (`accurate`, `fast`, `faster`; the default is `fast`) and the `--fast-match` flag trade accuracy against performance. The faster settings can produce a less precise edit script, for example describing a change as a delete plus an insert rather than an update, which may be acceptable for speed but makes subtle changes harder to read from the output.
Install
pip install xmldiff
Imports
- main
from xmldiff import main
- formatting
from xmldiff import formatting
- etree
from lxml import etree
Quickstart
import os
from lxml import etree
from xmldiff import main, formatting
# Create dummy XML files
xml1_content = """
<root>
<item id="1">Value A</item>
<item id="2">Value B</item>
</root>
"""
xml2_content = """
<root>
<item id="1">Value A - Changed</item>
<item id="3">Value C</item>
<item id="2" status="new">Value B</item>
</root>
"""
with open('file1.xml', 'w') as f:
    f.write(xml1_content)
with open('file2.xml', 'w') as f:
    f.write(xml2_content)
# Diff two XML files and format the output as XML with diff tags
diff_output_xml = main.diff_files(
'file1.xml',
'file2.xml',
formatter=formatting.XMLFormatter(pretty_print=True)
)
print("--- XML Diff ---")
print(diff_output_xml)
# Clean up dummy files
os.remove('file1.xml')
os.remove('file2.xml')
# Example using diff_trees with lxml elements directly
tree1 = etree.fromstring(xml1_content)
tree2 = etree.fromstring(xml2_content)
diff_actions = main.diff_trees(tree1, tree2)
print("\n--- Edit Script (List of Actions) ---")
for action in diff_actions:
    print(action)