DataDiff (Python Data Structures)
DataDiff is a Python library (version 2.2.0) that produces human-readable diffs of common Python data structures: lists, tuples, sets, and dictionaries. It compares nested structures recursively and renders differences between multi-line strings in unified diff format. The library also provides drop-in replacements for some `nose` assertions, which display the data differences when an assertion fails. Releases are infrequent but continue at a consistent pace.
Common errors
-
ModuleNotFoundError: No module named 'data_diff'
cause You have installed `datadiff` (for Python data structures) but are importing `data_diff` (with an underscore), the module that belongs to the database diffing tool `data-diff`.
fix To compare Python data structures, use `from datadiff import diff`. To compare databases, run `pip install data-diff` and then use `from data_diff import connect_to_table, diff_tables`.
-
Diff output for custom objects is unhelpful (e.g., shows memory addresses or object representations without semantic differences).
cause `datadiff` is optimized for standard Python collections. For custom objects, it may fall back to default object comparison, which often relies on `id()` or `__repr__()` if `__eq__()` is not properly implemented.
fix For meaningful diffs of custom objects, implement `__eq__` and `__repr__` on your classes, or convert the objects into standard Python dictionaries or lists before passing them to `datadiff.diff()`.
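The conversion approach in the last fix can be sketched as follows. This is a minimal illustration under stated assumptions, not part of `datadiff` itself: the `Server` dataclass and the `plain_diff` helper are hypothetical names, and the import is guarded so the conversion step can still be run if the library is not installed.

```python
from dataclasses import asdict, dataclass, field

try:
    from datadiff import diff
except ImportError:  # datadiff not installed; the conversion below still works
    diff = None


@dataclass
class Server:
    """Hypothetical custom object, used only for illustration."""
    host: str
    port: int
    tags: list = field(default_factory=list)


def plain_diff(left, right):
    """Convert both dataclass instances to dicts, then diff the dicts.

    Returns None when datadiff is not available.
    """
    left_dict, right_dict = asdict(left), asdict(right)
    if diff is None:
        return None
    return diff(left_dict, right_dict)


old = Server("db1", 5432, ["primary"])
new = Server("db1", 5433, ["primary", "ssd"])

print(asdict(old))  # {'host': 'db1', 'port': 5432, 'tags': ['primary']}
print(plain_diff(old, new))
```

`dataclasses.asdict` recurses into nested dataclasses, lists, and dicts, so the values handed to `diff()` are plain collections. For classes that are not dataclasses, a hand-written `to_dict()` method would play the same role.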
Warnings
- gotcha This `datadiff` library (no hyphen) is specifically for diffing Python data structures (dicts, lists, sets, strings). There is a similarly named but entirely different library, `data-diff` (with a hyphen), which focuses on diffing database tables. Ensure you are installing and importing the correct library for your use case to avoid confusion.
- gotcha `datadiff` states compatibility with Python 2.6 through Python 3. In modern Python 3-only projects it may show subtle compatibility issues or be less optimized than Python 3-native diffing tools, and it may not take advantage of newer Python 3 language features.
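To see which of the two similarly named packages is present in an environment, the installed distributions can be queried by name. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the `installed_version` helper is a hypothetical name introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "datadiff" diffs Python data structures; "data-diff" diffs database tables.
for name in ("datadiff", "data-diff"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If only `data-diff` is reported, `from datadiff import diff` will fail, and the reverse holds for `data_diff`.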
Install
-
pip install datadiff
Imports
- diff
wrong `from data_diff import diff_tables` (this module belongs to the database tool `data-diff`)
correct `from datadiff import diff`
Quickstart
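from datadiff import diff

# Two dictionaries that differ in one value, one list element, and one extra key
a = dict(foo=1, bar=2, baz=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
b = dict(foo=1, bar=4, baz=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'changed', 11], qux='new')

# Printing the result shows the human-readable diff
result = diff(a, b)
print(result)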
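The overview also mentions special handling for multi-line strings. A minimal sketch of that case follows, assuming `diff()` accepts two strings containing newlines as described there. The `text_report` helper and the sample strings are hypothetical, and the import is guarded so the snippet does not fail when the library is missing.

```python
try:
    from datadiff import diff
except ImportError:  # datadiff not installed; text_report() then returns None
    diff = None

old_text = "alpha\nbeta\ngamma\n"
new_text = "alpha\nBETA\ngamma\n"


def text_report(a, b):
    """Return the diff of two multi-line strings as text, or None without datadiff."""
    if diff is None:
        return None
    return str(diff(a, b))


report = text_report(old_text, new_text)
print(report)
```

Wrapping the result in `str()` keeps the helper's return type predictable whether the library returns a string or a diff object for this input.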