Diff Match Patch
Google's Diff Match and Patch libraries offer robust algorithms for synchronizing plain text: diffing two texts, finding fuzzy matches for a pattern, and applying patches. The algorithms were originally developed for Google Docs; this Python package provides a modern, actively maintained wrapper around them. It is suitable for comparing texts, showing differences, and applying changes.
Warnings
- gotcha The original Google diff-match-patch project was archived in August 2024. This `diff-match-patch` PyPI package now tracks a community-maintained fork. Users should be aware that Google no longer actively maintains the original repository.
- gotcha When using `patch_fromText()` with unidiff strings, issues with line breaks (`%0A`) can occur if the patch string was not originally generated by `dmp.patch_make()`. The library might interpret line breaks differently, leading to incorrect patch application.
- gotcha The line-diffing algorithm, when used for performance optimization (e.g., in `diff_main` with `checklines=True`), can produce incorrect patches for files exceeding approximately 65,536 lines. This is due to a limitation in mapping lines to 16-bit Unicode characters, causing an overflow.
- gotcha `diff_cleanupSemantic()` improves human readability but uses heuristics. It may not provide semantically perfect or 'correct' differences for all text types, especially in complex natural language processing contexts, as it relies on surface patterns rather than deep linguistic analysis.
- gotcha Diff computations on large or complex texts can be slow. The `Diff_Timeout` property (default 1.0 second) caps the 'exploration phase' of a diff. When the deadline is hit, the library returns the best diff found so far, which is still valid but may be far from minimal. Setting `Diff_Timeout` to 0 removes the limit.
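The timeout and line-mode warnings above can be explored with a minimal sketch like the following. The sample texts and variable names are made up for illustration; the sketch only assumes the `Diff_Timeout` property, the `checklines` argument of `diff_main`, and the `diff_text1` / `diff_text2` helpers.

```python
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()

# Diff_Timeout is in seconds; 0 removes the deadline so the diff runs to completion.
dmp.Diff_Timeout = 0

# Hypothetical sample input: a repeated three-line block with a few edits.
old_text = "alpha\nbeta\ngamma\n" * 50
new_text = old_text.replace("beta", "BETA", 3)

# Passing checklines=False skips the line-mode speedup and diffs character by character.
diffs = dmp.diff_main(old_text, new_text, False)

# Sanity check: a valid diff rebuilds both inputs exactly.
rebuilt_old = dmp.diff_text1(diffs)
rebuilt_new = dmp.diff_text2(diffs)
print(rebuilt_old == old_text, rebuilt_new == new_text)
```

Rebuilding both inputs from the diff is a cheap way to confirm the result is valid, whichever timeout and `checklines` settings are in use.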
Install
pip install diff-match-patch
Imports
- diff_match_patch
from diff_match_patch import diff_match_patch
Quickstart
from diff_match_patch import diff_match_patch
# Initialize the diff_match_patch object
dmp = diff_match_patch()
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "A quick black fox jumps over the active cat."
# 1. Compute a diff
diffs = dmp.diff_main(text1, text2)
# Optional: Clean up the diff for semantic readability
dmp.diff_cleanupSemantic(diffs)
print(f"Computed Diffs: {diffs}")
# Each entry is an (op, text) tuple: -1 = delete, 1 = insert, 0 = equal.
# The exact segmentation depends on the diff algorithm and the cleanup heuristics, so inspect the printed list rather than relying on a fixed expected value.
# 2. Generate a patch from the diffs
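# patch_make(text1, diffs) is the preferred form; the three-argument (text1, text2, diffs) form is deprecated
patches = dmp.patch_make(text1, diffs)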
patch_text = dmp.patch_toText(patches)
print(f"\nGenerated Patch: {patch_text}")
# 3. Apply the patch to an original text
# Let's simulate applying it to text1 to get text2
new_text, results = dmp.patch_apply(patches, text1)
print(f"\nApplied Patch (New Text): {new_text}")
print(f"Patch Application Results: {results}")
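The quickstart does not cover serialized patches or fuzzy matching. A minimal sketch of both follows, assuming the patch text was produced by `patch_toText()` (the safe case for `patch_fromText()`). The sample sentences, the misspelled pattern, and the location hints are illustrative only.

```python
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()

original = "The quick brown fox jumps over the lazy dog."
revised = "The quick red fox jumps over the lazy dog."

# Serialize patches to text (e.g. for storage or transport), then parse them back.
patches = dmp.patch_make(original, revised)
patch_text = dmp.patch_toText(patches)
restored = dmp.patch_fromText(patch_text)

# patch_apply returns the patched text and one boolean per patch.
result_text, applied = dmp.patch_apply(restored, original)
print(result_text)
print(applied)

# match_main(text, pattern, loc) searches for the pattern near the expected location loc.
exact_loc = dmp.match_main(original, "brown", 0)
print(exact_loc)

# A misspelled pattern may still match; -1 means no match was found.
fuzzy_loc = dmp.match_main(original, "browm fox", 8)
print(fuzzy_loc)
```

Checking the list of booleans returned by `patch_apply` matters in practice: the call does not raise when a patch fails to apply, it only reports `False` for that patch.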