{"id":1990,"library":"diff-match-patch","title":"Diff Match Patch","description":"Google's Diff Match and Patch libraries offer robust algorithms for synchronizing plain text: diffing two texts, finding fuzzy matches for a pattern, and applying patches. The algorithms were originally developed for Google Docs; this package distributes them for modern Python and is actively maintained. It is suitable for comparing texts, showing differences, and applying changes.","status":"active","version":"20241021","language":"en","source_language":"en","source_url":"https://pypi.org/project/diff-match-patch/","tags":["diff","patch","text comparison","string operations","merge","text processing"],"install":[{"cmd":"pip install diff-match-patch","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"symbol":"diff_match_patch","correct":"from diff_match_patch import diff_match_patch"}],"quickstart":{"code":"from diff_match_patch import diff_match_patch\n\n# Initialize the diff_match_patch object\ndmp = diff_match_patch()\n\ntext1 = \"The quick brown fox jumps over the lazy dog.\"\ntext2 = \"A quick black fox jumps over the active cat.\"\n\n# 1. Compute a diff\ndiffs = dmp.diff_main(text1, text2)\n\n# Optional: clean up the diff for human readability (modifies the list in place)\ndmp.diff_cleanupSemantic(diffs)\n\nprint(f\"Computed Diffs: {diffs}\")\n# diffs is a list of (op, text) tuples: -1 = delete, 1 = insert, 0 = equal.\n# The exact segmentation depends on the inputs and the cleanup applied.\n\n# 2. Generate a patch from the original text and the diffs\npatches = dmp.patch_make(text1, diffs)\npatch_text = dmp.patch_toText(patches)\nprint(f\"\\nGenerated Patch: {patch_text}\")\n\n# 3. 
Apply the patch to an original text\n# Applying it to text1 should reproduce text2\nnew_text, results = dmp.patch_apply(patches, text1)\n\nprint(f\"\\nApplied Patch (New Text): {new_text}\")\n# results is a list of booleans, one per patch, indicating whether it applied\nprint(f\"Patch Application Results: {results}\")","lang":"python","description":"This quickstart demonstrates the core functionality: computing the differences between two texts, generating a patch from those differences, and applying that patch to the original text. The optional `diff_cleanupSemantic` call makes the diff output easier to read."},"warnings":[{"fix":"Continue using this `diff-match-patch` PyPI package, as it is designed to track the actively maintained fork. Monitor the PyPI project page for updates on the new upstream source.","message":"The original Google diff-match-patch project was archived in August 2024. This `diff-match-patch` PyPI package now tracks a community-maintained fork. Users should be aware that Google no longer actively maintains the original repository.","severity":"gotcha","affected_versions":"All versions since August 2024"},{"fix":"Ensure that any patch string passed to `dmp.patch_fromText()` was produced by `dmp.patch_toText()` (from patches created with `dmp.patch_make()`), or is carefully formatted to match that representation, including percent-encoded line endings.","message":"`patch_fromText()` expects the library's own patch text format, which resembles unidiff but percent-encodes special characters (a line break appears as `%0A`). Patch strings that were not generated by the library, such as standard unidiff output, may fail to parse or be interpreted differently, leading to incorrect patch application.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For very large files (tens of thousands of lines or more), consider splitting the input into smaller chunks, or disable the line-mode optimization (`checklines=False`) if precise diffing is critical and performance is secondary. 
Reportedly, the issue stems from a limitation in the JavaScript (ES5) port; whether the Python port is affected in the same way is unconfirmed, so test with representative inputs.","message":"The line-mode speedup (e.g., `diff_main` with `checklines=True`) maps each unique line to a single character before diffing. Where characters are limited to 16 bits, inputs with more than approximately 65,536 unique lines overflow that mapping and can produce incorrect diffs and patches.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Treat `diff_cleanupSemantic()` as a presentation step. For applications that require strict, character-level, or programmatically consistent diffs, use the raw output of `diff_main()`, or apply another cleanup method such as `diff_cleanupEfficiency()` if appropriate.","message":"`diff_cleanupSemantic()` improves human readability but relies on heuristics. It works from surface patterns rather than linguistic analysis, so the result may not match the semantically 'correct' differences for every kind of text, especially in natural language processing contexts.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set `dmp.Diff_Timeout` to a larger number of seconds, or to `0` to disable the timeout, if longer computation time is acceptable and a more exhaustive diff is required.","message":"Diff computations can be slow for large or complex texts. The `Diff_Timeout` property (default 1.0 second) cuts the 'exploration phase' of a diff short once the limit is reached; the returned diff is still valid but may be suboptimal.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}