bsdiff4 Library
bsdiff4 is a Python library providing functions for binary diff and patch operations, based on the bsdiff algorithm and the BSDIFF4 patch format. It computes the difference between two byte sequences or files, and reconstructs the new sequence or file from the old one plus the diff. This description applies to version 1.2.6.
Warnings
- gotcha All input data (old, new, and diff) passed to `bsdiff4.diff` and `bsdiff4.patch` must be `bytes` objects, not `str`. Passing a `str` raises `TypeError`.
- gotcha For very large files, `bsdiff4` can be memory intensive, as it often loads entire files or significant portions into memory during diffing and patching operations. Monitor memory usage for performance-critical applications.
- gotcha The `bsdiff4` library reads and writes patches in the BSDIFF4 format, which is one specific version of the bsdiff patch format. Diffs generated by this library may not be compatible with other bsdiff implementations (e.g., bsdiff3, or other variations) if they use a different format version.
- gotcha On some systems, particularly those without pre-built binary wheels available on PyPI (e.g., specific Linux distributions, older macOS versions, or Windows without development tools), `pip install bsdiff4` might fail due to the need for a C compiler to build its native extensions.
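The `bytes` requirement in the first warning can be handled at the call site by encoding text before diffing. The following is a minimal sketch under stated assumptions: `to_bytes` and `diff_text` are hypothetical helpers, not part of bsdiff4, and UTF-8 is an assumed default encoding.

```python
def to_bytes(data, encoding="utf-8"):
    """Hypothetical helper (not part of bsdiff4): coerce str or bytes-like input to bytes."""
    if isinstance(data, str):
        return data.encode(encoding)
    if isinstance(data, (bytes, bytearray, memoryview)):
        return bytes(data)
    raise TypeError(f"expected str or bytes-like data, got {type(data).__name__}")


def diff_text(old_text, new_text, encoding="utf-8"):
    """Hypothetical wrapper: encode both inputs, then diff them with bsdiff4."""
    # Imported inside the function so to_bytes stays usable where bsdiff4 is not installed.
    import bsdiff4

    return bsdiff4.diff(to_bytes(old_text, encoding), to_bytes(new_text, encoding))
```

The result of `bsdiff4.patch` is also `bytes`, so text recovered from a patch has to be decoded with the same encoding that was used before diffing.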
Install
-
pip install bsdiff4
Imports
- diff
import bsdiff4 # bsdiff4.diff(...)
- patch
import bsdiff4 # bsdiff4.patch(...)
- file_diff
import bsdiff4 # bsdiff4.file_diff(...)
- file_patch
import bsdiff4 # bsdiff4.file_patch(...)
Quickstart
import bsdiff4
import os
# Example with bytes
old_data = b"This is the old data string."
new_data = b"This is the new and updated data string."
# Generate a binary diff
diff_data = bsdiff4.diff(old_data, new_data)
print(f"Diff data length: {len(diff_data)} bytes")
# Apply the patch to get back the new data
patched_data = bsdiff4.patch(old_data, diff_data)
assert patched_data == new_data
print(f"Patched data (bytes): {patched_data.decode('utf-8')}\n")
# Example with files
file_old = 'old_file.txt'
file_new = 'new_file.txt'
file_diff = 'diff.bin'
file_patched = 'patched_file.txt'
with open(file_old, 'wb') as f:
    f.write(old_data)
with open(file_new, 'wb') as f:
    f.write(new_data)
# Generate diff between files
bsdiff4.file_diff(file_old, file_new, file_diff)
print(f"File diff created: {file_diff}")
# Apply patch to recreate new file
bsdiff4.file_patch(file_old, file_patched, file_diff)
with open(file_patched, 'rb') as f:
    recreated_data = f.read()
assert recreated_data == new_data
print(f"Patched file created: {file_patched} (content matches new_file.txt)")
# Clean up generated files
os.remove(file_old)
os.remove(file_new)
os.remove(file_diff)
os.remove(file_patched)
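The file example above checks the result by reading the whole patched file into memory. For larger files, one possible alternative is to compare a digest computed in chunks. This is a sketch, not part of bsdiff4: `file_sha256` and `patch_and_verify` are hypothetical helpers, and the expected digest is assumed to be distributed alongside the diff. It only avoids the extra read for verification; the patching step itself may still be memory intensive, as noted in the warnings.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so its full content is never held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patch_and_verify(old_path, patched_path, diff_path, expected_sha256):
    """Hypothetical helper: apply a patch file, then check the output by digest."""
    # Imported inside the function so file_sha256 stays usable where bsdiff4 is not installed.
    import bsdiff4

    bsdiff4.file_patch(old_path, patched_path, diff_path)
    return file_sha256(patched_path) == expected_sha256
```

With the Quickstart files, the expected digest could be taken from the known-good file, for example `patch_and_verify(file_old, file_patched, file_diff, file_sha256(file_new))`.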