Diff Parser
diff-parser is a Python package for parsing and representing diff files, specifically git diff output and .diff files. It provides structured access to the properties of each changed file, including filenames, file paths, source hashes, target hashes, and line counts. The current version is 1.1. The library appears to be actively maintained, with recent updates, but releases are made on demand rather than on a fixed schedule.
Warnings
- gotcha The `Diff` class constructor expects a file path as input, not a raw diff string. To parse a string, first write it to a temporary file and pass that file's path.
- gotcha The library primarily targets the 'git diff data or .diff file' format. Diffs in other formats (e.g., from a different VCS or from `difflib`) might not be fully supported and could be parsed in unexpected ways. The documentation does not describe how malformed diff input is handled.
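Because the library targets git-style diffs, a cheap pre-check can catch obviously unsuitable input before it reaches the parser. The sketch below uses only the standard library. `looks_like_git_diff` is a hypothetical helper, not part of diff-parser, and its heuristic (at least one `diff --git` file header plus at least one `@@` hunk header) is an assumption about what counts as acceptable input, not a rule taken from the library.

```python
import re

# Hypothetical helper, not part of diff-parser: a rough heuristic only.
# A git file header looks like: diff --git a/path b/path
_FILE_HEADER = re.compile(r"^diff --git a/.+ b/.+$", re.MULTILINE)
# A unified hunk header looks like: @@ -1,3 +1,4 @@
_HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def looks_like_git_diff(text: str) -> bool:
    """Return True if text has a git file header and a unified hunk header."""
    return bool(_FILE_HEADER.search(text)) and bool(_HUNK_HEADER.search(text))
```

Plain unified diffs such as those produced by `difflib.unified_diff` lack the `diff --git` header, so this check rejects them. It also rejects git diffs that carry no hunks (pure renames, mode changes, binary files), so loosen the second condition if those matter.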
Install
pip install diff-parser
Imports
- Diff
from diff_parser import Diff
Quickstart
import tempfile
import os
from diff_parser import Diff
# Create a dummy diff string representing file changes
diff_content = """diff --git a/old_file.py b/new_file.py
index 1234567..890abcd 100644
--- a/old_file.py
+++ b/new_file.py
@@ -1,3 +1,4 @@
 # This is an old line
-print("Hello, old world!")
+print("Hello, new world!")
+print("Another new line")
 # End of file
"""
# diff-parser expects a file path, so we write the diff content to a temporary file.
temp_dir = tempfile.mkdtemp()
diff_file_path = os.path.join(temp_dir, "example.diff")
try:
    with open(diff_file_path, "w") as f:
        f.write(diff_content)
    # Initialize the Diff parser with the path to the diff file
    diff = Diff(diff_file_path)
    # Iterate through each changed file block in the diff
    for block in diff:
        print("--- File Change ---")
        print(f"Old filename: {block.old_filename}")
        print(f"New filename: {block.new_filename}")
        print(f"New filepath: {block.new_filepath}")
        print(f"Lines added: {block.added_lines_count}")
        print(f"Lines removed: {block.removed_lines_count}")
        # Accessing individual hunks and lines is also possible:
        # for hunk in block.hunks:
        #     for line in hunk.lines:
        #         print(f" Line type: {line.type}, Content: {line.value.strip()}")
finally:
    # Clean up the temporary file and directory
    os.remove(diff_file_path)
    os.rmdir(temp_dir)
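
The temporary-file handling in the Quickstart can be wrapped in a small context manager so that cleanup happens automatically. This is a minimal sketch using only the standard library. `diff_text_as_file` is a hypothetical helper name, not part of diff-parser, and the file name `example.diff` is arbitrary.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def diff_text_as_file(diff_text: str):
    """Write diff_text to a temporary .diff file and yield its path.

    Hypothetical helper, not part of diff-parser. The directory and the
    file are deleted when the with-block exits, even if parsing raises.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, "example.diff")
        # newline="" writes the diff's line endings exactly as given
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(diff_text)
        yield path


# Usage (assumes diff-parser is installed and diff_content is a diff string):
# from diff_parser import Diff
# with diff_text_as_file(diff_content) as path:
#     for block in Diff(path):
#         print(block.new_filepath)
```

Keep all work with the parsed blocks inside the `with` block: the file no longer exists once the block exits, and it is not stated whether `Diff` reads the file eagerly or on iteration.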