Checksumdir: Directory Hashing Utility
Checksumdir is a Python library that computes a single cryptographic hash for the contents of a directory. It hashes file contents only and ignores metadata by default. The current version is 1.3.0; releases are infrequent, and the latest PyPI release is dated August 15, 2025.
Warnings
- gotcha By default, `dirhash` computes a hash based *only* on the contents of the files within a directory, ignoring file names, directory structure, and metadata like timestamps. This means renaming a file or moving it to a subdirectory will not change the hash if its content remains the same.
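- gotcha Calling `checksumdir.dirhash()` with a path that does not exist, or that is not a directory, raises an exception instead of returning a hash. Releases up to 1.2.0 check `os.path.isdir()` first and raise a `TypeError`, not a `FileNotFoundError`, so verify the path beforehand or catch the exception your installed version raises.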
- gotcha Hashing reads every file in full, so very large directories or directories with a vast number of small files are I/O and CPU intensive and can be slow. In performance-critical applications, narrow the input with `excluded_files` or `excluded_extensions` where possible.
- breaking Checksumdir version 1.3.0 and later explicitly requires Python 3.9 or newer. Users on older Python versions (e.g., Python 3.8 or earlier) must use an older version of the library or upgrade their Python environment.
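The content-only behavior described in the first warning can be illustrated with a small standard-library sketch. This is not checksumdir's actual implementation; `content_only_dirhash` and `rename_leaves_hash_unchanged` are hypothetical names, and the approach (hash each file, sort the per-file digests, hash the result) is an assumption chosen to show why renaming or moving a file leaves such a hash unchanged.

```python
import hashlib
import os
import tempfile


def content_only_dirhash(dirname: str, algorithm: str = "sha256") -> str:
    """Hypothetical helper: hash file contents only, ignoring names and layout."""
    digests = []
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            hasher = hashlib.new(algorithm)
            with open(os.path.join(root, name), "rb") as fh:
                # Read in chunks so large files are not loaded into memory at once.
                for chunk in iter(lambda: fh.read(65536), b""):
                    hasher.update(chunk)
            digests.append(hasher.hexdigest())

    # Sorting the per-file digests makes the result independent of walk order,
    # file names, and directory structure.
    combined = hashlib.new(algorithm)
    for digest in sorted(digests):
        combined.update(digest.encode("utf-8"))
    return combined.hexdigest()


def rename_leaves_hash_unchanged() -> bool:
    """Rename and move a file, then compare the hashes before and after."""
    with tempfile.TemporaryDirectory() as tmpdir:
        original = os.path.join(tmpdir, "report.txt")
        with open(original, "w") as fh:
            fh.write("same content")
        before = content_only_dirhash(tmpdir)

        os.makedirs(os.path.join(tmpdir, "archive"))
        os.rename(original, os.path.join(tmpdir, "archive", "renamed.txt"))
        after = content_only_dirhash(tmpdir)
    return before == after


print(rename_leaves_hash_unchanged())  # prints True
```

If renames and moves must be detected, a content-only hash like this is not enough on its own; file paths have to be mixed into the digest as well.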
Install
pip install checksumdir
Imports
- dirhash
from checksumdir import dirhash
Quickstart
import os
import tempfile

import checksumdir

# Create a temporary directory and some files for demonstration
with tempfile.TemporaryDirectory() as tmpdir:
    print(f"Created temporary directory: {tmpdir}")
    os.makedirs(os.path.join(tmpdir, 'subdir'), exist_ok=True)
    with open(os.path.join(tmpdir, 'file1.txt'), 'w') as f:
        f.write('content one')
    with open(os.path.join(tmpdir, 'subdir', 'file2.txt'), 'w') as f:
        f.write('content two')

    # Calculate MD5 hash of the directory contents (default ignores filenames/paths)
    md5_hash = checksumdir.dirhash(tmpdir, 'md5')
    print(f"MD5 hash (content only): {md5_hash}")

    # Calculate SHA1 hash, including file paths in the hash calculation.
    # The keyword is include_paths; dirhash has no hash_filename argument.
    sha1_hash_with_paths = checksumdir.dirhash(tmpdir, 'sha1', include_paths=True)
    print(f"SHA1 hash (including paths): {sha1_hash_with_paths}")

    # Example with exclusion
    with open(os.path.join(tmpdir, 'temp_log.log'), 'w') as f:
        f.write('temporary log content')

    # Calculate MD5, excluding files with the log extension.
    # Extensions are given without the leading dot ('log', not '.log').
    md5_hash_excluded = checksumdir.dirhash(tmpdir, 'md5', excluded_extensions=['log'])
    print(f"MD5 hash (excluding .log files): {md5_hash_excluded}")

# The temporary directory is automatically cleaned up here
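The quickstart always hashes a directory it has just created. For paths that may not exist, a small guard avoids the exception mentioned in the warnings. The sketch below is one possible approach, not part of checksumdir: `hash_if_directory` is a hypothetical helper, and the hashing function is passed in as a parameter so the guard stays independent of the library.

```python
import os
from typing import Callable, Optional


def hash_if_directory(path: str, hasher: Callable[[str], str]) -> Optional[str]:
    """Hypothetical guard: return None when path is not an existing directory."""
    if not os.path.isdir(path):
        # Skip hashing instead of letting the hasher raise.
        return None
    return hasher(path)


# A path that does not exist yields None rather than an exception.
# The lambda is a stand-in; with checksumdir installed it could be
# lambda p: dirhash(p, 'sha256')
print(hash_if_directory("/no/such/directory", lambda p: "unused"))  # prints None
```

Returning `None` keeps the caller's control flow simple, but it also hides the difference between a missing path and a regular file, so raising a custom error may suit stricter code better.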