{"id":1991,"library":"dirhash","title":"dirhash: Directory Hashing Utility","description":"dirhash is a Python module and CLI tool for computing the hash of file system directories based on their structure and content. It supports all hashing algorithms available in Python's `hashlib` module, offers `.gitignore`-style glob/wildcard path matching for filtering files, and leverages multiprocessing for performance. The library computes hashes according to the Dirhash Standard, aiming for consistent and collision-resistant directory hash generation. It is actively maintained with irregular, feature-driven releases, currently at version 0.5.0.","status":"active","version":"0.5.0","language":"en","source_language":"en","source_url":"https://github.com/andhus/dirhash-python","tags":["hashing","directory","filesystem","integrity","checksum","cli"],"install":[{"cmd":"pip install dirhash","lang":"bash","label":"Install from PyPI"}],"dependencies":[{"reason":"Used for recursive directory traversal; `scantree>=0.0.4` is required for the experimental Windows support introduced in v0.5.0.","package":"scantree","optional":false},{"reason":"Used for .gitignore-style glob/wildcard path matching.","package":"pathspec","optional":false}],"imports":[{"symbol":"dirhash","correct":"from dirhash import dirhash"}],"quickstart":{"code":"import os\nimport tempfile\nfrom dirhash import dirhash\n\n# Create a temporary directory structure for demonstration\nwith tempfile.TemporaryDirectory() as tmpdir:\n    test_dir = os.path.join(tmpdir, 'my_project')\n    os.makedirs(os.path.join(test_dir, 'src'))\n    os.makedirs(os.path.join(test_dir, 'data'))\n\n    with open(os.path.join(test_dir, 'src', 'main.py'), 'w') as f:\n        f.write('print(\"Hello, dirhash!\")')\n\n    with open(os.path.join(test_dir, 'data', 'config.json'), 'w') as f:\n        f.write('{\"key\": \"value\"}')\n\n    # This .gitignore is only sample content for the demo; filter patterns\n    # are passed to dirhash explicitly via `match` and `ignore` below\n    with open(os.path.join(test_dir, '.gitignore'), 'w') as f:\n        f.write('*.json')\n\n    # Calculate the MD5 hash of the entire directory\n    full_md5_hash 
= dirhash(test_dir, 'md5')\n    print(f\"MD5 hash of {test_dir}: {full_md5_hash}\")\n\n    # Calculate SHA1 hash, excluding .json files using .gitignore style patterns\n    sha1_hash_no_json = dirhash(test_dir, 'sha1', ignore=['*.json'])\n    print(f\"SHA1 hash (excluding *.json): {sha1_hash_no_json}\")\n\n    # Calculate SHA256 hash, only including .py files\n    sha256_hash_only_py = dirhash(test_dir, 'sha256', match=['*.py'])\n    print(f\"SHA256 hash (only *.py): {sha256_hash_only_py}\")\n\n    # Demonstrate empty directories: by default (empty_dirs=False), directories\n    # with no included content are left out of the hash\n    empty_dir_path = os.path.join(test_dir, 'empty_folder')\n    os.makedirs(empty_dir_path)\n    hash_without_empty = dirhash(test_dir, 'md5')\n    print(f\"MD5 hash (without explicit empty dirs): {hash_without_empty}\")\n    # Now, a hash that explicitly includes empty directories\n    hash_with_empty = dirhash(test_dir, 'md5', empty_dirs=True)\n    print(f\"MD5 hash (with empty dirs): {hash_with_empty}\")\n\n    # Cleanup is handled by TemporaryDirectory\n","lang":"python","description":"This quickstart computes directory hashes with the `dirhash` function using different algorithms and filtering options (`match` and `ignore`, which take `.gitignore`-style patterns). It also shows the effect of including empty directories."},"warnings":[{"fix":"Review and test any existing directory hashing logic, especially logic relying on `match` or `ignore` patterns, to ensure the new `.gitignore`-aligned behavior is as expected.","message":"The `pathspec` dependency's upper version limit was removed in `v0.4.0`, so `dirhash` can now run with `pathspec` versions greater than `0.10.0`. These versions treat some match/ignore patterns differently, aligning behavior with `.gitignore` standards. 
Users upgrading from `v0.3.0` or earlier might observe different hash results for directories with complex filtering patterns.","severity":"breaking","affected_versions":">=0.4.0"},{"fix":"Users migrating from `v0.1.x` should review their API calls and recompute expected hash values against the 'Dirhash Standard' implementation in `v0.2.0` and later.","message":"Version `0.2.0` introduced 'significant breaking changes' from `v0.1.1`, primarily by adopting the formal 'Dirhash Standard'. This was a fundamental re-implementation, so both the API and the resulting hash values may differ from `v0.1.x`.","severity":"breaking","affected_versions":">=0.2.0"},{"fix":"Ensure your project is running on Python 3.8 or a newer compatible version.","message":"Python 2.7 support was officially dropped in version `0.3.0`. The library now requires Python 3.8 or newer.","severity":"deprecated","affected_versions":">=0.3.0"},{"fix":"Test thoroughly on Windows before using in production, and report any Windows-specific issues to the project's GitHub repository.","message":"Windows support in `v0.5.0` is marked as 'experimental'. Although `scantree>=0.0.4` is required to enable it, users on Windows may still encounter platform-specific issues.","severity":"gotcha","affected_versions":"0.5.0"},{"fix":"Carefully configure the `empty_dirs`, `linked_dirs`, `linked_files`, `allow_cyclic_links`, and `entry_properties` arguments of `dirhash()` (exposed on the CLI as `--empty-dirs`, `--no-linked-dirs`, `--no-linked-files`, `--allow-cyclic-links`, and `--properties`) to define precisely what is included in the hash, especially in environments with complex file structures or symbolic links. Refer to the 'Dirhash Standard' documentation for detailed behavior.","message":"The default behavior regarding symbolic links, empty directories, and which file/directory properties (`name`, `data`, `is_link`) are included in the hash can significantly impact the resulting hash. Misunderstanding these options can lead to inconsistent or unexpected hash values. 
By default, `name` and `data` are included, but `is_link` is not.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}