{"id":9552,"library":"bdbag","title":"Big Data Bag Utilities","description":"bdbag is a Python library that extends the BagIt specification (RFC 8493) with features for big data, focusing on FAIR data principles. It enables creation, validation, and manipulation of data bags, supporting checksums, remote payload manifests, and integration with HDF5 and various compression formats. The current version is 1.8.0, and the library is actively maintained with releases as needed for bug fixes and feature enhancements.","status":"active","version":"1.8.0","language":"en","source_language":"en","source_url":"https://github.com/fair-research/bdbag","tags":["data packaging","bagit","FAIR data","checksums","HDF5","data integrity","research data"],"install":[{"cmd":"pip install bdbag","lang":"bash","label":"Install bdbag"}],"dependencies":[{"reason":"Core dependency for implementing the BagIt specification.","package":"bagit"},{"reason":"Used for fetching remote manifests and data payloads.","package":"requests"},{"reason":"Required for HDF5 data handling features.","package":"h5py"}],"imports":[{"note":"The `BDBag` class is defined in the `bdbag.bdbagit` module and is not re-exported from the top-level package.","wrong":"from bdbag import BDBag","symbol":"BDBag","correct":"from bdbag.bdbagit import BDBag"},{"note":"The high-level API functions live in the `bdbag_api` submodule of the top-level package; there is no `bdbag.api` module.","wrong":"import bdbag.api","symbol":"bdbag_api","correct":"from bdbag import bdbag_api"}],"quickstart":{"code":"import os\nimport shutil\nfrom bdbag import bdbag_api\n\n# Define paths for the bag\nbag_dir = \"my_test_bag\"\ndata_dir = os.path.join(bag_dir, \"data\")\ntest_file_path = os.path.join(data_dir, \"example.txt\")\n\n# Clean up previous run if directory exists\nif os.path.exists(bag_dir):\n    shutil.rmtree(bag_dir)\n\n# 1. Create data directory for the bag payload\nos.makedirs(data_dir, exist_ok=True)\n\n# 2. 
Create some data to put into the bag\nwith open(test_file_path, \"w\") as f:\n    f.write(\"This is some example data for the bdbag.\\n\")\n    f.write(\"It will be bagged and validated.\\n\")\n\nprint(f\"Created test data at: {test_file_path}\")\n\ntry:\n    # 3. Create the bag\n    # make_bag converts bag_dir into a bag in place: whatever is already in\n    # bag_dir is moved under the bag's 'data' payload directory.\n    # The checksum algorithms are passed through the 'algs' parameter.\n    bag = bdbag_api.make_bag(bag_dir, algs=['sha256'])\n    print(f\"Bag created successfully at: {bag.path}\")\n\n    # 4. Validate the bag\n    # validate_bag raises an exception if the bag is incomplete or invalid,\n    # so reaching the print below means validation passed.\n    bdbag_api.validate_bag(bag_dir)\n    print(f\"Bag '{bag_dir}' is valid.\")\n\nexcept Exception as e:\n    print(f\"An error occurred during bag creation or validation: {e}\")\nfinally:\n    # Optional: Clean up the created bag directory\n    # Uncomment the line below to remove the directory after inspection\n    # if os.path.exists(bag_dir):\n    #     shutil.rmtree(bag_dir)\n    pass\n","lang":"python","description":"This quickstart demonstrates how to create a simple bdbag from a directory containing one data file and then validate its integrity. It creates a working directory, writes a small file, uses `bdbag_api.make_bag` to bag the directory in place, and runs `bdbag_api.validate_bag` to check its validity."},"warnings":[{"fix":"Ensure your Python environment is within the supported range (3.8-3.11). Use virtual environments (e.g., `venv` or `conda`) to manage specific Python versions for your projects.","message":"bdbag strictly enforces Python version compatibility. It requires Python versions 3.8 through 3.11. 
Using it with unsupported versions (e.g., Python 3.7 or Python 3.12+) can lead to `ImportError`, `ModuleNotFoundError`, or unexpected runtime errors due to dependency conflicts or syntax incompatibilities.","severity":"gotcha","affected_versions":"<1.8.0,>=3.8,<3.12"},{"fix":"Put the payload files in the directory you pass as the first argument to `bdbag_api.make_bag` before calling it. To pick up files added to an existing bag, call `make_bag` again with `update=True`.","message":"`bdbag_api.make_bag` takes the path of the directory to bag as its first argument and converts that directory into a bag in place, moving its existing contents into the bag's `data/` payload directory. It does not take a separate `data_directory` argument, so the files you want bagged must already be in that directory when you call it.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"Pass the list you need, for example `algs=['md5', 'sha256']`, when calling `bdbag_api.make_bag` rather than relying on the defaults.","message":"If `algs` is not passed to `bdbag_api.make_bag`, `bdbag` falls back to its configured default checksum algorithms for payload files. 
If you need specific checksum algorithms (e.g., MD5, SHA1), always pass them explicitly as a list to the `algs` parameter of `bdbag_api.make_bag`.","severity":"gotcha","affected_versions":">=1.0.0"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Ensure every file listed in the bag's payload manifest exists under the bag's `data/` directory. If files were removed or renamed after bagging, restore them or regenerate the manifests with `bdbag_api.make_bag(bag_path, update=True)`.","cause":"A file listed in the bag's payload manifest does not exist at the expected path on the filesystem, typically because it was moved, renamed, or deleted after the bag was created.","error":"FileNotFoundError: [Errno 2] No such file or directory: '/path/to/bag/data/nonexistent_file.txt'"},{"fix":"Inspect the exception message and the log output from `bdbag_api.validate_bag` for the specific reason the bag is invalid. Missing payload files and checksum mismatches are the most frequent causes. Restore any missing files, or, if files were changed intentionally, regenerate the manifests by calling `bdbag_api.make_bag` with `update=True`.","cause":"The bag's integrity checks failed. Common causes include missing payload files, checksum mismatches (files modified after bag creation), or problems with the bag's tag files (e.g., bag-info.txt, bagit.txt).","error":"BagValidationError: Bag is invalid"},{"fix":"Import the class from its defining module with `from bdbag.bdbagit import BDBag`, or use the high-level functions via `from bdbag import bdbag_api`.","cause":"The `BDBag` class is not exported from the top-level `bdbag` package, so `from bdbag import BDBag` fails even when the library is installed correctly.","error":"ImportError: cannot import name 'BDBag' from 'bdbag'"}]}