Big Data Bag Utilities

1.8.0 · active · verified Fri Apr 17

bdbag is a Python library that extends the BagIt specification (RFC 8493) with features for big data, focusing on FAIR data principles. It enables creation, validation, and manipulation of data bags, supporting checksums, remote payload manifests, and integration with HDF5 and various compression formats. The current version is 1.8.0, and the library is actively maintained with releases as needed for bug fixes and feature enhancements.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart creates a simple bag containing one data file and then validates it. It creates a working directory, writes a small file into it, calls `bdbag_api.make_bag` to turn the directory into a bag, and calls `bdbag_api.validate_bag` to check the bag's integrity.

import os
import shutil
from bdbag import bdbag_api

# Directory that will be converted into a bag, and a file inside it
bag_dir = "my_test_bag"
test_file_path = os.path.join(bag_dir, "example.txt")

# Clean up previous run if directory exists
if os.path.exists(bag_dir):
    shutil.rmtree(bag_dir)

# 1. Create the directory that will be bagged
os.makedirs(bag_dir, exist_ok=True)

# 2. Create some data to put into the bag
with open(test_file_path, "w") as f:
    f.write("This is some example data for the bdbag.\n")
    f.write("It will be bagged and validated.\n")

print(f"Created test data at: {test_file_path}")

try:
    # 3. Create the bag in place.
    # make_bag moves the directory's existing contents into the bag's
    # 'data' payload directory and writes the checksum manifests.
    # The checksum algorithms are selected with the 'algs' argument.
    bag = bdbag_api.make_bag(bag_dir, algs=['sha256'])
    print(f"Bag created successfully at: {bag.path}")

    # 4. Validate the bag.
    # validate_bag reports failure by raising an exception rather than
    # by returning a status, so reaching the next line means it passed.
    bdbag_api.validate_bag(bag_dir)
    print(f"Bag '{bag_dir}' is valid.")

except Exception as e:
    print(f"An error occurred during bag creation or validation: {e}")

# Optional: uncomment the lines below to remove the bag after inspection
# if os.path.exists(bag_dir):
#     shutil.rmtree(bag_dir)
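
To make the validation step less opaque, the following is a minimal standard-library sketch of the core idea behind a checksum check: read each entry of a BagIt `manifest-sha256.txt` (a checksum followed by a path relative to the bag root, per RFC 8493) and recompute the digest of the file on disk. It is not bdbag's implementation. The helper names `sha256_of` and `check_sha256_manifest` are hypothetical, and the demo builds a tiny bag-like directory by hand instead of calling bdbag.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sha256_manifest(bag_dir):
    """Compare each manifest-sha256.txt entry against the file on disk.

    Returns the listed paths that are missing or whose checksum differs.
    An empty list means every listed file matched.
    """
    mismatches = []
    manifest = os.path.join(bag_dir, "manifest-sha256.txt")
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue
            # Each entry: "<checksum><whitespace><path relative to bag root>"
            expected, rel_path = line.split(None, 1)
            full_path = os.path.join(bag_dir, *rel_path.split("/"))
            if (not os.path.isfile(full_path)
                    or sha256_of(full_path) != expected.lower()):
                mismatches.append(rel_path)
    return mismatches


# Hypothetical demo: hand-build a tiny bag-like directory, then check it.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "data"))
    payload = os.path.join(demo_dir, "data", "example.txt")
    with open(payload, "w", encoding="utf-8") as f:
        f.write("This is some example data.\n")
    manifest_path = os.path.join(demo_dir, "manifest-sha256.txt")
    with open(manifest_path, "w", encoding="utf-8") as f:
        f.write(f"{sha256_of(payload)}  data/example.txt\n")

    intact = check_sha256_manifest(demo_dir)

    # Modify the payload to show that the check notices the change.
    with open(payload, "a", encoding="utf-8") as f:
        f.write("tampered\n")
    tampered = check_sha256_manifest(demo_dir)

print("mismatches before tampering:", intact)    # []
print("mismatches after tampering:", tampered)   # ['data/example.txt']
```

A full validation covers more than this sketch. It also checks, for example, that every payload file appears in a manifest and that the bag declaration and tag files are well-formed, so for real bags `bdbag_api.validate_bag` is the better choice.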
