h5py
The h5py package provides a Pythonic interface to the HDF5 binary data format, letting users store and manipulate large amounts of numerical data efficiently; datasets read from and write to NumPy arrays directly. It offers both high-level and low-level access to HDF5 files, groups, and datasets. At the time of writing, the current version is 3.16.0, and development is actively maintained with frequent releases.
Warnings
- breaking The default mode for opening HDF5 files changed from read/write to read-only ('r') in h5py 3.0. Attempting to write without explicitly setting a write-enabled mode (e.g., 'w', 'a', 'r+') will result in an error.
- breaking h5py 3.0 and newer versions dropped support for Python 2.7. Python 3.6 or above is now required. For h5py 3.12, Python 3.9 or newer is required. For h5py 3.15, Python 3.10 or newer is required.
- deprecated The `Dataset.value` property, which would dump the entire dataset into a NumPy array, was deprecated in h5py 2.0 and removed in h5py 3.0; accessing it in recent versions raises an error. Use `dset[()]` instead to read the whole dataset.
- gotcha HDF5 files must be explicitly closed to ensure data integrity, especially after writing. Failing to do so can leave corrupted files or unreleased file handles. Using `h5py.File` as a context manager (a `with` block, as in the Quickstart below) closes the file automatically.
- gotcha The default `dtype` for `group.create_dataset()` is `numpy.float32` ('f'), which is different from NumPy's default `numpy.float64`. This can cause silent data type changes and potential precision loss if not explicitly specified.
- gotcha Using h5py with multiple threads (Python's `threading` module) will not provide parallel performance for HDF5 operations. The underlying `libhdf5` C library is generally not thread-safe, and h5py uses a global Python lock to serialize access to the HDF5 C API, preventing simultaneous calls.
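The float32-default gotcha above is easy to verify. A minimal sketch (the scratch file name `dtype_demo.h5` is arbitrary):

```python
import os
import h5py
import numpy as np

path = 'dtype_demo.h5'  # arbitrary scratch file for this demo
with h5py.File(path, 'w') as f:
    # No dtype and no data: h5py defaults to 'f' (float32), not NumPy's float64
    implicit = f.create_dataset('implicit', shape=(4,))
    # Explicit dtype avoids silent precision loss
    explicit = f.create_dataset('explicit', shape=(4,), dtype='f8')
    # When data is supplied, the dtype is inferred from the array
    inferred = f.create_dataset('inferred', data=np.arange(4))
    dtypes = (implicit.dtype, explicit.dtype, inferred.dtype)

print(dtypes)  # float32, float64, and a platform-dependent integer dtype
os.remove(path)
```

Passing `dtype='f8'` (or the NumPy array itself via `data=`) is the simple way to keep on-disk precision in step with in-memory precision.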
Install
-
pip install h5py
Imports
- File
import h5py # ... h5py.File(...)
Quickstart
import h5py
import numpy as np
import os
file_path = 'my_data.h5'
# Create a new HDF5 file (mode 'w' will overwrite if exists)
with h5py.File(file_path, 'w') as f:
    # Create a group (like a directory)
    group = f.create_group('my_group')
    # Create a dataset within the group (like a NumPy array)
    data = np.arange(100).reshape(10, 10)
    dset = group.create_dataset('dataset_1', data=data)
    # Add attributes to the dataset (metadata)
    dset.attrs['units'] = 'arbitrary'
    dset.attrs['description'] = 'Sample 2D integer array'
    # You can also create datasets directly at the root level
    f.create_dataset('another_dataset', data=np.random.rand(5))
print(f"File '{file_path}' created successfully.")
# Read data from the HDF5 file
with h5py.File(file_path, 'r') as f:
    # List all top-level objects
    print(f"\nKeys in file: {list(f.keys())}")
    # Access a group
    group_read = f['my_group']
    print(f"Keys in 'my_group': {list(group_read.keys())}")
    # Access a dataset
    dset_read = group_read['dataset_1']
    # Read the entire dataset into memory ([()] reads everything as a NumPy array)
    read_data = dset_read[()]
    print(f"\nShape of read_data: {read_data.shape}")
    print(f"First 5 elements of read_data: {read_data.flatten()[:5]}")
    # Access attributes
    print(f"Units attribute: {dset_read.attrs['units']}")
    # Read a slice of the data (only this region is read from disk)
    slice_data = dset_read[0:5, 0:5]
    print(f"Slice (0:5, 0:5) of dataset_1:\n{slice_data}")
# Clean up the created file
os.remove(file_path)
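The read-only default mode from the first warning can also be demonstrated directly. A small sketch (the scratch file name is arbitrary): omitting the mode is equivalent to passing 'r', so the write attempt below fails.

```python
import os
import h5py
import numpy as np

path = 'readonly_demo.h5'  # arbitrary scratch file for this demo
with h5py.File(path, 'w') as f:
    f.create_dataset('d', data=np.zeros(3))

write_rejected = False
with h5py.File(path) as f:  # no mode given: defaults to 'r' since h5py 3.0
    try:
        f.create_dataset('extra', data=np.ones(3))  # fails: no write intent
    except (ValueError, OSError):
        write_rejected = True

print(f"write rejected: {write_rejected}")
os.remove(path)
```

To modify an existing file, reopen it with a write-enabled mode such as 'r+' (read/write, file must exist) or 'a' (read/write, create if missing).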