h5py

version 3.16.0 · verified Tue May 12 · auth: no · python install: verified · quickstart: verified

The h5py package provides a Pythonic interface to the HDF5 binary data format, allowing users to store and manipulate large amounts of numerical data efficiently, often integrating seamlessly with NumPy arrays. It offers both high-level and low-level access to HDF5 files, datasets, and groups. The current version is 3.16.0, with development actively maintained through frequent releases.

pip install h5py
error ModuleNotFoundError: No module named 'h5py'
cause The h5py library is not installed in the Python environment being used, or the environment where it's installed is not activated.
fix
Install h5py with pip (`pip install h5py`) or conda (`conda install h5py`), and make sure the environment you install into is the one your script actually runs in.
error OSError: Unable to open file (File signature not found)
cause This error indicates that the file you are trying to open is either corrupted, not a valid HDF5 file, or was improperly downloaded.
fix
Verify the file's integrity and ensure it is a legitimate HDF5 file. Try re-downloading the file if it came from an external source.
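Before re-downloading, the diagnosis can be confirmed with `h5py.is_hdf5()`, which checks the on-disk signature without raising. A minimal sketch (file names are illustrative):

```python
import h5py
import numpy as np

# Simulate a truncated or corrupted download
with open('broken.h5', 'wb') as out:
    out.write(b'definitely not HDF5')

# Create a real HDF5 file for comparison
with h5py.File('valid.h5', 'w') as f:
    f.create_dataset('x', data=np.arange(3))

# is_hdf5 inspects the file signature and returns a bool
bad_ok = h5py.is_hdf5('broken.h5')   # False: signature missing
good_ok = h5py.is_hdf5('valid.h5')   # True
```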
error KeyError: "Unable to open object (object 'data' doesn't exist)"
cause This error occurs when you try to access a dataset or group within an HDF5 file that does not exist at the specified path.
fix
Check the exact path and name of the dataset or group. Use file.keys() or list(file.keys()) to see top-level objects, and group.keys() to inspect contents of groups, to ensure the object name is correct.
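The inspection can be sketched like this (file and object names are illustrative); `visititems()` walks the entire hierarchy, and membership tests guard against the `KeyError` entirely:

```python
import h5py
import numpy as np

# Build a small file so the inspection below has something to find
with h5py.File('inspect.h5', 'w') as f:
    grp = f.create_group('readings')
    grp.create_dataset('data', data=np.arange(4))

with h5py.File('inspect.h5', 'r') as f:
    top = list(f.keys())                  # top-level objects
    inner = list(f['readings'].keys())    # contents of the group
    # visititems visits every group and dataset with its full path
    f.visititems(lambda name, obj: print(name, type(obj).__name__))
    # Test membership before indexing to avoid the KeyError
    if 'readings/data' in f:
        arr = f['readings/data'][()]
```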
error ValueError: Object dtype dtype('O') has no native HDF5 equivalent
cause h5py cannot directly store generic Python objects or NumPy arrays with `dtype=object` (which can hold mixed types like lists, dictionaries, or arbitrary Python objects) in an HDF5 dataset, as HDF5 is fundamentally designed for homogeneous numerical data.
fix
Convert your data to a supported NumPy numerical dtype (e.g., np.float32, np.int64) or a fixed-length or variable-length string dtype using h5py.string_dtype() before writing. If storing complex Python objects, you may need to serialize them (e.g., using pickle) and store them as byte strings.
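Both workarounds can be sketched as follows (file and dataset names are illustrative): variable-length strings go through `h5py.string_dtype()`, and arbitrary objects are pickled to bytes and stored as opaque data via `np.void`:

```python
import pickle
import h5py
import numpy as np

ragged = [{'a': 1}, [2, 3]]                  # object-dtype data HDF5 rejects

with h5py.File('objects.h5', 'w') as f:
    # Variable-length strings: use h5py's string dtype instead of dtype=object
    f.create_dataset('labels',
                     data=np.array(['cat', 'dog'], dtype=object),
                     dtype=h5py.string_dtype())
    # Arbitrary Python objects: pickle to bytes, store as opaque data
    f.create_dataset('blob', data=np.void(pickle.dumps(ragged)))

with h5py.File('objects.h5', 'r') as f:
    labels = f['labels'].asstr()[()]                  # decode back to str
    restored = pickle.loads(f['blob'][()].tobytes())  # back to Python objects
```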
error AttributeError: 'Group' object has no attribute 'dtype'
cause You are attempting to access a dataset-specific attribute (like `dtype`, `shape`, or `value`/`[:]`) on an `h5py.Group` object, which is a container for other HDF5 objects (datasets or other groups), not a dataset itself.
fix
Ensure you are accessing an actual h5py.Dataset object. Iterate through the group's contents and use isinstance() to distinguish between h5py.Group and h5py.Dataset objects before attempting to read data or access dataset-specific attributes. To get the data from a dataset, use dataset[...] or dataset[()].
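A minimal sketch of the `isinstance()` check (file and object names are illustrative):

```python
import h5py
import numpy as np

with h5py.File('mixed.h5', 'w') as f:
    f.create_group('params')
    f.create_dataset('signal', data=np.ones(3))

with h5py.File('mixed.h5', 'r') as f:
    summary = {}
    for name, obj in f.items():
        if isinstance(obj, h5py.Dataset):
            summary[name] = obj.shape    # only datasets have shape/dtype
        elif isinstance(obj, h5py.Group):
            summary[name] = 'group'      # containers: recurse, don't read
```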
breaking The default mode for opening HDF5 files changed from read/write to read-only ('r') in h5py 3.0. Attempting to write without explicitly setting a write-enabled mode (e.g., 'w', 'a', 'r+') will result in an error.
fix Always explicitly specify the file mode (e.g., `h5py.File('file.h5', 'w')` for write, `h5py.File('file.h5', 'a')` for append, `h5py.File('file.h5', 'r+')` for read/write).
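The modes can be sketched as follows (the path is illustrative); note that a write attempt under `'r'` raises:

```python
import h5py
import numpy as np

path = 'modes.h5'
with h5py.File(path, 'w') as f:          # 'w': create, truncating any old file
    f.create_dataset('x', data=np.arange(3))
with h5py.File(path, 'a') as f:          # 'a': read/write, create if missing
    f.create_dataset('y', data=np.zeros(2))
with h5py.File(path, 'r') as f:          # 'r': read-only, the 3.0+ default
    keys = sorted(f.keys())
    write_failed = False
    try:
        f.create_dataset('z', data=1)    # any write in 'r' mode raises
    except Exception:
        write_failed = True
```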
breaking h5py 3.0 and newer versions dropped support for Python 2.7. Python 3.6 or above is now required. For h5py 3.12, Python 3.9 or newer is required. For h5py 3.15, Python 3.10 or newer is required.
fix Upgrade your Python environment to 3.10 or newer to use current h5py versions.
deprecated The `Dataset.value` property, which would dump the entire dataset into a NumPy array, was deprecated in h5py 2.0 and later removed in h5py 3.0. Using it will lead to errors in recent versions.
fix Use NumPy-style slicing to read the entire dataset: `mydataset[()]` or `mydataset[...]`.
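The replacement spellings can be sketched as follows (file and dataset names are illustrative); `[()]` also handles scalar (0-d) datasets, which ordinary slices cannot:

```python
import h5py
import numpy as np

with h5py.File('values.h5', 'w') as f:
    f.create_dataset('m', data=np.arange(6).reshape(2, 3))
    f.create_dataset('scalar', data=3.5)

with h5py.File('values.h5', 'r') as f:
    whole = f['m'][...]        # full array, replacing the removed .value
    also = f['m'][()]          # equivalent spelling for arrays
    val = f['scalar'][()]      # [()] also reads 0-d (scalar) datasets
    row = f['m'][0]            # ordinary slicing reads only what is requested
```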
gotcha HDF5 files must be explicitly closed to ensure data integrity, especially after writing. Failing to do so can lead to corrupted files or unreleased file handles.
fix Always use the `with h5py.File(...) as f:` context manager, which ensures the file is closed even if errors occur.
gotcha The default `dtype` for `group.create_dataset()` is `numpy.float32` ('f'), which is different from NumPy's default `numpy.float64`. This can cause silent data type changes and potential precision loss if not explicitly specified.
fix Explicitly specify the desired `dtype` when creating datasets, e.g., `group.create_dataset('name', data=my_array, dtype=np.float64)` or `group.create_dataset('name', shape=(...), dtype='f8')` for double precision.
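The difference shows up only when neither `data=` nor `dtype=` is given, as this sketch illustrates (file and dataset names are illustrative):

```python
import h5py
import numpy as np

with h5py.File('dtypes.h5', 'w') as f:
    # No data and no dtype: h5py falls back to 'f' (float32)
    a = f.create_dataset('default', shape=(4,))
    # Explicit 'f8' keeps double precision
    b = f.create_dataset('double', shape=(4,), dtype='f8')
    # With data=, the array's own dtype (float64 here) is preserved
    c = f.create_dataset('from_data', data=np.zeros(4))
    dtypes = (a.dtype, b.dtype, c.dtype)
```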
gotcha Using h5py with multiple threads (Python's `threading` module) will not provide parallel performance for HDF5 operations. The underlying `libhdf5` C library is generally not thread-safe, and h5py uses a global Python lock to serialize access to the HDF5 C API, preventing simultaneous calls.
fix For parallel I/O, use multiprocessing with each process opening and closing the file itself, or compile h5py and HDF5 with MPI support and drive true Parallel HDF5 writes through `mpi4py`. Parallel *read* access from separate processes is generally safe.
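A minimal sketch of the per-process pattern (file and dataset names are illustrative; the `'fork'` start method is POSIX-only):

```python
import h5py
import numpy as np
from multiprocessing import get_context

PATH = 'parallel.h5'

def row_sum(i):
    # Each worker opens (and closes) its own handle; independent
    # processes reading the same file is the safe pattern.
    with h5py.File(PATH, 'r') as f:
        return float(f['table'][i].sum())

# Write once, and close the file, before any workers start
with h5py.File(PATH, 'w') as f:
    f.create_dataset('table', data=np.arange(12.0).reshape(4, 3))

# Two workers read rows in parallel, each with its own file handle
with get_context('fork').Pool(2) as pool:
    sums = pool.map(row_sum, range(4))
```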
python  os / libc      status       install  import  disk
3.9     alpine (musl)  build_error  -        -       -
3.9     alpine (musl)  -            -        -       -
3.9     slim (glibc)   wheel        4.6s     0.30s   110M
3.9     slim (glibc)   -            -        0.22s   110M
3.10    alpine (musl)  wheel        -        0.33s   105.2M
3.10    alpine (musl)  -            -        0.26s   105.2M
3.10    slim (glibc)   wheel        3.9s     0.24s   102M
3.10    slim (glibc)   -            -        0.21s   102M
3.11    alpine (musl)  wheel        -        0.45s   113.0M
3.11    alpine (musl)  -            -        0.42s   113.0M
3.11    slim (glibc)   wheel        3.7s     0.42s   109M
3.11    slim (glibc)   -            -        0.36s   109M
3.12    alpine (musl)  wheel        -        0.40s   102.8M
3.12    alpine (musl)  -            -        0.34s   102.8M
3.12    slim (glibc)   wheel        3.6s     0.40s   98M
3.12    slim (glibc)   -            -        0.35s   98M
3.13    alpine (musl)  wheel        -        0.32s   102.3M
3.13    alpine (musl)  -            -        0.32s   102.2M
3.13    slim (glibc)   wheel        3.7s     0.37s   98M
3.13    slim (glibc)   -            -        0.35s   98M

This quickstart demonstrates how to create an HDF5 file, add groups and datasets, store NumPy arrays, attach metadata as attributes, and then read the data and attributes back. It emphasizes using context managers (`with h5py.File(...)`) for proper file handling.

import h5py
import numpy as np
import os

file_path = 'my_data.h5'

# Create a new HDF5 file (mode 'w' will overwrite if exists)
with h5py.File(file_path, 'w') as f:
    # Create a group (like a directory)
    group = f.create_group('my_group')
    
    # Create a dataset within the group (like a NumPy array)
    data = np.arange(100).reshape(10, 10)
    dset = group.create_dataset('dataset_1', data=data)
    
    # Add attributes to the dataset (metadata)
    dset.attrs['units'] = 'arbitrary'
    dset.attrs['description'] = 'Sample 2D integer array'
    
    # You can also create datasets directly at the root level
    f.create_dataset('another_dataset', data=np.random.rand(5))

print(f"File '{file_path}' created successfully.")

# Read data from the HDF5 file
with h5py.File(file_path, 'r') as f:
    # List all top-level objects
    print(f"\nKeys in file: {list(f.keys())}")
    
    # Access a group
    group_read = f['my_group']
    print(f"Keys in 'my_group': {list(group_read.keys())}")
    
    # Access a dataset
    dset_read = group_read['dataset_1']
    
    # Read data into memory (using array-style slicing for the whole dataset)
    read_data = dset_read[()]
    print(f"\nShape of read_data: {read_data.shape}")
    print(f"First 5 elements of read_data: {read_data.flatten()[:5]}")
    
    # Access attributes
    print(f"Units attribute: {dset_read.attrs['units']}")
    
    # Read a slice of the data
    slice_data = dset_read[0:5, 0:5]
    print(f"Slice (0:5, 0:5) of dataset_1:\n{slice_data}")

# Clean up the created file
os.remove(file_path)