hdf5plugin
hdf5plugin is a Python library that provides additional HDF5 compression filters for use with h5py, enabling reading and writing of datasets compressed with algorithms such as Blosc, Blosc2, Bitshuffle, LZ4, and Zstd. It is actively maintained with frequent releases (currently version 6.0.0), regularly updating its embedded compression libraries and introducing new filters.
Warnings
- breaking Version 6.0.0 requires Python >= 3.9. Prior versions supported older Python versions (e.g., v5.0.0 required >=3.8, v4.0.0 required >=3.7).
- breaking Version 5.0.0 requires h5py >= 3.0.0. This was a significant bump from previous versions.
- breaking Deprecated constants `hdf5plugin.config`, `hdf5plugin.date`, `hdf5plugin.hexversion`, and `hdf5plugin.strictversion` were removed in version 5.0.0.
- deprecated The SZ filter was deprecated in version 6.0.0. It may still function, but continued support is not guaranteed.
- gotcha Data compressed with newer versions of the H5Z-ZFP filter (e.g., v1.1.0 in hdf5plugin v4.0.0) might not be readable by older versions of the filter, though newer versions can read older data.
- gotcha Some advanced Blosc2 compression codecs (e.g., blosc2-grok, blosc2-openhtj2k) might require additional, separately installed plugins to be available in a directory listed in the HDF5_PLUGIN_PATH environment variable for decompression to work correctly. hdf5plugin itself may not bundle all possible Blosc2 sub-filters.
- gotcha Poorly chosen HDF5 chunking strategies (e.g., very large/small chunks, or chunk shapes misaligned with common access patterns) can significantly degrade performance, even with efficient compression filters. This is a general HDF5/h5py concern but applies directly to hdf5plugin usage.
Install
- pip install hdf5plugin
- pip install hdf5plugin --no-binary hdf5plugin
Imports
- hdf5plugin
import hdf5plugin
- h5py
import h5py
Quickstart
import numpy
import h5py
import hdf5plugin
# Create a dummy dataset
data_to_write = numpy.arange(100, dtype='i4')
# Write compressed data to an HDF5 file using an hdf5plugin filter (e.g., LZ4)
file_name = 'test_compressed.h5'
with h5py.File(file_name, 'w') as f:
    dset = f.create_dataset('data', data=data_to_write, compression=hdf5plugin.LZ4())
    print(f"Dataset 'data' written to {file_name} with LZ4 compression.")
# Read the compressed data back
with h5py.File(file_name, 'r') as f:
    read_data = f['data'][()]
    print(f"Data read successfully: {read_data}")

assert numpy.array_equal(data_to_write, read_data)
print("Original and read data match.")