h5netcdf: NetCDF4 via h5py
h5netcdf is an open-source Python package that provides an interface to the netCDF4 file format, reading and writing local or remote HDF5 files directly via h5py or h5pyd. It aims to offer netCDF4 capabilities without relying on the Unidata netCDF C library. The current version is 1.8.1, and patch and minor releases appear every few months.
Warnings
- breaking With h5py 3.0+, variable-length strings are returned as arrays of bytes by default instead of being decoded to UTF-8 strings automatically. To restore automatic decoding, matching pre-3.0 h5py behavior and netCDF4-python, explicitly set `decode_vlen_strings=True` in the `h5netcdf.File` constructor.
- breaking Since h5netcdf 1.1.0, `track_order` defaults to `True` for *newly created* netCDF4 files when h5py >= 3.7.0 is detected, which keeps them compatible with netCDF4-c. Order tracking is fixed at creation time, so files written with `track_order=False` (h5netcdf 1.0.2 and older, except 0.13.0) still open with order tracking disabled in newer h5netcdf versions. This can cause interoperability issues when external netCDF4-c tools expect ordered dimensions and variables.
- gotcha By default, `h5netcdf` raises a `CompatibilityError` if you attempt to write HDF5 features (like certain data types or arbitrary filters) that are not considered valid NetCDF4 by other tools. While these are valid HDF5, they break NetCDF compatibility. In versions prior to 0.7.3, this was merely a warning.
- gotcha If you access variables in an HDF5 file that have no dimension scale associated with one of their axes, `h5netcdf` will raise a `ValueError`. This often occurs with non-NetCDF HDF5 files.
- gotcha When using the new API, automatic resizing of unlimited dimensions with array indexing (e.g., `variable[i, :] = data`) is *not* available. This differs from the `netCDF4-python` library's behavior.
- gotcha Repeated access to properties that rely on the underlying `_h5ds` HDF5 dataset object can be costly in terms of performance, as `_h5ds` is created on demand. This can impact workflows that frequently query properties like `variable.shape` in a loop.
- gotcha If you initialize `h5netcdf.File` by passing an existing `h5py.File` object (e.g., `h5netcdf.File(h5py_file_obj)`), closing the `h5netcdf.File` wrapper will *not* close the underlying `h5py.File` object. However, if the file is opened by path (e.g., `h5netcdf.File('mydata.nc')`), closing the `h5netcdf.File` *will* close the underlying HDF5 file.
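Three of the behaviors above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the h5py backend at version 3.0 or later, the file and variable names are made up, and `invalid_netcdf=True` is the opt-out keyword as described in the h5netcdf README.

```python
import os
import tempfile

import h5py
import numpy as np
import h5netcdf

workdir = tempfile.mkdtemp()  # scratch directory for the illustrative files

# 1. Variable-length strings: bytes by default, str on request
str_path = os.path.join(workdir, "strings.nc")
with h5netcdf.File(str_path, "w") as f:
    f.dimensions = {"n": 2}
    names = f.create_variable("names", ("n",), dtype=h5py.string_dtype())
    names[:] = np.array(["alpha", "beta"], dtype=object)

with h5netcdf.File(str_path, "r") as f:
    raw = f.variables["names"][:]  # bytes objects with h5py >= 3.0
with h5netcdf.File(str_path, "r", decode_vlen_strings=True) as f:
    decoded = f.variables["names"][:]  # str objects

# 2. HDF5 features that are not valid NetCDF4 (here: a scale-offset filter)
packed_path = os.path.join(workdir, "packed.h5")
data = np.arange(3, dtype="i4")
try:
    with h5netcdf.File(packed_path, "w") as f:
        f.create_variable("packed", ("n",), data=data, scaleoffset=0)
    raised = False
except h5netcdf.CompatibilityError:
    raised = True

# Opting out of the check; the .nc extension is avoided for such files
with h5netcdf.File(packed_path, "w", invalid_netcdf=True) as f:
    f.create_variable("packed", ("n",), data=data, scaleoffset=0)

# 3. Wrapping an existing h5py.File: closing the wrapper leaves it open
h5f = h5py.File(str_path, "r")
with h5netcdf.File(h5f, "r"):
    pass
still_open = bool(h5f)  # the h5py handle is still valid here
h5f.close()
```

If the assumptions hold, `raw` contains `bytes`, `decoded` contains `str`, `raised` is `True`, and `still_open` is `True` until `h5f.close()` is called explicitly.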
Install
- pip install h5netcdf
- pip install h5netcdf[h5py]
- pip install h5netcdf[pyfive]
- pip install h5netcdf[h5pyd]
Imports
- File
from h5netcdf import File
- Dataset
from h5netcdf.legacyapi import Dataset
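The two entry points differ mainly in naming style: `Dataset` follows netCDF4-python method names, while `File` follows h5py conventions. The sketch below writes a file with the legacy API and reads it back with the new API. The file name, variable name, and attribute are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
from h5netcdf import File
from h5netcdf.legacyapi import Dataset

path = os.path.join(tempfile.mkdtemp(), "pressure.nc")  # illustrative file

# Legacy API: netCDF4-python style method names
with Dataset(path, "w") as ds:
    ds.createDimension("x", 4)
    pressure = ds.createVariable("pressure", "f8", ("x",))
    pressure[:] = np.arange(4.0)
    pressure.setncattr("units", "hPa")

# New API: h5py style access to the same file
with File(path, "r") as f:
    values = f.variables["pressure"][:]
    units = f.variables["pressure"].attrs["units"]

print(values, units)
```

Both APIs operate on the same on-disk format, so a file written through one can be read through the other.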
Quickstart
import h5netcdf
import numpy as np
import os

file_path = 'my_test_data.nc'

# Write data using the new API
with h5netcdf.File(file_path, 'w') as f:
    f.dimensions = {'x': 5, 'y': 3}
    var = f.create_variable('temperature', ('x', 'y'), 'f4')
    var[:] = np.random.rand(5, 3)
    # The new API stores attributes in the .attrs mapping
    var.attrs['units'] = 'Kelvin'
    f.create_group('forecast_data')
print(f"Successfully wrote data to {file_path}")

# Read data
with h5netcdf.File(file_path, 'r') as f:
    print(f"Dimensions: {list(f.dimensions.keys())}")
    temp = f.variables['temperature']
    print(f"Variable 'temperature' shape: {temp.shape}")
    print(f"Variable 'temperature' units: {temp.attrs['units']}")
    print(f"First few values: {temp[:2, :2]}")
    if 'forecast_data' in f.groups:
        print("Group 'forecast_data' exists.")

# Clean up
os.remove(file_path)
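Because the new API does not resize unlimited dimensions on assignment, appending data requires an explicit resize first. The following is a minimal sketch of that pattern using `resize_dimension`. The file name, dimension names, and values are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
import h5netcdf

path = os.path.join(tempfile.mkdtemp(), "growing.nc")  # illustrative file

with h5netcdf.File(path, "w") as f:
    # A size of None marks 'time' as an unlimited dimension
    f.dimensions = {"time": None, "y": 3}
    var = f.create_variable("temperature", ("time", "y"), "f4")

    # Grow the dimension explicitly before each write
    f.resize_dimension("time", 2)
    var[0:2, :] = np.ones((2, 3), dtype="f4")

    f.resize_dimension("time", 4)
    var[2:4, :] = np.full((2, 3), 2.0, dtype="f4")

with h5netcdf.File(path, "r") as f:
    final_shape = f.variables["temperature"].shape
    data = f.variables["temperature"][:]

print(final_shape)
```

With the legacy API (`h5netcdf.legacyapi.Dataset`), the explicit `resize_dimension` calls should not be needed, since that API resizes on write in the style of netCDF4-python.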