HDMF: Hierarchical Data Modeling Framework
HDMF (Hierarchical Data Modeling Framework) is a Python package for standardizing, reading, and writing hierarchical object data. It provides APIs for defining data models, interacting with various storage backends, and representing data as Python objects. Developed as a core component of the Neurodata Without Borders (NWB) project, HDMF offers a flexible, extensible approach to data modeling for scientific communities. The library is actively maintained and released frequently; the current version is 5.1.0.
Common errors
- ModuleNotFoundError: No module named 'hdmf.build.map'
  cause: The `hdmf.build.map` module was removed in HDMF 4.0.0.
  fix: Import builder-related components directly from `hdmf.build`. For example, replace `from hdmf.build.map import TypeMap` with `from hdmf.build import TypeMap` (or whichever class you need).
- AttributeError: 'Container' object has no attribute 'add_child'
  cause: The `add_child` method was removed in HDMF 4.0.0 as part of an API refactoring.
  fix: Consult the HDMF 4.0.0 changelog and documentation for the supported way to manage parent-child relationships, or restructure your code so that children are attached to the parent container on construction.
- TypeError: TypeSource() got an unexpected keyword argument 'source_path'
  cause: `TypeSource` was converted to a frozen dataclass in HDMF 5.0.0, and its constructor arguments may have changed or been refactored.
  fix: Consult the HDMF 5.0.0 changelog and documentation for the new `TypeSource` constructor and the refactored `TypeMap.load_namespaces` usage.
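Frozen dataclasses reject both unknown constructor arguments and attribute assignment, which is the mechanism behind this error. A stdlib-only sketch of that behavior (the class and field names here are illustrative stand-ins, not HDMF's actual `TypeSource` API):

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical stand-in for a frozen dataclass like TypeSource.
@dataclass(frozen=True)
class Source:
    namespace: str
    data_type: str

src = Source(namespace="my_ns", data_type="MyType")

# Unknown keyword arguments raise TypeError, as in the error above.
try:
    Source(namespace="my_ns", data_type="MyType", source_path="x.yaml")
except TypeError as e:
    print("TypeError:", e)

# Frozen instances also reject attribute assignment after construction.
try:
    src.namespace = "other"
except FrozenInstanceError:
    print("instances are immutable")
```

So if code that used to mutate or construct `TypeSource` with extra arguments now fails, the fix is to pass only the supported fields at construction time.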
Warnings
- breaking HDMF 5.0.0 introduced significant changes to the spec resolution system and `TypeMap` functionality. `TypeMap.load_namespaces` was refactored, `TypeMap.container_types` property was removed, and `TypeSource` became a frozen dataclass.
- breaking HDMF 4.0.0 removed several deprecated classes and methods, including `Array`, `AbstractSortedArray`, `SortedArray`, `LinSpace`, `Query`, `RegionSlicer`, `DataRegion`, `fmt_docval_args`, `call_docval_func`, `get_container_cls`, `add_child`, and `set_dataio` (refactored to `set_data_io`). The `hdmf.build.map` module was also removed; imports should now be directly from `hdmf.build`.
- gotcha HDMF versions 4.3.1 and later (until explicitly stated otherwise in future releases) restrict `pandas` to versions less than 3 (`<3`). Using `pandas` 3.x with these HDMF versions may lead to compatibility issues, particularly with string data types and data ingestion.
- gotcha For HDMF 4.0.0 and later, `numcodecs` is restricted to versions less than 0.16 (`<0.16`) due to incompatibilities with `zarr<3`. If you are working with Zarr storage backends, ensure these version constraints are met.
Install
-
pip install hdmf
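If your environment also uses pandas or the Zarr backend, the version pins noted in the Warnings section above can be applied at install time (a hedged example; adjust the bounds to your HDMF version):

```shell
# Pin pandas and numcodecs to the ranges HDMF currently supports
pip install "hdmf" "pandas<3" "numcodecs<0.16"
```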
Imports
- GroupSpec
from hdmf.spec import GroupSpec
- DatasetSpec
from hdmf.spec import DatasetSpec
- NamespaceBuilder
from hdmf.spec import NamespaceBuilder
- DynamicTable
from hdmf.common import DynamicTable
- HDF5IO
from hdmf.backends.hdf5 import HDF5IO
- *
from hdmf.build import TypeMap, BuildManager
Quickstart
import os
from hdmf.spec import GroupSpec, DatasetSpec, NamespaceBuilder
from hdmf.common import DynamicTable, get_manager
from hdmf.backends.hdf5 import HDF5IO

# 1. Define a custom data type specification
my_dataset_spec = DatasetSpec(name='my_data', doc='An example dataset', dtype='float32')
my_group_spec = GroupSpec(name='MyTypeContainer', doc='A custom data type container', datasets=[my_dataset_spec])

# 2. Create a namespace for your specification
namespace_builder = NamespaceBuilder(
    doc='My Custom HDMF Extension',
    name='my_extension',
    full_name='My Custom Extension',
    version='0.1.0',
)
# In a real scenario, you would save this to a YAML file and load it.
# For the quickstart, we demonstrate the built-in common types instead.

# 3. Work with common HDMF data types, e.g., DynamicTable
table = DynamicTable(name='example_table', description='An example table of items')
table.add_column(name='item_name', description='Name of the item')
table.add_column(name='quantity', description='Quantity of the item')
table.add_row(item_name='Apple', quantity=10)
table.add_row(item_name='Banana', quantity=5)

# 4. Save to an HDF5 file; get_manager() supplies the type map for the common types
file_name = 'my_hdmf_data.h5'
with HDF5IO(file_name, mode='w', manager=get_manager()) as io:
    io.write(table)
print(f"DynamicTable saved to {file_name}")

# 5. Read from the HDF5 file
with HDF5IO(file_name, mode='r', manager=get_manager()) as io:
    read_table = io.read()
    print("\nRead DynamicTable:")
    print(read_table)

# Clean up
os.remove(file_name)