Zarr: Chunked, Compressed N-dimensional Arrays
Zarr is a Python package that provides an implementation of chunked, compressed, N-dimensional arrays. Each chunk is stored as a separate object, so chunks can be read and written independently, which makes the format well suited to parallel computing. It supports various storage backends, including local disk, cloud object stores (like S3), and in-memory stores. The library is actively maintained, with its current version being 3.1.6. It recently underwent a significant refactor with the release of version 3, which introduced support for the Zarr v3 specification and improved performance.
Warnings
- breaking Zarr-Python 3.0 introduced significant breaking changes compared to 2.x, particularly a major refactor of the API, storage layer, and codec handling. Codecs (e.g., `Blosc`) can no longer be imported from the top-level `zarr` namespace; import them from `zarr.codecs` (e.g., `zarr.codecs.BloscCodec`) or from `numcodecs` (e.g., `numcodecs.Blosc`) instead. Direct construction of `zarr.Array` is discouraged in favor of `zarr.create_array` or `zarr.open_array`.
- breaking Newly created arrays in Zarr-Python 3.0 and later default to Zarr format 3: arrays created without an explicit format use the V3 specification. This can cause compatibility issues with older Zarr consumers that only support Zarr format 2; pass `zarr_format=2` when such readers must be supported.
- gotcha Consolidated metadata is a feature in Zarr-Python that aggregates all metadata into a single file for faster access. However, for Zarr format 3, consolidated metadata is currently not part of the official specification. Its use may lead to compatibility issues with other Zarr implementations and its behavior might change in future Zarr-Python versions.
- gotcha Zarr-Python 3 is built on an asynchronous I/O core, which lets many chunk reads and writes proceed concurrently; this matters most with high-latency cloud stores. The synchronous interface wraps this core, so users with performance-critical cloud workloads may benefit from calling the async APIs (`zarr.api.asynchronous`) directly.
Install
pip install zarr
Imports
- zarr
import zarr
- zarr.create_array
import zarr
z = zarr.create_array(...)
- zarr.codecs.BloscCodec
import zarr.codecs
compressors = zarr.codecs.BloscCodec(...)
Quickstart
import os
import shutil

import numpy as np
import zarr

# Choose a path for the Zarr store and make sure its parent directory exists
store_path = 'data/example_zarr_array.zarr'
os.makedirs(os.path.dirname(store_path), exist_ok=True)

# Create a 2D float32 array split into 10x10 chunks.
# Without zarr_format=..., this defaults to Zarr format 3.
z_array = zarr.create_array(
    store=store_path,
    shape=(100, 100),
    chunks=(10, 10),
    dtype='f4',
)

# Assign data; the float64 random values are cast to the array's float32 dtype
z_array[:, :] = np.random.random((100, 100))
print(f"Created Zarr array at: {store_path}")
print(f"Array info:\n{z_array.info}")

# Read back a 5x5 subset; only the chunk(s) it overlaps are loaded
subset = z_array[0:5, 0:5]
print(f"Subset of array:\n{subset}")

# Clean up the directory created above
shutil.rmtree(os.path.dirname(store_path))