Zarr: Chunked, Compressed N-dimensional Arrays

3.1.6 verified Tue May 12 auth: no python install: draft

Zarr is a Python package that implements chunked, compressed, N-dimensional arrays. It is designed for parallel computing and supports several storage backends, including local disk, cloud object stores such as S3, and in-memory stores. The library is actively maintained; the current version is 3.1.6. Version 3 was a significant refactor that added support for the Zarr v3 specification and improved performance.

pip install zarr
breaking Zarr-Python 3.0 introduced significant breaking changes compared to 2.x, particularly a major refactor of the API, storage layer, and codec handling. Direct imports of codecs (e.g., `Blosc`) from `zarr.*` are no longer supported; they must be imported directly from `zarr.codecs` or `numcodecs`. Direct construction of `zarr.Array` is discouraged in favor of `zarr.create_array` or `zarr.open_array`.
fix Review the Zarr-Python 3.0 Migration Guide. Update codec imports (e.g., `from zarr.codecs import BloscCodec` or `from numcodecs import Blosc`; note that the class names differ between the two packages). Use `zarr.create_array` or `zarr.open_array` for array creation.
breaking Arrays created with Zarr-Python 3.0 and later default to Zarr format 3 unless a format is specified explicitly. This can cause compatibility issues with older Zarr consumers that only support Zarr format 2.
fix If compatibility with Zarr format 2 is required, explicitly set `zarr_format=2` when creating new arrays, e.g., `zarr.create_array(..., zarr_format=2)`. Also, consider using `zarr.open_array` which can infer the format of existing stores.
gotcha Consolidated metadata is a feature in Zarr-Python that aggregates all metadata into a single file for faster access. However, for Zarr format 3, consolidated metadata is currently not part of the official specification. Its use may lead to compatibility issues with other Zarr implementations and its behavior might change in future Zarr-Python versions.
fix Be aware that consolidated metadata for Zarr v3 is a Zarr-Python-specific extension. If cross-implementation compatibility or strict adherence to the Zarr v3 spec is critical, consider avoiding consolidated metadata or managing metadata through other means. For Zarr format 2 data, consolidated metadata is a well-established convention.
gotcha Zarr-Python 3 introduces an asynchronous I/O architecture which significantly improves performance, especially with high-latency cloud stores. While a synchronous interface is provided, users with performance-critical cloud workloads may benefit from understanding and leveraging the async APIs.
fix For optimal performance, especially in cloud environments, review the Zarr-Python documentation on asynchronous APIs and concurrency limits. Consider using `async` operations where applicable in your application design.
breaking Installation of Zarr or its core dependency `numcodecs` may fail in minimal environments (like Alpine Linux) due to missing C/C++ build tools. `numcodecs` includes C extensions that require compilation during installation, and without a C compiler (e.g., `gcc`) and Python development headers, the build process will fail.
fix Ensure that C/C++ build tools (e.g., `gcc`, `g++`), C library headers (e.g., `musl-dev` on Alpine), and Python development headers (e.g., `python3-dev`) are installed in your environment before attempting to install Zarr or its dependencies.
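A quick preflight check can reveal whether these prerequisites are present before running `pip install`. The helper below is a hypothetical sketch using only the standard library; the compiler names it searches for are assumptions and may not cover every toolchain.

```python
import os
import shutil
import sysconfig


def build_prerequisites() -> dict:
    """Report whether common build prerequisites appear to be available."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Any C compiler found on PATH
        "c_compiler": any(shutil.which(cc) for cc in ("cc", "gcc", "clang")),
        # Any C++ compiler found on PATH
        "cxx_compiler": any(shutil.which(cxx) for cxx in ("c++", "g++", "clang++")),
        # Python.h ships with the Python development headers
        "python_headers": os.path.exists(os.path.join(include_dir, "Python.h")),
    }


report = build_prerequisites()
for name, present in report.items():
    print(f"{name}: {'found' if present else 'missing'}")
```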
python | os / libc     | status      | wheel | install | import | disk | mem   | side effects
3.10   | alpine (musl) | build_error | -     | -       | -      | -    | -     | -
3.10   | alpine (musl) |             | -     | -       | -      | -    | -     | -
3.10   | slim (glibc)  |             | sdist | 4.5s    | 0.34s  | 123M | 12.0M | clean
3.10   | slim (glibc)  |             | -     | -       | 0.31s  | 123M | 12.0M | -
3.11   | alpine (musl) | build_error | -     | -       | -      | -    | -     | -
3.11   | alpine (musl) |             | -     | -       | -      | -    | -     | -
3.11   | slim (glibc)  |             | wheel | 4.3s    | 1.14s  | 135M | 19.1M | clean
3.11   | slim (glibc)  |             | -     | -       | 1.23s  | 135M | 19.1M | -
3.12   | alpine (musl) | build_error | -     | -       | -      | -    | -     | -
3.12   | alpine (musl) |             | -     | -       | -      | -    | -     | -
3.12   | slim (glibc)  |             | wheel | 4.2s    | 1.56s  | 124M | 19.3M | clean
3.12   | slim (glibc)  |             | -     | -       | 1.64s  | 123M | 19.0M | -
3.13   | alpine (musl) | build_error | -     | -       | -      | -    | -     | -
3.13   | alpine (musl) |             | -     | -       | -      | -    | -     | -
3.13   | slim (glibc)  |             | wheel | 4.2s    | 1.25s  | 123M | 19.5M | clean
3.13   | slim (glibc)  |             | -     | -       | 1.47s  | 123M | 18.8M | -
3.9    | alpine (musl) | build_error | -     | -       | -      | -    | -     | -
3.9    | alpine (musl) |             | -     | -       | -      | -    | -     | -
3.9    | slim (glibc)  |             | sdist | 5.5s    | 0.39s  | 128M | 11.1M | clean
3.9    | slim (glibc)  |             | -     | -       | 0.43s  | 128M | 11.1M | -

This quickstart demonstrates how to create a Zarr array, assign data to it using NumPy, and retrieve a subset. The array is stored on the local filesystem. This example defaults to Zarr format 3, which is the standard for Zarr-Python 3.x and newer.

import os
import shutil

import numpy as np
import zarr

# Create the parent directory for the Zarr store
store_path = 'data/example_zarr_array.zarr'
os.makedirs(os.path.dirname(store_path), exist_ok=True)

# Create a 2D Zarr array
# This will default to Zarr format 3
z_array = zarr.create_array(
    store=store_path,
    shape=(100, 100),
    chunks=(10, 10),
    dtype='f4'
)

# Assign data to the array (float64 samples are cast to float32 on write)
z_array[:, :] = np.random.random((100, 100))

print(f"Created Zarr array at: {store_path}")
print(f"Array info:\n{z_array.info}")

# Read back a 5x5 subset as a NumPy array
subset = z_array[0:5, 0:5]
print(f"Subset of array:\n{subset}")

# Clean up the created directory
shutil.rmtree('data')