Zarr: Chunked, Compressed N-dimensional Arrays

3.1.6 · active · verified Wed Apr 01

Zarr is a Python package that implements chunked, compressed, N-dimensional arrays. It is designed for parallel computing and supports several storage backends, including local disk, cloud object stores such as S3, and in-memory stores. The library is actively maintained; the current version is 3.1.6. Version 3 was a significant refactor that added support for the Zarr v3 specification and improved performance.

Warnings

Install

Imports

Quickstart

This quickstart creates a Zarr array on the local filesystem, fills it with NumPy data, and reads back a subset. The array is written in Zarr format 3, which is the default in Zarr-Python 3.x and newer.

import zarr
import numpy as np
import os

# Create a directory for the Zarr store
store_path = 'data/example_zarr_array.zarr'
os.makedirs(os.path.dirname(store_path), exist_ok=True)

# Create a 2D Zarr array
# This will default to Zarr format 3
z_array = zarr.create_array(
    store=store_path,
    shape=(100, 100),
    chunks=(10, 10),
    dtype='f4'
)

# Assign data to the array
z_array[:, :] = np.random.random((100, 100))

print(f"Created Zarr array at: {store_path}")
print(f"Array info:\n{z_array.info}")

# Access data
subset = z_array[0:5, 0:5]
print(f"Subset of array:\n{subset}")

# Clean up: remove only the store created above, so any other
# contents of an existing 'data' directory are left untouched
import shutil
shutil.rmtree(store_path)
