Rechunker

0.5.4 · active · verified Thu Apr 16

Rechunker is a Python package for changing the chunk structure of chunked array formats, such as Zarr and TileDB, efficiently and at scale. It reads an input array (or group of arrays) from persistent storage and writes a new array with the same data but a different chunking scheme, often via an intermediate temporary store. The current version is 0.5.4. The package is actively maintained by the Pangeo community, with regular releases addressing compatibility and bug fixes.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to rechunk a Zarr array from an initial chunking scheme to a new one. It defines source, intermediate, and target Zarr stores, specifies the target chunk size and maximum memory per worker, creates a rechunking plan, and then executes it.

import zarr
from rechunker import rechunk
import os

# Create a source Zarr array
source_store = 'source.zarr'
if not os.path.exists(source_store):
    zarr.ones((10, 10, 10), chunks=(2, 2, 2), store=source_store, overwrite=True)
source = zarr.open(source_store, mode='r')

# Define target and intermediate stores
intermediate_store = 'intermediate.zarr'
target_store = 'target.zarr'

# Define the target chunking scheme (e.g., contiguous in the first dimension)
target_chunks = (10, 5, 5)

# Define maximum memory for each worker (e.g., 256MB)
max_mem = '256MB'

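# Create the rechunking plan.
# The temporary store is passed by keyword: the fifth positional
# parameter of rechunk() is target_options, not the temporary store.
rechunked_plan = rechunk(
    source,
    target_chunks,
    max_mem,
    target_store,
    temp_store=intermediate_store,
)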

# Execute the plan
result = rechunked_plan.execute()

print(f"Source array chunks: {source.chunks}")
print(f"Target array chunks: {result.chunks}")

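# Clean up example files. The intermediate store may never be written
# when the whole array fits within max_mem, so missing paths are ignored.
import shutil
for path in (source_store, intermediate_store, target_store):
    shutil.rmtree(path, ignore_errors=True)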
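
To see what the quickstart's change of chunking scheme amounts to, the chunk counts and per-chunk sizes can be worked out by hand. The sketch below uses only the standard library. `chunk_stats` is a hypothetical helper, not part of rechunker, and the 8-byte item size assumes the source array holds 64-bit floats.

```python
from math import ceil, prod


def chunk_stats(shape, chunks, itemsize=8):
    """Return (number of chunks, bytes per full chunk) for a regular chunk grid.

    itemsize=8 is an assumption (64-bit floats); adjust it for other dtypes.
    """
    # Each axis is split into ceil(size / chunk) pieces.
    n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunks))
    # A full chunk holds prod(chunks) elements.
    chunk_bytes = prod(chunks) * itemsize
    return n_chunks, chunk_bytes


shape = (10, 10, 10)
source_stats = chunk_stats(shape, (2, 2, 2))
target_stats = chunk_stats(shape, (10, 5, 5))

print(source_stats)  # (125, 64)
print(target_stats)  # (4, 2000)
```

Under these assumptions the source layout has 125 small chunks and the target layout has 4 larger ones. Both are far below the 256MB memory limit, which is why a toy array like this one may be copied without using the intermediate store at all.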
