Kerchunk

0.2.10 verified Fri May 01 auth: no python

Kerchunk is a Python library for creating and manipulating reference sets: lightweight indexes that record where each chunk of an existing file lives (URL, byte offset, length) so that the data can be read through Zarr. This allows efficient, cloud-optimized reading of scientific data (e.g., NetCDF/HDF5) from remote storage without downloading entire files. The current version is 0.2.10, which requires Python >=3.11; the API is stable but still evolving.

pip install kerchunk
error KeyError: 'refs'
cause Attempting to open a single-file reference dict directly with fsspec without using the correct mapper.
fix
Use fsspec.get_mapper('reference://', fo='ref.json') or wrap the dict in a ReferenceFileSystem. See quickstart.
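
To see where the 'refs' key comes from, it can help to inspect the parsed JSON directly. The sketch below assumes the two common layouts: version 1 documents that nest the mapping under "refs", and older flat documents where the mapping is the whole file. load_refs is a hypothetical helper, not part of kerchunk or fsspec, and the sample entries are hand-written.

```python
def load_refs(doc):
    """Return the key -> reference mapping from a parsed reference document.

    Hypothetical helper: version 1 documents nest the mapping under "refs",
    while older flat documents are the mapping itself.
    """
    return doc["refs"] if "refs" in doc else doc


# Hand-written sample: one inline metadata key and one [url, offset, length] chunk
v1 = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "temp/0.0": ["s3://example-bucket/file.nc", 4096, 1024],
    },
}
flat = v1["refs"]

# Both layouts yield the same mapping once normalised
print(load_refs(v1) == load_refs(flat))
print(sorted(load_refs(v1)))
```

Code that indexes doc["refs"] unconditionally fails on the flat layout, which is one way to hit this KeyError. Passing the path or the dict as fo= to the reference filesystem leaves that handling to fsspec.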
error AttributeError: module 'kerchunk' has no attribute 'combine'
cause Outdated kerchunk version (<0.2.0) or wrong import path. The combine module exists but is not imported by default from top-level.
fix
Use from kerchunk.combine import MultiZarrToZarr and ensure kerchunk>=0.2.0.
error ValueError: unrecognized chunk manager: none
cause Opening a reference file with xarray without specifying engine='zarr' or using a non-Zarr engine.
fix
Always specify engine='zarr' when opening reference datasets in xarray.
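
An alternative to building a mapper first is to hand the reference location to xarray through storage_options. The sketch below only assembles the keyword arguments; reference_open_kwargs is a hypothetical helper, and the "s3" protocol and anon option are assumptions carried over from the quickstart.

```python
def reference_open_kwargs(ref_path, remote_protocol="s3", remote_options=None):
    """Build keyword arguments for xr.open_dataset("reference://", ...).

    Hypothetical helper; only the resulting dictionary layout matters.
    """
    return {
        "engine": "zarr",
        "backend_kwargs": {
            # Reference sets typically carry no consolidated metadata key
            "consolidated": False,
            "storage_options": {
                "fo": ref_path,  # path to the reference JSON (or a dict)
                "remote_protocol": remote_protocol,
                "remote_options": remote_options or {},
            },
        },
    }


kwargs = reference_open_kwargs("ref.json", remote_options={"anon": True})
print(kwargs["engine"])

# With xarray, zarr, fsspec and s3fs installed, the dataset would be opened as:
#   import xarray as xr
#   ds = xr.open_dataset("reference://", **kwargs)
```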
breaking Kerchunk 0.2.0 dropped Python 3.8 support. Use Python >=3.11 as of 0.2.10.
fix Upgrade Python to 3.11 or later.
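gotcha For multi-file concatenation, MultiZarrToZarr expects reference sets (a list of reference dicts, or paths to reference JSON files), not paths to the original data files. Passing the data files themselves raises cryptic errors.
fix Generate single-file references first (for HDF5/NetCDF4, with kerchunk.hdf.SingleHdf5ToZarr(...).translate()), then pass the resulting references to MultiZarrToZarr.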
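
A guard in front of MultiZarrToZarr can turn the cryptic failure into a clear one. This is a minimal sketch: looks_like_reference and combine_references are hypothetical helpers, the ".json" suffix test is only a heuristic, and the "time" concatenation dimension is an assumption about the data.

```python
def looks_like_reference(item):
    """Heuristic: a reference dict, or a path to a reference JSON file."""
    if isinstance(item, dict):
        return True
    return isinstance(item, str) and item.endswith(".json")


def combine_references(items, concat_dim="time"):
    """Combine single-file reference sets along one dimension."""
    bad = [i for i in items if not looks_like_reference(i)]
    if bad:
        # Fail early with a readable message instead of deep inside kerchunk
        raise ValueError(f"expected reference sets, got data files: {bad!r}")

    # Imported here so the check above works without kerchunk installed
    from kerchunk.combine import MultiZarrToZarr

    mzz = MultiZarrToZarr(list(items), concat_dims=[concat_dim])
    return mzz.translate()  # combined reference set as a dict


try:
    combine_references(["s3://example-bucket/file.nc"])
except ValueError as err:
    print(err)
```

For remote data, MultiZarrToZarr also takes remote_protocol and remote_options so it can read coordinate values from the original files.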
deprecated The 'kerchunk.combine' module's 'concat' function is deprecated in favor of MultiZarrToZarr.
fix Use MultiZarrToZarr with appropriate options.

Creates a single-file reference, saves it as JSON, and opens it with xarray via the Zarr engine.

import json

import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr

# Generate references for a single HDF5/NetCDF4 file
url = 's3://example-bucket/file.nc'  # or local path
fs = fsspec.filesystem('s3', anon=True)
with fs.open(url, 'rb') as f:
    # translate() returns the reference set as a plain dict
    h5chunks = SingleHdf5ToZarr(f, url).translate()

# Save reference as JSON
with open('ref.json', 'w') as f:
    json.dump(h5chunks, f)

# Open with xarray; remote_* tell fsspec how to reach the original file
mapper = fsspec.get_mapper(
    'reference://',
    fo='ref.json',
    remote_protocol='s3',
    remote_options={'anon': True},
)
# Reference sets have no consolidated metadata, so skip that lookup
ds = xr.open_dataset(mapper, engine='zarr', backend_kwargs={'consolidated': False})
print(ds)
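
Once ref.json exists, a quick summary shows how little of the original file the reference set has to describe. The sketch below assumes the usual value shapes: a string for inline content and a [url, offset, length] list for a remote chunk. summarize_refs is a hypothetical helper and the sample mapping is hand-written.

```python
def summarize_refs(refs):
    """Count inline keys and remote chunks, and total the referenced bytes."""
    inline = remote = nbytes = 0
    for value in refs.values():
        if isinstance(value, list):
            remote += 1
            if len(value) == 3:  # [url, offset, length]
                nbytes += value[2]
        else:
            inline += 1  # metadata or small data stored in the JSON itself
    return {"inline": inline, "remote": remote, "bytes": nbytes}


# Hand-written stand-in for the "refs" mapping of a reference file
sample = {
    ".zgroup": '{"zarr_format": 2}',
    "temp/.zarray": '{"shape": [2], "chunks": [1]}',
    "temp/0": ["s3://example-bucket/file.nc", 4096, 1024],
    "temp/1": ["s3://example-bucket/file.nc", 5120, 1024],
}
print(summarize_refs(sample))  # {'inline': 2, 'remote': 2, 'bytes': 2048}
```

For a saved file, load it with json.load and pass doc.get('refs', doc) so that both nested and flat layouts work.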