Kerchunk
0.2.10 · verified Fri May 01 · auth: no · python
Kerchunk is a Python library for creating and manipulating chunked reference descriptions for cloud-optimized data access. It enables efficient reading of scientific data (e.g., NetCDF/HDF5) from remote storage without downloading entire files. The current version is 0.2.10, supporting Python >=3.11, with a stable but evolving API.
pip install kerchunk
Common errors
error KeyError: 'refs'
cause Attempting to open a single-file reference dict directly with fsspec without using the correct mapper.
fix Use fsspec.get_mapper('reference://', fo='ref.json') or wrap the dict in a ReferenceFileSystem. See Quickstart.
error AttributeError: module 'kerchunk' has no attribute 'combine'
cause Outdated kerchunk version (<0.2.0) or wrong import path. The combine module exists but is not imported by default from the top-level package.
fix Use from kerchunk.combine import MultiZarrToZarr and ensure kerchunk>=0.2.0.
error ValueError: unrecognized chunk manager: none
cause Opening a reference file with xarray without specifying engine='zarr', or with a non-Zarr engine.
fix Always specify engine='zarr' when opening reference datasets in xarray.
Warnings
breaking Kerchunk 0.2.0 dropped Python 3.8 support. Use Python >=3.11 as of 0.2.10.
fix Upgrade Python to 3.11 or later.
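gotcha For multi-file concatenation, MultiZarrToZarr expects references (in-memory reference dicts, or paths to reference JSON files), not paths to the original data files. Passing raw NetCDF/HDF5 paths will raise cryptic errors.
fix Generate a reference for each file first (for example with kerchunk.hdf.SingleHdf5ToZarr(f, url).translate()), then pass the resulting references to MultiZarrToZarr.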
deprecated The 'kerchunk.combine' module's 'concat' function is deprecated in favor of MultiZarrToZarr.
fix Use MultiZarrToZarr, setting concat_dims (and identical_dims where applicable) for your dataset.
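The multi-file gotcha above can be caught before MultiZarrToZarr is ever called. The sketch below is one possible wrapper, not part of kerchunk: the helper names looks_like_reference and combine_references are hypothetical, the '.json' suffix test is a deliberate simplification, and the 'time' dimension plus anonymous S3 access are assumptions about the data. MultiZarrToZarr is documented to accept reference dicts or paths to reference JSON files; concat_dims, remote_protocol and remote_options are its own parameters.

```python
from typing import Any


def looks_like_reference(item: Any) -> bool:
    """Hypothetical check: an in-memory reference dict or a reference JSON path."""
    if isinstance(item, dict):
        return True
    # Simplification: treat only *.json paths as reference files.
    return isinstance(item, str) and item.endswith(".json")


def combine_references(refs: list, concat_dim: str = "time") -> dict:
    """Combine per-file references along one dimension (sketch)."""
    bad = [r for r in refs if not looks_like_reference(r)]
    if bad:
        raise TypeError(
            f"expected reference dicts or reference JSON paths, got: {bad!r}"
        )

    # Imported lazily so the input check works even without kerchunk installed.
    from kerchunk.combine import MultiZarrToZarr

    mzz = MultiZarrToZarr(
        refs,
        concat_dims=[concat_dim],        # assumption: files differ along this dim
        remote_protocol="s3",            # assumption: chunks live on S3
        remote_options={"anon": True},   # assumption: public bucket
    )
    return mzz.translate()
```

Passing a raw data path such as 's3://example-bucket/file.nc' to combine_references fails immediately with a readable TypeError instead of a cryptic error from deeper in the stack.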
Imports
- combine_kwargs
  wrong: from kerchunk import MultiZarrToZarr
  correct: from kerchunk.combine import MultiZarrToZarr
- open_dataset
  wrong: from kerchunk import open_dataset
  correct: from kerchunk.hdf import KerchunkGroup
Quickstart
import json

import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr

# Generate a reference for a single file.
# SingleHdf5ToZarr scans the HDF5/NetCDF4 file; translate() returns the reference dict.
url = 's3://example-bucket/file.nc'  # or a local path
fs = fsspec.filesystem('s3', anon=True)
with fs.open(url) as f:
    h5chunks = SingleHdf5ToZarr(f, url).translate()

# Save the reference as JSON
with open('ref.json', 'w') as f:
    json.dump(h5chunks, f)

# Open with xarray. remote_protocol/remote_options tell fsspec how to
# fetch the byte ranges that the reference points at.
mapper = fsspec.get_mapper(
    'reference://',
    fo='ref.json',
    remote_protocol='s3',
    remote_options={'anon': True},
)
ds = xr.open_dataset(mapper, engine='zarr', backend_kwargs={'consolidated': False})
print(ds)
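To see what kind of content ends up in ref.json, the sketch below walks a reference dict in the layout that fsspec's reference filesystem reads: a top-level "version" and a "refs" mapping, where each value is either inline data (a string) or a [url, offset, length] pointer into the original file. The helper name summarize_refs and the sample entries (keys, offsets, lengths) are hypothetical and made up for illustration; they are not output from a real file.

```python
import json


def summarize_refs(reference: dict) -> dict:
    """Count inline entries versus byte-range pointers in a reference dict."""
    # Version-1 references keep their entries under "refs"; a flat
    # mapping is treated as the entries themselves.
    refs = reference.get("refs", reference)
    summary = {"inline": 0, "pointer": 0}
    for value in refs.values():
        if isinstance(value, list):
            summary["pointer"] += 1   # [url, offset, length] into the data file
        else:
            summary["inline"] += 1    # metadata or small data stored in place
    return summary


# Hypothetical sample in the version-1 layout.
sample = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/.zarray": json.dumps({"shape": [2], "chunks": [1]}),
        "temp/0": ["s3://example-bucket/file.nc", 4096, 512],
        "temp/1": ["s3://example-bucket/file.nc", 4608, 512],
    },
}

print(summarize_refs(sample))  # {'inline': 2, 'pointer': 2}
```

Code that indexes reference["refs"] directly fails with KeyError: 'refs' when handed a flat mapping, which is why the helper falls back to the dict itself.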