Numcodecs
Numcodecs is a Python package providing a diverse set of buffer compression and transformation codecs. It is widely used in data storage and communication applications, especially as a dependency for the Zarr N-dimensional array store. The current version is 0.16.5, with frequent patch and minor releases, typically on a monthly or bi-monthly cadence.
Warnings
- breaking Numcodecs versions 0.16.x and higher require Python 3.11 or newer. Installations on older Python versions will fail.
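- gotcha Not every codec is available in every installation. Blosc, Zstd, and LZ4 are compiled extensions that ship with the standard wheels but can be missing from some source builds, and other codecs (e.g., MsgPack, ZFPY, PCodec) depend on optional third-party packages. Using a codec that is unavailable raises an `ImportError` at import time, or a `ValueError` when the codec is looked up by ID.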
- breaking Compatibility with Zarr v3 has evolved across recent numcodecs versions. Changes in Zarr3 configuration serialization (v0.15.1) and handling for Zarr 3.1.0 (v0.16.2) may affect users relying on specific Zarr3 features or metadata structures.
- gotcha Prior to v0.16.3, there was a known issue with Zstd decompression leading to negative size errors on 32-bit platforms.
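The Python version and availability warnings above can be checked up front. The sketch below uses only the standard library; the helper names `load_codec_class` and `first_available` are hypothetical, and the module paths are the ones listed under Imports. It only checks that a class can be imported, not that the codec works end to end.

```python
import importlib
import sys


def load_codec_class(module_name, class_name):
    """Return the named class, or None if the module or class is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, class_name, None)


def first_available(candidates):
    """Return the first importable class from (module, class) pairs, else None."""
    for module_name, class_name in candidates:
        cls = load_codec_class(module_name, class_name)
        if cls is not None:
            return cls
    return None


# numcodecs 0.16.x and higher require Python 3.11 or newer.
if sys.version_info < (3, 11):
    print("Warning: numcodecs 0.16.x needs Python 3.11 or newer.")

# Prefer Blosc, then Zstd, then fall back to GZip.
codec_cls = first_available([
    ("numcodecs.blosc", "Blosc"),
    ("numcodecs.zstd", "Zstd"),
    ("numcodecs.gzip", "GZip"),
])
print("Selected codec class:", codec_cls)
```

If none of the candidates can be imported (for example, when numcodecs itself is not installed), `codec_cls` is `None` and the caller can decide how to proceed.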
Install
- pip install numcodecs
- pip install "numcodecs[all]"
Imports
- numcodecs
import numcodecs
- Blosc
from numcodecs.blosc import Blosc
- Zstd
from numcodecs.zstd import Zstd
- GZip
from numcodecs.gzip import GZip
Quickstart
import numpy as np
import numcodecs
# Define some data to encode
data = np.arange(10000, dtype='i4').reshape(100, 100)
# Choose a codec. Blosc is a compiled extension that ships with the standard
# wheels; if it is unavailable in your build, GZip is a good fallback.
try:
    from numcodecs.blosc import Blosc
    codec = Blosc(cname='lz4', clevel=5, shuffle=Blosc.SHUFFLE)
    print("Using Blosc codec.")
except ImportError:
    from numcodecs.gzip import GZip
    print("Blosc not available. Falling back to GZip codec.")
    codec = GZip(level=5)
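# Encode the data. Codecs accept any object exposing the buffer protocol, so the
# array can be passed directly; Blosc can then shuffle on the 4-byte item size.
encoded_data = codec.encode(data)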
print(f"Original data shape: {data.shape}, dtype: {data.dtype}")
print(f"Original bytes: {data.nbytes}")
print(f"Encoded bytes: {len(encoded_data)}")
# Decode the data
decoded_bytes = codec.decode(encoded_data)
# Reconstruct the numpy array
decoded_data = np.frombuffer(decoded_bytes, dtype=data.dtype).reshape(data.shape)
# Verify that the decoded data matches the original
assert np.array_equal(data, decoded_data)
print("Data successfully encoded and decoded!")
# Built-in codecs are already registered, so a codec can be rebuilt from its
# configuration dictionary, which already includes its 'id'.
# numcodecs.register_codec() expects a codec class and is only needed for custom codecs.
retrieved_codec = numcodecs.get_codec(codec.get_config())
assert retrieved_codec.codec_id == codec.codec_id
print(f"Codec '{retrieved_codec.codec_id}' retrieved from the registry successfully.")