Blosc2

4.1.2 · active · verified Thu Apr 09

Blosc2 is a high-performance compressed ndarray library for Python built on the C-Blosc2 compression backend. It stores and manipulates arbitrarily large N-dimensional datasets, follows the Array API standard, and includes a compute engine that evaluates expressions directly on compressed data. The current version is 4.1.2, and the project is under active development with frequent releases.

Warnings

Install

Imports

Quickstart

This quickstart creates a compressed NDArray from a NumPy array, runs a simple computation on it, and decompresses it back to NumPy. It then uses `blosc2.TreeStore` to persist a NumPy array and a Blosc2 NDArray hierarchically in a single file and reads them back.

import os

import blosc2
import numpy as np

# Create a Blosc2 NDArray from a NumPy array
data = np.arange(1_000_000, dtype=np.float64)
ndarray = blosc2.asarray(data)
print(f"Original data size: {data.nbytes / (1024**2):.2f} MB")
# nbytes is the uncompressed size; the compressed size lives on the underlying super-chunk
print(f"Compressed data size: {ndarray.schunk.cbytes / (1024**2):.2f} MB")

# Perform a computation (e.g., sum) on the compressed array
computed_sum = ndarray.sum()
print(f"Sum of array elements: {computed_sum}")

# Decompress the array back to a NumPy array
decompressed_data = ndarray[:]
assert np.allclose(data, decompressed_data)
print("Data compressed and decompressed successfully.")

# Example of using TreeStore for hierarchical data storage
with blosc2.TreeStore("my_data.b2z", mode="w") as ts:
    ts["/group1/dataset_a"] = np.random.rand(100, 100)
    ts["/group2/dataset_b"] = blosc2.zeros((50, 50), dtype=np.int32)
    print("Data stored in TreeStore 'my_data.b2z'.")

with blosc2.TreeStore("my_data.b2z", mode="r") as ts:
    ds_a = ts["/group1/dataset_a"]
    ds_b = ts["/group2/dataset_b"]
    print(f"Read dataset_a shape: {ds_a.shape}")
    print(f"Read dataset_b dtype: {ds_b.dtype}")

os.remove("my_data.b2z")
