{"id":1952,"library":"blosc2","title":"Blosc2","description":"Blosc2 is a high-performance compressed ndarray library for Python, built on the C-Blosc2 compression backend. It provides efficient storage and manipulation of arbitrarily large N-dimensional datasets, follows the Array API standard, and includes a flexible compute engine for complex calculations on compressed data. As of version 4.1.2, it is actively developed, with frequent releases and feature enhancements.","status":"active","version":"4.1.2","language":"en","source_language":"en","source_url":"https://github.com/Blosc/python-blosc2","tags":["compression","ndarray","data science","performance","compute","hdf5","zarr"],"install":[{"cmd":"pip install blosc2 --upgrade","lang":"bash","label":"Install from PyPI"},{"cmd":"conda install -c conda-forge python-blosc2","lang":"bash","label":"Install with Conda"}],"dependencies":[{"reason":"Required runtime dependency listed in blosc2's install requirements. blosc2.NDArray objects are designed to interoperate with NumPy arrays: they are typically created from NumPy arrays and decompressed back into them.","package":"numpy","optional":false},{"reason":"Used by Blosc2's compute engine (miniexpr) for fast, multithreaded element-wise computations and reductions.","package":"numexpr","optional":true}],"imports":[{"symbol":"blosc2","correct":"import blosc2"},{"note":"`NDArray` can be referenced as `blosc2.NDArray` or imported with `from blosc2 import NDArray`; instances are normally created through functions such as `blosc2.zeros` or `blosc2.asarray`.","symbol":"NDArray","correct":"import blosc2\narray = blosc2.zeros((10, 10))"},{"symbol":"TreeStore","correct":"from blosc2 import TreeStore"},{"note":"Compression codecs must be passed as `blosc2.Codec` enum members, not string literals.","wrong":"blosc2.compress(data, codec='blosclz')","symbol":"Codec","correct":"blosc2.compress(data, codec=blosc2.Codec.BLOSCLZ)"},{"note":"Compression filters must be passed as `blosc2.Filter` enum members, not string literals.","wrong":"blosc2.compress(data, filter='shuffle')","symbol":"Filter","correct":"blosc2.compress(data, filter=blosc2.Filter.SHUFFLE)"}],"quickstart":{"code":"import os\n\nimport blosc2\nimport numpy as np\n\n# Create a Blosc2 NDArray from a NumPy array\ndata = np.arange(1_000_000, dtype=np.float64)\nndarray = blosc2.asarray(data)\nprint(f\"Original data size: {data.nbytes / (1024**2):.2f} MB\")\n# cbytes is the compressed size; nbytes would report the uncompressed size\nprint(f\"Compressed data size: {ndarray.cbytes / (1024**2):.2f} MB\")\n\n# Perform a computation (e.g., sum) on the compressed array\ncomputed_sum = ndarray.sum()\nprint(f\"Sum of array elements: {computed_sum}\")\n\n# Decompress the array back to a NumPy array (compression is lossless)\ndecompressed_data = ndarray[:]\nassert np.array_equal(data, decompressed_data)\nprint(\"Data compressed and decompressed successfully.\")\n\n# Example of using TreeStore for hierarchical data storage\nwith blosc2.TreeStore(\"my_data.b2z\", mode=\"w\") as ts:\n    ts[\"/group1/dataset_a\"] = np.random.rand(100, 100)\n    ts[\"/group2/dataset_b\"] = blosc2.zeros((50, 50), dtype=np.int32)\n    print(\"Data stored in TreeStore 'my_data.b2z'.\")\n\nwith blosc2.TreeStore(\"my_data.b2z\", mode=\"r\") as ts:\n    ds_a = ts[\"/group1/dataset_a\"]\n    ds_b = ts[\"/group2/dataset_b\"]\n    print(f\"Read dataset_a shape: {ds_a.shape}\")\n    print(f\"Read dataset_b dtype: {ds_b.dtype}\")\n\n# Clean up the example file\nos.remove(\"my_data.b2z\")","lang":"python","description":"This quickstart creates a compressed NDArray, reports its uncompressed and compressed sizes, performs a simple computation on it, and decompresses it. It also uses `blosc2.TreeStore` to persist NumPy arrays and Blosc2 NDArrays in a hierarchical, file-based store and then reads them back."},"warnings":[{"fix":"Make sure every consumer of Blosc2-generated data reads it with C-Blosc2 (or python-blosc2). Defining `BLOSC1_COMPAT` before including `blosc2.h` only maps the C-Blosc1 API names onto C-Blosc2 for source compatibility; it does not make C-Blosc2 buffers readable by C-Blosc1.","message":"Buffers generated with C-Blosc2 are generally not format-compatible with C-Blosc1 (there is no forward compatibility). C-Blosc2 is backward compatible with the C-Blosc1 API and in-memory format, so it can read C-Blosc1 data, but not the other way around; keep this in mind when upgrading or sharing data between the two libraries.","severity":"breaking","affected_versions":"All versions (when exchanging data with C-Blosc1)"},{"fix":"Use `ndarray.nbytes` for the uncompressed size in bytes (or `ndarray.cbytes` for the compressed size), and update any code that read `.size` as a byte count.","message":"The behavior of `NDArray.size` changed in version 3.11.0: it now returns the number of elements in the array, as the Array API standard requires, instead of the array's size in bytes. Code that relied on `NDArray.size` for a byte count must be updated.","severity":"breaking","affected_versions":">=3.11.0"},{"fix":"Migrate code to `blosc2.concat()`.","message":"`blosc2.concatenate()` was renamed to `blosc2.concat()` in version 3.5.0 to align with the Array API. `concatenate` is still available for backward compatibility but will be removed in a future release.","severity":"breaking","affected_versions":">=3.5.0"},{"fix":"Always use enum members, e.g. `blosc2.Codec.BLOSCLZ` or `blosc2.Codec.ZSTD` for codecs and `blosc2.Filter.SHUFFLE` or `blosc2.Filter.BITSHUFFLE` for filters.","message":"When specifying compression codecs or filters, pass members of the `blosc2.Codec` and `blosc2.Filter` enums, respectively, not string literals. Passing a string fails with an exception (typically an `AttributeError`) instead of selecting the codec or filter.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Benchmark your specific workload and hardware. Blosc2 is strongest in I/O-bound scenarios and out-of-core computations, where reduced I/O can outweigh the compression overhead.","message":"For in-memory tasks, Blosc2's overhead can make it slower than pure NumPy/Numexpr, especially on x86 CPUs. It can outperform them for on-disk operations and on modern ARM architectures (e.g., Apple Silicon), thanks to its compression and cache optimizations.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Leave HDF5's own shuffle filter disabled and configure shuffling through Blosc2's compression parameters when writing HDF5 datasets with the Blosc2 filter.","message":"When using Blosc2 as an HDF5 filter, do not enable HDF5's built-in shuffle filter. Blosc2 applies its own SIMD-accelerated shuffle, which is much faster, so shuffling should be left to Blosc2 for optimal performance.","severity":"gotcha","affected_versions":"All versions with HDF5 integration"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}