Blosc Data Compressor
`blosc` is the Python wrapper for the high-performance C-Blosc meta-compressor library. It is designed for fast, multi-threaded compression and decompression of binary data, especially numerical datasets such as NumPy arrays. The current version is 1.11.4. Releases are regular and mainly update the vendored C-Blosc library and add support for newer Python versions.
Warnings
- breaking Python 3.7 support was dropped in `blosc` 1.11.0, and Python 3.8 support was dropped in 1.11.2.
- gotcha When compressing structured data such as NumPy arrays, set the `typesize` parameter to the element size (`array.itemsize`) so that the shuffle filter can group bytes effectively. This matters for both compression ratio and speed. For generic byte streams, `typesize=1` is appropriate.
- gotcha `blosc.decompress()` returns a `bytes` object. When decompressing data that originated from a NumPy array, you must convert this `bytes` object back to a NumPy array manually using `numpy.frombuffer()` with the correct `dtype`.
- deprecated Version 1.11.4 included fixes for deprecated NumPy usage. Older `blosc` versions might emit warnings or fail when used with newer NumPy versions due to reliance on deprecated NumPy APIs.
Install
pip install blosc
Imports
- blosc
import blosc
- compress
from blosc import compress, decompress
- decompress
from blosc import compress, decompress
Quickstart
import blosc
data_bytes = b"This is a test string that will be compressed by blosc." * 10
# Compress data
# For byte strings, typesize=1 is appropriate.
# For NumPy arrays, use typesize=array.itemsize
compressed_data = blosc.compress(data_bytes, typesize=1)
print(f"Original size: {len(data_bytes)} bytes")
print(f"Compressed size: {len(compressed_data)} bytes")
# Decompress data
decompressed_data = blosc.decompress(compressed_data)
# Verify
assert data_bytes == decompressed_data
print("Decompression successful!")