Python ZSTD Bindings
The 'zstd' library provides fast Python bindings to Yann Collet's Zstandard (zstd) lossless compression algorithm. It offers a compelling balance of speed and compression ratio, making it a popular choice for real-time compression and large-scale data processing. The library is actively maintained with frequent releases, currently at version 1.5.7.3, and focuses on performance optimizations and bug fixes.
Warnings
- breaking Starting with Python 3.14, a new `compression.zstd` module is added to the standard library (PEP 784). It was deliberately placed under the `compression` namespace to avoid clashing with the third-party `zstd` package, so `import zstd` still resolves to this library; however, the two expose different APIs, and code that mixes `import zstd` with `from compression import zstd` can silently target the wrong module.
- gotcha Using `zstd.compress()` or `zstd.decompress()` with extremely large datasets (e.g., multi-gigabyte files) can lead to significant memory consumption as the entire input/output must fit in memory simultaneously. This can result in `MemoryError`.
- gotcha When using `ZstdCompressor` for incremental compression, it is crucial to call its `flush()` method after providing all input data to ensure that all buffered compressed data is emitted and the Zstandard frame is properly finalized. Failing to do so can result in incomplete or corrupted compressed data that cannot be decompressed correctly by other tools or even the same library.
- breaking In version 1.5.7.1, `ZSTD_min_compression_level()` was fixed to return the actual minimum compression level as a plain integer instead of a shifted int value. Any code that relied on the previous shifted representation will see different return values after upgrading.
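Given the PEP 784 overlap above, it can be worth confirming which module `import zstd` actually resolves to before debugging surprising behavior. A minimal diagnostic sketch using only the standard library (it makes no assumption that the package is installed):

```python
import importlib.util

# Locate whatever module "import zstd" would load in this environment.
# Note: on Python 3.14+ the stdlib module lives at "compression.zstd",
# a separate import path that does not shadow the top-level "zstd" name,
# but code mixing the two APIs can still confuse them.
spec = importlib.util.find_spec("zstd")
if spec is None:
    print("No third-party 'zstd' package is installed.")
else:
    print(f"'import zstd' resolves to: {spec.origin}")
```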
Install
pip install zstd
Imports
- zstd
import zstd
Quickstart
import zstd
original_data = b"This is some data that will be compressed using Zstandard. It's a fairly long string to demonstrate compression efficiency." * 100
# Compress data
# level can range from -100 (ultra-fast) to 22 (slowest, best compression)
# threads can be 0 (auto-tune) or a specific number
compressed_data = zstd.compress(original_data, 3, 0)  # positional args: (data, level, threads)
print(f"Original size: {len(original_data)} bytes")
print(f"Compressed size: {len(compressed_data)} bytes")
print(f"Compression ratio: {len(original_data) / len(compressed_data):.2f}x")
# Decompress data
decompressed_data = zstd.decompress(compressed_data)
assert original_data == decompressed_data
print("Decompression successful! Data matches original.")
# Example of streaming compression for large data
cctx = zstd.ZstdCompressor(level=1)
dctx = zstd.ZstdDecompressor()
chunk_size = len(original_data) // 5
compressed_chunks = []
# Stream compress
for i in range(0, len(original_data), chunk_size):
    chunk = original_data[i:i + chunk_size]
    compressed_chunk = cctx.compress(chunk)
    compressed_chunks.append(compressed_chunk)
# Important: flush the compressor to finalize the frame
compressed_chunks.append(cctx.flush())
streaming_compressed_data = b''.join(compressed_chunks)
# Stream decompress
streaming_decompressed_data = dctx.decompress(streaming_compressed_data)
assert original_data == streaming_decompressed_data
print("Streaming decompression successful! Data matches original.")