Lilcom
Lilcom is a Python library for lossy compression of sequence data stored in NumPy arrays. It compresses floating-point or 16-bit integer NumPy arrays into byte strings and is typically used in machine learning applications to store training data and models. The current version is 1.8.2, and updates are released fairly regularly, mostly to address compatibility and functionality.
Warnings
- gotcha Lilcom provides *lossy* compression. The decompressed data will not be bit-for-bit identical to the original array. The amount of error is controlled by the `tick_power` argument to `lilcom.compress()` (default -8), which is the power of 2 used as the step size for discretized values: a more negative `tick_power` gives a finer step and a smaller error, at the cost of larger output.
- breaking The method for controlling compression accuracy changed. Older versions of Lilcom may have used `bits_per_sample`. Current versions (e.g., 1.8.2 and the recent GitHub README) use `tick_power` to specify the quantization step size.
- breaking Lilcom requires Python 3.6 or newer. It is not compatible with Python 2.x. Attempts to install or run on unsupported Python versions will fail.
- gotcha The underlying algorithm is highly vulnerable to transmission errors. Even a single bit error in the compressed byte string can make the entire file or sequence unreadable during decompression. This is acceptable for its target machine learning applications, where data integrity is usually handled at a higher level or the data can be regenerated.
- gotcha If installing `lilcom` from source (e.g., if pre-compiled wheels are not available for your specific platform/Python version), a C++ compiler (like g++ or clang) is required on your system.
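The transmission-error warning above suggests checking a compressed byte string before decompressing it. The following is a minimal sketch of one way to do that, assuming a 4-byte CRC-32 prefix as the storage format. The helpers `wrap_with_crc` and `unwrap_checked` are hypothetical and not part of lilcom, and the payload here is stand-in bytes rather than real `lilcom.compress()` output, so the example runs without lilcom installed.

```python
import struct
import zlib


def wrap_with_crc(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 as 4 big-endian bytes (hypothetical format)."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def unwrap_checked(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    if len(blob) < 4:
        raise ValueError("blob too short to contain a checksum")
    (expected,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: compressed data is corrupted")
    return payload


# Stand-in bytes; in practice this would be the result of lilcom.compress(a).
compressed = bytes(range(256)) * 4
blob = wrap_with_crc(compressed)

# An intact blob round-trips unchanged.
restored = unwrap_checked(blob)

# Flip a single bit to simulate a transmission error.
corrupted = bytearray(blob)
corrupted[100] ^= 0x01
try:
    unwrap_checked(bytes(corrupted))
    detected = False
except ValueError:
    detected = True

print("round trip ok:", restored == compressed)
print("corruption detected:", detected)
```

With a check like this, a damaged entry can be skipped or regenerated instead of being passed to `lilcom.decompress()`.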
Install
pip install lilcom
Imports
- lilcom
import lilcom
- compress
lilcom.compress(...)
- decompress
lilcom.decompress(...)
Quickstart
import numpy as np
import lilcom
# Create a sample NumPy array
a = np.random.randn(300, 500).astype(np.float32)
# Compress the array (default tick_power=-8, controls accuracy)
a_compressed = lilcom.compress(a)
# Decompress the array, specifying original dtype
a_decompressed = lilcom.decompress(a_compressed, dtype=a.dtype)
print(f"Original array shape: {a.shape}, dtype: {a.dtype}")
print(f"Compressed data size: {len(a_compressed)} bytes")
print(f"Decompressed array shape: {a_decompressed.shape}, dtype: {a_decompressed.dtype}")
print(f"Max absolute error: {np.max(np.abs(a - a_decompressed)):.2e}")
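The maximum absolute error printed by the quickstart depends on `tick_power`. To build intuition for that trade-off, here is a rough sketch in pure Python that rounds values to the nearest multiple of 2**tick_power. It is an illustration of the discretization step only, not lilcom's actual codec, and the `quantize` helper is hypothetical. In this sketch the error per value is at most half a step, i.e. 2**(tick_power - 1).

```python
import random


def quantize(values, tick_power=-8):
    """Round each value to the nearest multiple of 2**tick_power (illustrative only)."""
    step = 2.0 ** tick_power
    return [round(v / step) * step for v in values]


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]

results = {}
for tick_power in (-4, -8, -12):
    restored = quantize(samples, tick_power)
    max_err = max(abs(a - b) for a, b in zip(samples, restored))
    bound = 2.0 ** (tick_power - 1)  # half a quantization step
    results[tick_power] = (max_err, bound)
    print(f"tick_power={tick_power:>3}: max error {max_err:.2e} (bound {bound:.2e})")
```

Each decrease of `tick_power` by 1 halves the step size in this sketch, so the worst-case rounding error halves as well.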