zlib-state: Low-level Zlib Interface for Decoding State Capture
The zlib-state library provides a low-level Python interface to the zlib compression library, specifically enabling the capture and restoration of decompression states. This allows for advanced use cases such as resuming decompression from arbitrary points within gzip or raw deflate streams. It features `Decompressor` for byte-level control and `GzipStateFile` for file-like object interaction, and is actively maintained with support for recent Python versions up to 3.13.
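The idea of capturing a decompression state and resuming from it can be illustrated with Python's standard `zlib` module alone. The sketch below does not use zlib-state. It relies on `Decompress.copy()`, which snapshots an in-memory decompressor, as a rough analogy for the capture-and-resume workflow described above; the variable names (`snapshot`, `split`) are chosen for illustration only.

```python
import zlib

# Build a compressible payload and compress it as a zlib stream.
data = b"".join(b"line %d\n" % i for i in range(1000))
compressed = zlib.compress(data)

# Feed the first half of the compressed bytes to a decompressor.
split = len(compressed) // 2
decomp = zlib.decompressobj()
head = decomp.decompress(compressed[:split])

# Snapshot the decompressor state at this point in the stream.
snapshot = decomp.copy()

# Finish decompression with the original object...
tail_a = decomp.decompress(compressed[split:]) + decomp.flush()

# ...and again from the snapshot, without re-reading the first half.
tail_b = snapshot.decompress(compressed[split:]) + snapshot.flush()

print(head + tail_a == data)  # full output is reconstructed
print(tail_a == tail_b)       # the snapshot resumes at the same point
```

A copy made this way lives only inside the running process. The `last_state` and `zseek` names used in the Quickstart are the zlib-state counterparts of the snapshot and resume steps.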
Warnings
- gotcha The `Decompressor` class is described as 'picky and unforgiving'. Users must precisely handle input, buffer sizes, and the `wbits` parameter to avoid `zlib` errors like `Z_BUF_ERROR`, `Z_DATA_ERROR`, or `Z_STREAM_ERROR`. Incorrect parameterization, especially for `wbits`, is a common source of issues when dealing with different compression formats (raw deflate, zlib, gzip).
- gotcha While `zlib-state` enables efficient iteration over gzip files and state capture, `GzipStateFile` is 'somewhat slower than python's gzip' for general, full-file decompression. It is optimized for scenarios requiring state manipulation and resuming, not necessarily for raw speed improvements in basic decompression tasks.
- breaking The library supports Python 3.6 and newer, and recent releases add support for Python 3.12 and 3.13. Python versions older than 3.6 are not supported; installing or running `zlib-state` on them will likely fail because of C extension incompatibility.
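The `wbits` mismatch mentioned in the first warning can be reproduced with Python's standard `zlib` module. The sketch below does not use zlib-state; it assumes that `Decompressor` follows the same zlib `windowBits` convention, which the text above does not confirm. The helper names `compress_with` and `outcome` are hypothetical.

```python
import zlib

payload = b"hello zlib-state\n" * 20

def compress_with(wbits: int) -> bytes:
    """Compress `payload` in the container format selected by `wbits`."""
    comp = zlib.compressobj(wbits=wbits)
    return comp.compress(payload) + comp.flush()

def outcome(data: bytes, wbits: int) -> str:
    """Return 'ok' if `data` decompresses back to `payload`, else 'zlib.error'."""
    try:
        return "ok" if zlib.decompress(data, wbits=wbits) == payload else "mismatch"
    except zlib.error:
        return "zlib.error"

raw_deflate = compress_with(-15)  # raw deflate: no header or trailer
zlib_stream = compress_with(15)   # zlib header and Adler-32 trailer
gzip_stream = compress_with(31)   # 16 + 15: gzip header and CRC-32 trailer

# Matching wbits on both sides succeeds.
matching = [outcome(raw_deflate, -15), outcome(zlib_stream, 15), outcome(gzip_stream, 31)]

# 32 + 15 asks zlib to auto-detect a zlib or gzip header (not raw deflate).
auto = [outcome(zlib_stream, 47), outcome(gzip_stream, 47)]

# Reading a gzip stream as zlib or as raw deflate fails with zlib.error.
mismatched = [outcome(gzip_stream, 15), outcome(gzip_stream, -15)]

print(matching, auto, mismatched)
```

The format is selected entirely by the sign and offset of `wbits`, so a wrong value typically surfaces as a data error on the first bytes of input instead of a descriptive message.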
Install
pip install zlib-state
Imports
- Decompressor
from zlib_state import Decompressor
- GzipStateFile
from zlib_state import GzipStateFile
- zlib_state
import zlib_state
Quickstart
import gzip
import os

import zlib_state

# Create a dummy gzipped file for demonstration
dummy_content = b"Line 1\nLine 2\nLine 3 (State Capture Point)\nLine 4\n" * 50
with gzip.open("test_data.txt.gz", "wb") as f:
    f.write(dummy_content)

TARGET_LINE = 100
state_to_resume = None
position_to_resume = 0

try:
    # Use GzipStateFile to capture state at a specific point
    with zlib_state.GzipStateFile('test_data.txt.gz', keep_last_state=True) as f:
        for i, line in enumerate(f):
            if i == TARGET_LINE:
                state_to_resume = f.last_state
                position_to_resume = f.last_state_pos
                print(f"Captured state at line {i+1}, byte pos {position_to_resume}")
                break

    # Resume only if a state was actually captured
    if state_to_resume is not None:
        print(f"\nResuming decompression from line {TARGET_LINE+1}...")
        with zlib_state.GzipStateFile('test_data.txt.gz') as f_resume:
            f_resume.zseek(position_to_resume, state_to_resume)
            remainder = f_resume.read(50)  # Read a small portion after resuming
            print(f"Decompressed remainder (first 50 bytes): {remainder.decode('utf-8').strip()}...")
    else:
        print("No state was captured before the target line; nothing to resume.")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up the dummy file
    if os.path.exists("test_data.txt.gz"):
        os.remove("test_data.txt.gz")