Stream-unzip
Stream-unzip is a Python library (version 0.0.101) that decompresses ZIP archives as a stream, without loading the entire archive or any whole member file into memory. This makes it suitable for large archives and for data that arrives over a network. It supports Deflate64, Zip64, and both AES and legacy ZipCrypto encryption, and it is actively maintained with regular releases.
Warnings
- breaking Support for Python 3.6 was dropped in version 0.0.96. Additionally, support for earlier versions of Python 3.7 was dropped in version 0.0.100.
- gotcha The `unzipped_chunks` iterable yielded for each file *must* be iterated to completion before moving on to the next file, even when that file's contents are not needed. Failing to do so raises an `UnfinishedIterationError`.
- gotcha Version 0.0.96 had a 'missing main import' bug which could lead to `ImportError`. This was fixed in version 0.0.97.
- gotcha The library processes ZIP entries as they are read and does not rely on the central directory (which is at the end of a ZIP file). While this enables true streaming, it may lead to edge cases or failures with malformed or unusual ZIP archives that depend on the central directory for critical metadata.
- gotcha The file name and file size values extracted from the ZIP archive should be treated as untrusted input, as they are provided by the ZIP file's creator and could be malicious.
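One way to act on the untrusted-name warning is to resolve each name against a fixed extraction directory and reject anything that lands outside it. The following is a minimal sketch using only the standard library; `safe_destination` and the `extract_root` directory are hypothetical names introduced here for illustration, not part of stream-unzip.

```python
from pathlib import Path


def safe_destination(base_dir, raw_name, encoding="utf-8"):
    """Map an untrusted ZIP member name (bytes) to a path inside base_dir.

    Raises ValueError if the name is absolute, empty, or uses '..' to
    escape base_dir. Hypothetical helper, not part of stream-unzip.
    """
    name = raw_name.decode(encoding, errors="replace")
    base = Path(base_dir).resolve()
    # Joining with an absolute name discards base, so the check below catches it
    candidate = (base / name).resolve()
    if base not in candidate.parents:
        raise ValueError(f"unsafe member name: {name!r}")
    return candidate
```

A caller would pass the `file_name` bytes from each yielded tuple to this helper before opening any output file. The same caution applies to `file_size`: it is best used as a hint, with the actual byte count taken from the chunks read.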
Install
pip install stream-unzip
Imports
- stream_unzip
from stream_unzip import stream_unzip
Quickstart
import os

import httpx  # third-party: pip install httpx
from stream_unzip import stream_unzip


def get_zipped_chunks(url):
    # Stream the response body in 64 KiB chunks instead of downloading it whole
    with httpx.stream('GET', url, follow_redirects=True) as r:
        r.raise_for_status()
        yield from r.iter_bytes(chunk_size=65536)


def main():
    # Replace with a real ZIP file URL or use a local file stream
    zip_url = os.environ.get('STREAM_UNZIP_TEST_URL', 'https://www.learningcontainer.com/wp-content/uploads/2020/07/sample-zip-file.zip')
    print(f"Downloading and unzipping from: {zip_url}")
    for file_name, file_size, unzipped_chunks in stream_unzip(get_zipped_chunks(zip_url)):
        # file_name is bytes supplied by the archive, so decode it defensively
        name = file_name.decode('utf-8', errors='replace')
        print(f"\nProcessing file: {name} (Size: {file_size if file_size is not None else 'Unknown'} bytes)")
        total_unzipped_bytes = 0
        # IMPORTANT: unzipped_chunks *must* be iterated to completion
        for chunk in unzipped_chunks:
            total_unzipped_bytes += len(chunk)
            # Process the chunk, e.g., write to disk or another stream
        print(f"  Finished reading {total_unzipped_bytes} bytes for {name}")


if __name__ == '__main__':
    main()
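The quickstart mentions that a local file stream can be used instead of a URL, and that every `unzipped_chunks` iterable must be consumed. The sketch below illustrates both points with the standard library only. `file_chunks` and `digest_members` are hypothetical helpers, and the `.txt` suffix filter is an arbitrary choice for the example. `digest_members` accepts any iterable of `(file_name, file_size, unzipped_chunks)` tuples, which is the shape the quickstart loop unpacks.

```python
import hashlib


def file_chunks(path, chunk_size=65536):
    """Yield a local file's bytes in fixed-size chunks without reading it whole."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def digest_members(members, wanted_suffix=b".txt"):
    """Return {file_name: SHA-256 hex digest} for members matching wanted_suffix.

    Members that do not match are still drained, because each
    unzipped_chunks iterable has to be consumed before the next member.
    """
    digests = {}
    for file_name, _file_size, unzipped_chunks in members:
        if file_name.endswith(wanted_suffix):
            hasher = hashlib.sha256()
            for chunk in unzipped_chunks:
                hasher.update(chunk)
            digests[file_name] = hasher.hexdigest()
        else:
            # Not interested in this member: read and discard its chunks
            for _ in unzipped_chunks:
                pass
    return digests
```

With the library installed, the two pieces would be combined as `digest_members(stream_unzip(file_chunks('archive.zip')))`, where `archive.zip` is a placeholder path. Hashing chunk by chunk keeps memory use bounded by the chunk size rather than by the size of any member.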