Stream-unzip

0.0.101 · active · verified Sun Apr 12

Stream-unzip is a Python library (version 0.0.101) that decompresses ZIP archives as a stream, without loading the entire archive or any complete uncompressed member file into memory. It consumes the archive as an iterable of byte chunks and yields each member's data in chunks, which suits large files and archives arriving over a network. The library is actively maintained with regular releases and supports Deflate64, Zip64, and both AES and legacy ZipCrypto encryption.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to stream-unzip a ZIP file downloaded with HTTPX. The `stream_unzip` function takes an iterable of bytes (the ZIP archive, in chunks) and yields one tuple per member file: `(file_name, file_size, unzipped_bytes_generator)`. Here `file_name` is a `bytes` object, and `file_size` may be `None` when the size is not known in advance. The `unzipped_bytes_generator` for each file *must* be iterated to completion before moving on to the next file; otherwise `UnfinishedIterationError` is raised.

import httpx
import os
from stream_unzip import stream_unzip

def get_zipped_chunks(url):
    # Requires httpx: pip install httpx
    # Stream the response body in 64 KiB chunks instead of downloading it all at once
    with httpx.stream('GET', url, follow_redirects=True) as r:
        r.raise_for_status()
        yield from r.iter_bytes(chunk_size=65536)

def main():
    # Replace with a real ZIP file URL or use a local file stream
    zip_url = os.environ.get('STREAM_UNZIP_TEST_URL', 'https://www.learningcontainer.com/wp-content/uploads/2020/07/sample-zip-file.zip')
    print(f"Downloading and unzipping from: {zip_url}")
    for file_name, file_size, unzipped_chunks in stream_unzip(get_zipped_chunks(zip_url)):
        name = file_name.decode('utf-8')  # file names are yielded as bytes
        size = file_size if file_size is not None else 'Unknown'
        print(f"\nProcessing file: {name} (Size: {size} bytes)")
        total_unzipped_bytes = 0
        # IMPORTANT: unzipped_chunks *must* be iterated to completion
        for chunk in unzipped_chunks:
            total_unzipped_bytes += len(chunk)
            # Process the chunk here, e.g. write it to disk or another stream
        print(f"  Finished reading {total_unzipped_bytes} bytes for {name}")

if __name__ == '__main__':
    main()
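
The same pattern works for a local archive and for processing only some members. The sketch below is illustrative rather than part of the library: `file_chunks` and `hash_selected` are hypothetical helper names, and the `.txt` suffix filter is an arbitrary choice. It shows one way to feed a file on disk to `stream_unzip` as an iterable of bytes, and how to skip unwanted members while still draining their chunk iterables, as the completion rule above requires.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Shape of the tuples described above: (file_name, file_size, chunk iterable)
Member = Tuple[bytes, Optional[int], Iterable[bytes]]


def file_chunks(path: str, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield a local file's bytes in fixed-size chunks (hypothetical helper)."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def hash_selected(members: Iterable[Member], suffix: bytes = b'.txt') -> Dict[str, str]:
    """Return {name: SHA-256 hex digest} for members whose name ends with suffix.

    Members that are not wanted are still drained, because each member's
    chunks must be consumed before the next member can be read.
    """
    digests = {}
    for name, _size, chunks in members:
        if name.endswith(suffix):
            digest = hashlib.sha256()
            for chunk in chunks:
                digest.update(chunk)
            digests[name.decode('utf-8')] = digest.hexdigest()
        else:
            for _ in chunks:
                pass  # discard the data, but consume it to completion
    return digests


# Usage, assuming stream-unzip is installed and 'archive.zip' exists:
#   from stream_unzip import stream_unzip
#   print(hash_selected(stream_unzip(file_chunks('archive.zip'))))
```

Because `hash_selected` only relies on the tuple shape, it can be exercised with plain generators in tests, without a real archive or network access.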
