Stream ZIP
stream-zip is a Python library that constructs ZIP archives on the fly, without requiring the entire archive or its member files to be held in memory or on disk. This makes it particularly suitable for memory-constrained environments and for generating ZIP files as streaming HTTP responses in web servers. It offers both synchronous and asynchronous interfaces and is currently at version 0.0.84, with frequent minor updates.
Warnings
- breaking The signature for the `async_stream_zip` function changed in version `0.0.81`. If you were using the asynchronous interface, verify the updated argument order and names.
- gotcha Client code must explicitly choose between `ZIP_32` and `ZIP_64` for each member file. Both compress the data; they differ in whether the Zip64 extensions are used, which are needed for files larger than 4GiB. The choice has to be written to the stream before the member's data, so the library cannot work out on its own during streaming whether `ZIP_64` is needed. Choosing `ZIP_32` for a file that turns out to be too large raises `ZipOverflowError`.
- gotcha `NO_COMPRESSION_32` and `NO_COMPRESSION_64` buffer the *entire content* of each uncompressed file in memory, because the file's full size and CRC32 must be known before its data is written to the ZIP stream. This negates the streaming benefit for large uncompressed files.
- gotcha Due to the ZIP file format specification, small pieces of metadata for each member file (such as its name) must be placed at the end of the ZIP archive. `stream-zip` buffers this metadata in memory until it can be output. For archives with a very large number of member files, this metadata can still add up to noticeable memory usage.
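When the uncompressed size of a member file is known in advance, the `ZIP_32` versus `ZIP_64` decision described above can be made with a small helper. The following is a minimal sketch, not part of stream-zip: the name `choose_method`, the constant `ZIP_32_MAX_SIZE`, and the 1% plus 1 KiB safety margin are all assumptions made for illustration. It also ignores the member's offset within the archive, which can independently require `ZIP_64`.

```python
# Largest value a 32-bit ZIP size field can hold (4 GiB - 1).
ZIP_32_MAX_SIZE = 0xFFFFFFFF


def choose_method(uncompressed_size, zip_32, zip_64):
    """Return zip_64 if the member might not fit in 32-bit fields, else zip_32.

    Hypothetical helper: zip_32 and zip_64 are whatever method objects the
    caller wants returned, for example stream_zip.ZIP_32 and stream_zip.ZIP_64.
    """
    # Incompressible data can grow slightly when deflated, so pad the
    # estimate. The 1% + 1 KiB margin is a heuristic, not a library rule.
    worst_case = uncompressed_size + uncompressed_size // 100 + 1024
    return zip_64 if worst_case >= ZIP_32_MAX_SIZE else zip_32


# Stand-in strings are used here so the sketch runs without stream-zip.
small = choose_method(10 * 1024, 'ZIP_32', 'ZIP_64')
large = choose_method(5 * 1024 ** 3, 'ZIP_32', 'ZIP_64')
```

In real code the last two arguments would be the imported `ZIP_32` and `ZIP_64` objects, and the result would be yielded as the fourth element of a member-file tuple.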
Install
pip install stream-zip
Imports
- stream_zip
from stream_zip import stream_zip
- ZIP_32
from stream_zip import ZIP_32
- async_stream_zip
from stream_zip import async_stream_zip
Quickstart
from datetime import datetime
from stat import S_IFDIR, S_IFREG

from stream_zip import ZIP_32, ZIP_64, stream_zip


def generate_file_content(data):
    # Member content is an iterable of bytes; a real source would yield many chunks
    yield data.encode('utf-8')


def member_files():
    modified_at = datetime.now()
    mode = S_IFREG | 0o600  # Regular file, owner read/write

    # Example 1: Small file using ZIP_32
    yield (
        'my-file-1.txt',
        modified_at,
        mode,
        ZIP_32,
        generate_file_content('This is some content for file 1.'),
    )

    # Example 2: Potentially larger file, with explicit 64-bit support
    yield (
        'my-file-2.json',
        modified_at,
        mode,
        ZIP_64,
        generate_file_content('{"key": "value", "data": [1, 2, 3]}'),
    )

    # Example 3: An empty directory (name ends in '/', mode uses S_IFDIR)
    yield (
        'my-directory/',
        modified_at,
        S_IFDIR | 0o755,  # Directory permissions
        ZIP_32,
        (),
    )


# Stream the ZIP file chunks
zipped_chunks = stream_zip(member_files())

# In a real application, you would send these chunks directly as an HTTP response
# or write them to a file. For this example, we print a summary of each chunk.
# You might also use io.BytesIO to collect them into a single bytes object for testing.

# Example of consuming chunks (e.g., writing to a file)
# with open('output.zip', 'wb') as f:
#     for chunk in zipped_chunks:
#         f.write(chunk)

print("Generated ZIP chunks (first few bytes of each):")
for i, chunk in enumerate(zipped_chunks):
    print(f"Chunk {i}: {len(chunk)} bytes (starts with: {chunk[:20]})")
    if i > 5:  # Limit output for demonstration
        print("...")
        break
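The quickstart builds member content from short strings and only prints the resulting chunks. For larger inputs, both ends can stay chunked: a generator reads a source file piece by piece, and a consumer writes each output chunk as it arrives. The sketch below uses only the standard library; `iter_file_chunks` and `write_chunks` are hypothetical helper names, not stream-zip functions, and the demo feeds stand-in byte chunks rather than real `stream_zip` output.

```python
import io


def iter_file_chunks(path, chunk_size=65536):
    """Yield a file's bytes in fixed-size chunks without loading it whole."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def write_chunks(chunks, fileobj):
    """Write each bytes chunk to fileobj and return the total bytes written."""
    total = 0
    for chunk in chunks:
        fileobj.write(chunk)
        total += len(chunk)
    return total


# Stand-in chunks so the sketch runs without stream-zip installed.
# In practice this would be write_chunks(stream_zip(member_files()), f).
buffer = io.BytesIO()
written = write_chunks([b'PK', b'\x03\x04', b'demo'], buffer)
```

A generator such as `iter_file_chunks('large-input.bin')` could be passed as the last element of a member-file tuple in place of `generate_file_content(...)`, so that neither the input file nor the archive is ever held in memory in full.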