Zipstream-new
Zipstream-new is a Python library that generates ZIP archives incrementally, taking input files and streams and yielding the resulting ZIP file in small chunks. This is particularly useful for streaming large archives from web applications or to cloud storage without holding the entire file in memory or on disk. The current version is 1.1.8; although the last PyPI release was in 2020, the project remains active on GitHub with a focus on stable generation.
Common errors
- KeyError: 'There is no item named <large_number> in the archive'
  cause: Passing `zipstream.ZipFile` directly to functions like `boto3.client.upload_fileobj` that expect a complete file-like object (with methods like `seek`, `tell`, `read`) rather than an iterable of byte chunks. The `KeyError` surfaces internally when `boto3` tries to interpret the iterable as a standard file.
  fix: Implement a wrapper class around the `zipstream.ZipFile` iterator that provides a `read()` method and manages a buffer, simulating a file-like object for compatibility with APIs like `upload_fileobj`.
- TypeError: 'zipstream.ZipFile' object is not readable
  cause: Calling `.read()` or `.seek()` on a `zipstream.ZipFile` object, which is an iterable generator of byte chunks, not a standard file-like object that supports such methods.
  fix: Iterate directly over the `zipstream.ZipFile` object to consume its byte chunks: `for chunk in my_zip_stream: ...`. If a file-like interface is strictly required, a custom adapter class is needed.
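The adapter both fixes describe can be sketched as follows. This is a minimal illustration, not part of zipstream-new's API: `IterFileAdapter` is a hypothetical name, and a plain iterator of bytes stands in for the `zipstream.ZipFile` object.

```python
class IterFileAdapter:
    """Wrap an iterable of byte chunks in a minimal read()-able object,
    e.g. for APIs such as boto3's upload_fileobj that expect a file.
    (Sketch: in practice the iterable would be a zipstream.ZipFile.)"""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._buffer = b""

    def readable(self):
        return True

    def read(self, size=-1):
        # Pull chunks until the buffer can satisfy the request
        # (or the underlying iterator is exhausted).
        while size < 0 or len(self._buffer) < size:
            try:
                self._buffer += next(self._iter)
            except StopIteration:
                break
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

# A generator stands in for the zipstream.ZipFile iterator here.
stream = IterFileAdapter(iter([b"PK\x03\x04", b"more", b"bytes"]))
print(stream.read(4))   # b'PK\x03\x04'
print(stream.read())    # b'morebytes'
```

Note the adapter is forward-only: it buffers and serves bytes in order, but cannot implement `seek()`, since the underlying stream is generated on the fly.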
Warnings
- gotcha `zipstream.ZipFile` defaults to the standard ZIP32 format, which is limited to 4 GB of data and 65,535 entries. For larger archives, you must explicitly enable ZIP64 extensions (`allowZip64=True`).
- gotcha The `ZipFile` object is an iterator that yields byte chunks, not a traditional file-like object with `read()` or `seek()` methods. Direct use with APIs expecting a `file-like` object (e.g., `boto3.client.upload_fileobj`) will fail.
- deprecated The `write` method with a single path argument (e.g., `z.write('path/to/files')`) implicitly adds files relative to the current working directory. For clarity and to avoid unexpected archive structures, it's recommended to always specify `arcname`.
Install
-
pip install zipstream-new
Imports
- ZipFile
from zipstream import ZipFile
- ZIP_DEFLATED
from zipstream import ZIP_DEFLATED
Quickstart
import os
import zipstream

def generate_zip_stream():
    # Create a dummy directory and files for demonstration
    if not os.path.exists('test_files'):
        os.makedirs('test_files')
    with open('test_files/file1.txt', 'w') as f:
        f.write('This is file 1 content.')
    with open('test_files/file2.txt', 'w') as f:
        f.write('This is file 2 content. More data...')

    # Initialize ZipFile for streaming.
    # allowZip64=True permits archives larger than 4 GB or 65,535 files;
    # compression=zipstream.ZIP_DEFLATED produces compressed output.
    zs = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED, allowZip64=True)

    # Add files from a path
    zs.write('test_files/file1.txt', arcname='archive/document1.txt')
    zs.write('test_files/file2.txt', arcname='archive/document2.txt')

    # Add content from a string or bytes directly
    zs.writestr('archive/dynamic_content.json', b'{"data": [1, 2, 3]}')

    # Add content from an iterable (e.g., a generator)
    def data_generator():
        yield b"Line 1\n"
        yield b"Line 2\n"
        yield b"Line 3\n"
    zs.write_iter('archive/generated_log.txt', data_generator())

    # Iterate over the zipstream to get chunks of the ZIP file.
    # These chunks can be streamed directly to an HTTP response or cloud storage.
    for chunk in zs:
        yield chunk

    # Clean up dummy files/directory (optional)
    os.remove('test_files/file1.txt')
    os.remove('test_files/file2.txt')
    os.rmdir('test_files')

# Example of consuming the stream to a local file
with open('output.zip', 'wb') as f:
    for chunk in generate_zip_stream():
        f.write(chunk)
print("ZIP file 'output.zip' created successfully.")
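Because zipstream-new emits a standard ZIP byte stream, the bytes you collect can be validated with the standard library's zipfile module. A minimal sketch (stdlib only; to keep it self-contained, a zipfile-built archive stands in for the streamed output):

```python
import io
import zipfile

def verify_zip(data: bytes) -> list:
    """Return the archive's member names, raising if any entry is corrupt."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        bad = zf.testzip()  # returns the first bad filename, or None
        if bad is not None:
            raise ValueError(f"corrupt entry: {bad}")
        return zf.namelist()

# Build a tiny archive with the standard library to demonstrate;
# in practice, pass the bytes collected from iterating a zipstream.ZipFile.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('hello.txt', 'hi')
print(verify_zip(buf.getvalue()))  # ['hello.txt']
```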