remotezip2
remotezip2 is a Python library for reading individual files from a remote ZIP archive without downloading the whole archive. It works by issuing HTTP Range requests, which makes it a good fit when only a few members of a large remote ZIP file are needed. It is a fork of the original `python-remotezip` that aims for continued maintenance and responsiveness. The current version is 0.0.2; releases are infrequent, with one maintenance update in late 2024 since the initial release.
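To illustrate why Range requests are enough to list an archive, here is a minimal sketch of the general technique, not remotezip2's actual implementation. A client can ask for only the tail of the file, locate the end-of-central-directory (EOCD) record there, and learn where the file index lives. The helper names `tail_range_header` and `parse_eocd` are hypothetical, Zip64 archives are ignored, and an in-memory archive stands in for the remote file.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, 4 x uint16, 2 x uint32 (directory size, directory offset), comment length
EOCD_FORMAT = "<4s4H2IH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # fixed part of the record


def tail_range_header(nbytes):
    """Build a suffix Range header asking for the last `nbytes` of a resource."""
    return {"Range": f"bytes=-{nbytes}"}


def parse_eocd(tail):
    """Return (total_entries, cd_size, cd_offset) from the tail bytes of a ZIP."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        raise ValueError("end-of-central-directory record not found")
    fields = struct.unpack_from(EOCD_FORMAT, tail, pos)
    return fields[4], fields[5], fields[6]


# Local stand-in for a remote archive (assumption: no network involved here).
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("a.txt", "alpha")
    zf.writestr("b.txt", "beta")
data = raw.getvalue()

# A remote client would obtain `tail` via tail_range_header(65536).
tail = data[-65536:]
total_entries, cd_size, cd_offset = parse_eocd(tail)

# The central directory starts with its own signature at the reported offset.
cd_signature = data[cd_offset:cd_offset + 4]
```

With the directory offset and size known, a second Range request can fetch just the central directory, and later requests can fetch individual members.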
Common errors
- ModuleNotFoundError: No module named 'remotezip'
  cause The code imports from the original `remotezip` package instead of `remotezip2`.
  fix Change the import statement from `from remotezip import RemoteZip` to `from remotezip2 import RemoteZip`.
- KeyError: 'Content-length' or `requests.exceptions.HTTPError: 416 Range Not Satisfiable`
  cause The remote server hosting the ZIP file does not support HTTP Range requests, which `remotezip2` needs in order to fetch only parts of the file, or the requested range is invalid.
  fix Check that the server honors `Range` headers; some CDNs and static file servers disable them. If you control the server, enable Range support. Otherwise, download the full file or host the archive on a different service.
- BadZipFile: File is not a zip file
  cause The URL does not point to a valid ZIP file, or the bytes fetched first (e.g., the central directory) are corrupted or malformed, so the underlying `zipfile` module cannot recognize the archive.
  fix Verify that `zip_url` is correct and points to a legitimate, uncorrupted ZIP archive. Confirm the file is a valid ZIP by downloading it and opening it locally.
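Before blaming the library, it can help to probe the server directly. The following is a rough sketch using only the standard library; the function names `range_supported` and `probe_url` are hypothetical, and the check is a heuristic (a `206 Partial Content` reply to a one-byte Range request, or an `Accept-Ranges: bytes` header), not something remotezip2 itself provides.

```python
import urllib.error
import urllib.request


def range_supported(status, headers):
    """Heuristic: does a reply to a `Range: bytes=0-0` probe suggest Range support?"""
    if status == 206:  # Partial Content: the server honored the range
        return True
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get("accept-ranges", "").strip().lower() == "bytes"


def probe_url(url, timeout=10):
    """Send a one-byte Range request to `url` and apply the heuristic above."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return range_supported(response.status, response.headers)
    except urllib.error.HTTPError as error:
        # Error replies (such as 416) still carry a status code and headers.
        return range_supported(error.code, error.headers)


# Usage (performs a network request, so it is left commented out):
# print(probe_url("http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip"))
```

A `False` result suggests that the server ignores or rejects Range headers, which matches the second error above.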
Warnings
- gotcha The `extractall()` and `testzip()` methods are generally inefficient for remote ZIPs as they necessitate downloading the entire archive. If these operations are frequently needed, a full download might be more efficient.
- gotcha Performing negative seek operations (e.g., `seek(-offset, 1)`) within a `ZipExtFile` object is highly inefficient, as it typically triggers a new remote request to restart reading from the beginning of the member content.
- gotcha The library heavily relies on the remote web server's support for HTTP Range headers to function efficiently. Without this support, the library may fail or resort to full downloads, losing its primary benefit.
- gotcha As of February 2026, PyPI is enforcing stricter ZIP file validation (e.g., for wheels), rejecting archives with duplicate filenames or invalid `RECORD` metadata. Although `remotezip2` only reads ZIPs and does not create them, reading a malformed ZIP of the kind PyPI now rejects could lead to unexpected behavior or errors.
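One way to sidestep the backward-seek cost noted above is to read a member once and then seek within a local in-memory copy. This is a minimal sketch, assuming `RemoteZip` exposes the same `open()` interface as `zipfile.ZipFile` (as the Quickstart suggests); a local in-memory archive stands in for the remote one, and `buffer_member` is a hypothetical helper name.

```python
import io
import zipfile


def buffer_member(archive, name):
    """Read one member fully and return a seekable in-memory copy of it."""
    with archive.open(name) as member:
        return io.BytesIO(member.read())


# Local stand-in for a remote archive (assumption: no network involved here).
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("data.txt", "0123456789")

with zipfile.ZipFile(raw) as zf:
    buf = buffer_member(zf, "data.txt")

# Backward seeks now operate on local memory only.
buf.seek(-3, io.SEEK_END)
tail = buf.read()

buf.seek(-5, io.SEEK_CUR)
middle = buf.read(2)
```

This trades memory for fewer requests, so it suits members that fit comfortably in RAM; very large members may still be better read forward in a streaming fashion.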
Install
-
pip install remotezip2
Imports
- RemoteZip
  wrong: from remotezip import RemoteZip
  correct: from remotezip2 import RemoteZip
Quickstart
import os
from remotezip2 import RemoteZip

# Replace with a URL to a real remote ZIP file that supports HTTP Range headers.
# For testing, you can use a publicly available ZIP, e.g., one from thematicmapping.org.
zip_url = os.environ.get('REMOTE_ZIP_URL', 'http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip')

try:
    with RemoteZip(zip_url) as rz:
        print(f"Files in remote ZIP at {zip_url}:")
        for name in rz.namelist():
            print(f" - {name}")

        # Example: extract a specific file
        file_to_extract = 'Readme.txt'
        if file_to_extract in rz.namelist():
            print(f"\nExtracting '{file_to_extract}'...")
            with rz.open(file_to_extract) as remote_file:
                content = remote_file.read().decode('utf-8')
            print(f"Content of '{file_to_extract}':\n---\n{content[:200]}...\n---")  # Print first 200 chars
        else:
            print(f"\nFile '{file_to_extract}' not found in the archive.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure the URL is valid and the server supports HTTP Range requests.")