savepagenow
savepagenow is a simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service. It lets users programmatically request that the Internet Archive save a specific URL. The library is currently at version 1.3.1. Recent releases have consisted mainly of dependency updates, along with authentication support and documentation of the service's rate limits.
Common errors
- requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://web.archive.org/save/
  cause: You have exceeded the Internet Archive's rate limit for the Save Page Now service.
  fix: Reduce the frequency of your requests. Add delays between calls to `savepagenow.capture`, or implement an exponential backoff strategy.
- requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://web.archive.org/save/
  cause: Your authentication credentials (access key and secret key) are missing or invalid.
  fix: Set the `SAVEPAGENOW_ACCESS_KEY` and `SAVEPAGENOW_SECRET_KEY` environment variables to valid credentials, and pass `authenticate=True` to `savepagenow.capture`.
- ImportError: cannot import name 'save_page_now' from 'savepagenow'
  cause: The package does not export a symbol named `save_page_now`.
  fix: Use `import savepagenow` and call `savepagenow.capture(url)`. The capture function lives directly in the top-level package.
- requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  cause: A network connectivity issue occurred on your end, between you and the Internet Archive, or on the Internet Archive's servers.
  fix: Check your internet connection. For intermittent issues, retry the request after a short delay, and consider adding retry logic to your application.
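The backoff and retry advice above can be sketched as a small helper. This is a minimal illustration, not part of savepagenow: the name `retry_with_backoff` and all of its parameters are hypothetical, and the default delays are arbitrary placeholders rather than values published by the Internet Archive.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    jitter: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponentially growing delays on failure.

    Hypothetical helper: none of these names come from savepagenow.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            # The delay doubles each time (base, 2*base, 4*base, ...),
            # capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Optional random jitter so parallel clients do not retry in lockstep.
            delay += random.uniform(0, jitter)
            sleep(delay)
    raise AssertionError("unreachable")
```

The zero-argument callable would wrap whatever capture call your code makes. Narrowing `retry_on` to the specific exceptions you consider transient avoids retrying errors that will never succeed, such as invalid credentials.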
Warnings
- gotcha The Internet Archive's 'Save Page Now' service has implemented rate limits. Exceeding these limits will result in HTTP 429 (Too Many Requests) errors. Rapid, consecutive requests should be avoided or spaced out.
- gotcha Authentication is required for certain types of captures, particularly private ones. If you receive 401 Unauthorized or 403 Forbidden errors, it likely means your API keys are missing or invalid, or you lack the necessary permissions.
- gotcha The Internet Archive's 'Save Page Now' service can sometimes be slow or temporarily unavailable. While `savepagenow` handles network requests, failures might occur due to upstream service issues.
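When archiving several URLs, the first and third warnings suggest spacing requests out and tolerating individual failures. A rough sketch of that pattern follows. `archive_batch` and `capture_fn` are hypothetical names, the capture call is passed in rather than assumed, and the 15-second default pause is an arbitrary placeholder, not a documented limit.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def archive_batch(
    urls: Iterable[str],
    capture_fn: Callable[[str], str],
    *,
    pause_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Capture URLs one at a time, pausing between consecutive requests.

    Returns (saved, failed): saved maps each URL to whatever capture_fn
    returned, and failed maps each URL to a description of the error raised.
    """
    saved: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause_seconds)  # space out consecutive requests
        try:
            saved[url] = capture_fn(url)
        except Exception as exc:
            # Record the failure and continue, so one bad URL or a brief
            # upstream outage does not abort the whole batch.
            failed[url] = f"{type(exc).__name__}: {exc}"
    return saved, failed
```

Collecting failures separately makes it easy to retry only the URLs that did not go through once the service has recovered.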
Install
- pip install savepagenow
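Because features such as authentication arrived in later releases, it can help to confirm which version is actually installed. The snippet below is a small sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the installed savepagenow version, if any.
print(installed_version("savepagenow") or "savepagenow is not installed")
```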
Imports
- capture
import savepagenow
from savepagenow import capture
Quickstart
import os

import savepagenow

# Replace with the URL you want to archive
url_to_archive = "https://example.com/"

# Optional: authenticated captures read credentials from environment variables.
# If these are not set, the request is sent anonymously.
#   SAVEPAGENOW_ACCESS_KEY='YOUR_ACCESS_KEY'
#   SAVEPAGENOW_SECRET_KEY='YOUR_SECRET_KEY'
access_key = os.environ.get("SAVEPAGENOW_ACCESS_KEY", "")
secret_key = os.environ.get("SAVEPAGENOW_SECRET_KEY", "")

try:
    if access_key and secret_key:
        print(f"Attempting to save {url_to_archive} with authentication...")
        # authenticate=True tells the library to read the two variables above
        archive_url = savepagenow.capture(url_to_archive, authenticate=True)
    else:
        print(f"Attempting to save {url_to_archive} without authentication...")
        archive_url = savepagenow.capture(url_to_archive)
    # capture() returns the URL of the archived snapshot as a string
    print(f"Page saved successfully: {archive_url}")
except Exception as e:
    # Also covers savepagenow.exceptions.CachedPage, raised when archive.org
    # returns a recent cached capture instead of making a new one
    print(f"An error occurred: {e}")
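The archive URL reported by the quickstart encodes when the snapshot was taken. As a follow-up sketch, the helper below splits such a URL into its capture time and original address, assuming the standard Wayback Machine layout `https://web.archive.org/web/<14-digit UTC timestamp>/<original URL>`. `parse_archive_url` is a hypothetical name and not part of savepagenow.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# 14-digit timestamp, optionally followed by a two-letter modifier such as "id_"
_WAYBACK_URL = re.compile(
    r"^https?://web\.archive\.org/web/(?P<stamp>\d{14})(?:[a-z]{2}_)?/(?P<original>.+)$"
)


def parse_archive_url(archive_url: str) -> Tuple[datetime, str]:
    """Split a Wayback Machine capture URL into (capture time, original URL)."""
    match = _WAYBACK_URL.match(archive_url)
    if match is None:
        raise ValueError(f"Not a Wayback Machine capture URL: {archive_url!r}")
    # Wayback timestamps are expressed in UTC as YYYYMMDDhhmmss.
    captured_at = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return captured_at, match["original"]
```

Comparing the parsed capture time with the current time is one way to tell a fresh capture from an older snapshot that the service returned instead.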