Waybackpy
Waybackpy is a Python package and command-line tool for the Internet Archive's Wayback Machine APIs. It archives web pages and retrieves their archived versions. The library is actively maintained with a consistent release cadence; the current version is 3.0.6.
Warnings
- breaking Python 3.5 and older are no longer supported. Version 3.0.3 dropped support for Python 3.4 through 3.6, and version 3.0.5 restored support for Python 3.6. The library now explicitly requires Python >= 3.6.
- deprecated The `get()` method, which was previously used to retrieve the source code of a webpage, was deprecated in version 3.0.0. This method was considered a poor fit for the library's scope.
- gotcha Starting from version 3.0.0, Waybackpy introduced dedicated classes for each of the three main Wayback Machine APIs: `WaybackMachineSaveAPI`, `WaybackMachineCDXServerAPI`, and `WaybackMachineAvailabilityAPI`. While the older `Url` class remains functional for backward compatibility, new development should leverage these specialized classes for better clarity and access to specific API features.
- gotcha The Internet Archive recommends against using the `WaybackMachineAvailabilityAPI` due to potential performance issues. Its functionality is largely covered by the `WaybackMachineCDXServerAPI`.
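As a rough illustration of the last two warnings, the sketch below queries the CDX Server API instead of the Availability API. It assumes a waybackpy release whose `WaybackMachineCDXServerAPI` provides `oldest()` and `newest()`, and that the returned snapshot objects expose `timestamp` and `archive_url` attributes. The helper names `summarize_snapshot` and `lookup_snapshots` are hypothetical and not part of the library.

```python
def summarize_snapshot(snapshot):
    """Format any object exposing `timestamp` and `archive_url` as one line."""
    return f"{snapshot.timestamp} -> {snapshot.archive_url}"


def lookup_snapshots(url, user_agent):
    """Return the (oldest, newest) snapshots of `url` via the CDX Server API.

    Needs network access. waybackpy is imported inside the function so that
    summarize_snapshot() stays usable where the package is not installed.
    """
    from waybackpy import WaybackMachineCDXServerAPI

    cdx_api = WaybackMachineCDXServerAPI(url, user_agent)
    # Assumption: oldest() and newest() exist in the installed version
    return cdx_api.oldest(), cdx_api.newest()


# Usage sketch (performs real requests against the Wayback Machine):
#   oldest, newest = lookup_snapshots(
#       "https://www.example.com", "MyCustomApp (https://mycustomapp.com)"
#   )
#   print(summarize_snapshot(oldest))
#   print(summarize_snapshot(newest))
```

Keeping the formatting helper separate from the network call makes it possible to check the output format without contacting the archive.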
Install
pip install waybackpy
Imports
- WaybackMachineSaveAPI
from waybackpy import WaybackMachineSaveAPI
- WaybackMachineCDXServerAPI
from waybackpy import WaybackMachineCDXServerAPI
- Url
from waybackpy import Url
Quickstart
import os

from waybackpy import WaybackMachineSaveAPI

url = "https://www.example.com"
# Identify your application with a descriptive User-Agent string
user_agent = os.environ.get("WAYBACKPY_USER_AGENT", "MyCustomApp (https://mycustomapp.com)")

save_api = WaybackMachineSaveAPI(url, user_agent)
try:
    # save() asks the Wayback Machine to archive the page and returns the archive URL
    archive_url = save_api.save()
    print(f"Page archived successfully: {archive_url}")
except Exception as e:
    print(f"Error archiving page: {e}")
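Beyond the archive URL, a save can report when the capture was taken and whether the Wayback Machine returned an existing capture rather than creating a new one. The sketch below assumes the save object offers a `timestamp()` method and a `cached_save` attribute, as `WaybackMachineSaveAPI` is understood to do in the 3.x series. The helper name `describe_save` is hypothetical.

```python
def describe_save(save_api):
    """Trigger a save and collect the archive URL, capture time and cache flag.

    `save_api` is expected to behave like WaybackMachineSaveAPI: a save()
    method, a timestamp() method and a cached_save attribute (assumed names).
    """
    archive_url = save_api.save()
    return {
        "archive_url": archive_url,
        # timestamp() is read before cached_save so the flag is up to date
        "captured_at": save_api.timestamp(),
        "served_from_cache": save_api.cached_save,
    }


# Usage sketch (performs a real save request, so it needs network access):
#   from waybackpy import WaybackMachineSaveAPI
#   api = WaybackMachineSaveAPI(
#       "https://www.example.com", "MyCustomApp (https://mycustomapp.com)"
#   )
#   print(describe_save(api))
```

Because the helper only relies on those three members, it can be exercised with a stand-in object before pointing it at the live service.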