Parfive: Parallel File Downloader
Parfive is a parallel file downloader for Python that handles HTTP and FTP transfers. It is built on `asyncio` and downloads multiple files concurrently, with progress bars, a cap on the number of simultaneous connections, and a way to retry failed downloads. The current version is 2.3.1. Releases are made periodically to add features, fix issues, and keep up with newer Python versions.
Common errors
- RuntimeError: Event loop is already running
  cause: Calling `asyncio.run()` (or `loop.run_until_complete()`) in an environment where an event loop is already active, e.g. inside a Jupyter notebook or another async function. With `asyncio.run()` the message reads "asyncio.run() cannot be called from a running event loop".
  fix: In Jupyter, use `await` directly at the top level (if supported) or call `nest_asyncio.apply()` first. Inside another async function, `await downloader.run_download()` directly instead of wrapping the call in `asyncio.run()`.
- TypeError: object Results can't be used in 'await' expression
  cause: Writing `await downloader.download()`. `download()` is the synchronous entry point: it runs the event loop itself and returns a `Results` object, which is not awaitable. The exact wording of the message varies between Python versions.
  fix: Call `downloader.download()` without `await` from synchronous code, or use `await downloader.run_download()` from inside a coroutine.
- AttributeError: module 'parfive' has no attribute 'Downloader'
  cause: `Downloader` is exported at the top level of the package, so this error usually means the name `parfive` is not bound to the installed library, for example because a local file or directory named `parfive` shadows it.
  fix: Import the class with `from parfive import Downloader`, and rename any local `parfive.py` that shadows the installed package.
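The first error above can be reproduced with the standard library alone. The sketch below is not parfive code: `fetch_all` is a hypothetical stand-in for any awaitable download call, and the two wrapper functions only contrast starting a nested loop with awaiting directly.

```python
import asyncio


async def fetch_all():
    """Hypothetical stand-in for an awaitable download call."""
    await asyncio.sleep(0)
    return ["a.txt", "b.txt"]


async def nested_run():
    """Wrong: try to start a second event loop from inside a running one."""
    coro = fetch_all()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        return f"RuntimeError: {exc}"


async def direct_await():
    """Right: a loop is already running, so just await the coroutine."""
    return await fetch_all()


nested_result = asyncio.run(nested_run())
direct_result = asyncio.run(direct_await())
print(nested_result)
print(direct_result)
```

Only the outermost call uses `asyncio.run()`; everything beneath it awaits.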
Warnings
- breaking Version 2.0.0 introduced significant breaking changes and made the API fully `asyncio` native. The `Downloader` class signature changed, and `download()` now returns a `parfive.results.Results` object rather than a simple list of paths. `Results` is a list-like container of file paths that also records failed downloads in its `errors` attribute.
- gotcha All `parfive` downloads run on an `asyncio` event loop, but there are two entry points. `downloader.download()` is synchronous: it starts the loop for you and must not be awaited. `downloader.run_download()` is the coroutine to `await` from code that is already running inside an event loop. Mixing the two up leads to runtime errors.
- gotcha By default (`overwrite=False`), `parfive` skips downloading a file if one with the same name already exists at the target path and returns the existing path. While often desired, this can leave stale files in place when the remote content changes. If fresh downloads are always required, pass `overwrite=True` to replace existing files, pass `overwrite="unique"` to save under a new unique filename, or remove the existing files first.
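- gotcha The `max_conn` (default 5) and `max_splits` (default 5) parameters of `Downloader` control concurrency: `max_conn` caps the number of files downloaded at once, and `max_splits` caps the number of parallel connections used for a single file when the server supports ranged requests. Setting these too high can exhaust system resources (file descriptors, network sockets) or trigger rate limits on target servers, leading to slower downloads or connection errors.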
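The concurrency and overwrite settings described in the warnings above can be set when the `Downloader` is constructed. The following is a minimal sketch, assuming parfive 2.x; `build_downloader` and `summarize` are illustrative helper names, not part of parfive, and the chosen limits are arbitrary conservative values.

```python
from parfive import Downloader


def build_downloader(urls, dest, *, overwrite=False):
    """Queue ``urls`` for download into ``dest`` with conservative limits."""
    dl = Downloader(
        max_conn=2,           # at most two files in flight at once
        max_splits=1,         # one connection per file, no ranged splitting
        progress=False,       # no progress bars, e.g. for batch jobs
        overwrite=overwrite,  # False: skip existing, True: replace, "unique": rename
    )
    for url in urls:
        dl.enqueue_file(url, path=dest)
    return dl


def summarize(results):
    """Print downloaded paths and failures; return True if nothing failed."""
    for path in results:
        print("ok    ", path)
    for err in results.errors:
        # Each error entry records the URL and the exception that was raised.
        print("failed", err.url, err.exception)
    return not results.errors
```

Nothing is transferred until `download()` is called on the returned object, so `summarize(build_downloader(urls, "data").download())` would run the queue and report the outcome.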
Install
- pip install parfive
Imports
- Downloader
  wrong: import parfive.Downloader
  correct: from parfive import Downloader
Quickstart
import asyncio
import os

import parfive


async def main():
    # Define some public URLs to download
    urls = [
        "https://raw.githubusercontent.com/sunpy/parfive/main/README.md",
        "https://raw.githubusercontent.com/sunpy/parfive/main/LICENSE",
    ]

    # Create a directory for downloads if it doesn't exist
    download_dir = "parfive_downloads"
    os.makedirs(download_dir, exist_ok=True)

    # Initialize the Downloader with a maximum of 5 concurrent connections
    # and display a progress bar.
    downloader = parfive.Downloader(max_conn=5, progress=True)

    # Queue each URL, specifying the local directory to save into
    for url in urls:
        downloader.enqueue_file(url, path=download_dir)

    # Execute the downloads. run_download() is the coroutine to await here;
    # download() is the synchronous entry point and must not be awaited.
    print(f"Starting download of {len(urls)} files to '{download_dir}'...")
    results = await downloader.run_download()

    # Print the paths of the downloaded files
    print("Downloaded files:")
    for filepath in results:
        print(f"- {filepath}")

    # Failed downloads are collected on the Results object rather than raised
    for error in results.errors:
        print(f"! failed: {error}")

    # Optionally clean up the downloaded files
    # for filepath in results:
    #     os.remove(filepath)
    # os.rmdir(download_dir)


if __name__ == "__main__":
    asyncio.run(main())
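The Quickstart runs the queue once. For flaky servers, one possible pattern is to retry whatever failed, using the synchronous API. The sketch below assumes a `parfive.Downloader`-like object that provides `download()` and `retry(results)` and whose results expose an `errors` list; `download_with_retry` and its `max_retries` parameter are illustrative names, not part of parfive.

```python
def download_with_retry(downloader, max_retries=2):
    """Run the queued downloads, retrying failures up to ``max_retries`` times.

    ``downloader`` is expected to behave like ``parfive.Downloader``: a
    synchronous ``download()`` and a ``retry(results)`` method, both returning
    a results object with an ``errors`` list.
    """
    results = downloader.download()
    for _ in range(max_retries):
        if not results.errors:
            break  # everything succeeded, nothing left to retry
        results = downloader.retry(results)
    return results
```

From synchronous code this would be called as `results = download_with_retry(downloader)` once the URLs have been queued. Any entries still left in `results.errors` afterwards failed on every attempt.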