Fastdownload
Fastdownload is a Python library for downloading, verifying, and extracting data archives. It checks local files against recorded file size and hash information, so users get the latest version of a dataset or other file. The library is a core component of the fastai ecosystem for managing datasets. The current version is 0.0.7, and releases occur as needed for bug fixes and new features.
Warnings
- gotcha By default, fastdownload stores archives in `~/.fastdownload/archive` and extracts data to `~/.fastdownload/data`. This can fill up disk space if not managed.
- gotcha Fastdownload will automatically verify file sizes and hashes if a `download_checks.py` file is present (typically generated when publishing datasets with fastdownload). If local files don't match, it may re-download.
- gotcha There is unrelated adware also named 'Fastdownload' that can cause browser issues. Ensure you are installing the Python library `fastdownload` (by fastai) and not any other software.
- gotcha The `FastDownload` class provides `get`, `download`, and `extract` methods for different behaviors. `get` downloads and extracts, `download` only downloads, and `extract` only extracts (assuming the file is already downloaded). Confusing these can lead to unexpected behavior.
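The size-and-hash check described in the warnings can be illustrated with a standard-library sketch. This is not fastdownload's own implementation: the helper names `file_stats` and `needs_redownload` are hypothetical, and SHA-256 is chosen here only as an assumption for illustration. It shows the general idea of comparing a local file against recorded values to decide whether a fresh download is needed.

```python
import hashlib
import tempfile
from pathlib import Path

def file_stats(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`.

    Hypothetical helper; SHA-256 is an assumption made for this sketch.
    """
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large archives are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return path.stat().st_size, digest.hexdigest()

def needs_redownload(path, expected):
    """True if the file is missing or its (size, hash) differs from `expected`."""
    path = Path(path)
    return not path.exists() or file_stats(path) != tuple(expected)

# Demo on a throwaway file standing in for a downloaded archive
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.tgz"
    archive.write_bytes(b"hello")
    recorded = file_stats(archive)               # values a publisher might record
    print(needs_redownload(archive, recorded))   # file unchanged
    archive.write_bytes(b"hello, world")         # simulate a stale or corrupted copy
    print(needs_redownload(archive, recorded))   # file no longer matches
```

Checking the size alongside the hash lets a mismatch be described more precisely (truncated download versus changed content), although the hash comparison alone would already detect both.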
Install
pip install fastdownload
Imports
- FastDownload
from fastdownload import FastDownload
- download_url
from fastdownload.core import download_url
Quickstart
import shutil
import tempfile
from pathlib import Path
from fastdownload import FastDownload

# Create a temporary directory for demonstration
tmp_dir = Path(tempfile.mkdtemp())
try:
    # Archives go to tmp_dir/archive and extracted data to tmp_dir/data
    fd = FastDownload(base=tmp_dir)
    # URL for a small, publicly available archive (e.g., MNIST tiny dataset from fast.ai)
    test_url = 'https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz'
    print(f"Downloading and extracting from {test_url} to {tmp_dir}...")
    extracted_path = fd.get(test_url)
    print(f"Successfully downloaded and extracted to: {extracted_path}")
    print(f"Contents of the extracted directory: {list(extracted_path.iterdir())}")

    # Example of downloading only without extraction
    print(f"\nDownloading only (no extraction) from {test_url} to {tmp_dir / 'archive'}...")
    downloaded_file_path = fd.download(test_url)
    print(f"Successfully downloaded to: {downloaded_file_path}")
finally:
    # Clean up the temporary directory
    if tmp_dir.exists():
        print(f"\nCleaning up temporary directory: {tmp_dir}")
        shutil.rmtree(tmp_dir)
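The Quickstart cleans up after itself because it uses a temporary base directory. With the default locations mentioned in the warnings (`~/.fastdownload/archive` and `~/.fastdownload/data`), nothing is removed automatically. A minimal standard-library sketch for checking how much space those directories occupy follows; `dir_size` is a hypothetical helper and not part of fastdownload.

```python
from pathlib import Path

def dir_size(path):
    """Total size in bytes of all regular files under `path` (0 if it does not exist)."""
    path = Path(path).expanduser()
    if not path.exists():
        return 0
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())

# Default base directory used by fastdownload, as stated in the warnings
base = Path("~/.fastdownload")
for name in ("archive", "data"):
    size_mb = dir_size(base / name) / 1e6
    print(f"{base / name}: {size_mb:.1f} MB")
```

If either directory has grown too large, delete individual datasets rather than the whole tree, so that files still in use are not downloaded again.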