Fastdownload

0.0.7 · active · verified Sun Apr 12

Fastdownload is a Python library for downloading, verifying, and extracting data archives. It checks each file against recorded size and hash information so that users always have the latest version of a dataset or other file. The library is a core component of the fastai ecosystem for managing datasets. The current version is 0.0.7; releases are made as needed for bug fixes and new features.
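
The size-and-hash check described above can be pictured with a short standard-library sketch. It is an illustration only, not fastdownload's own code: the helper names `file_stats` and `is_current`, and the choice of SHA-256, are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_stats(path):
    """Return (size_in_bytes, sha256_hex_digest) for the file at `path`."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large archives are not loaded into memory at once
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return path.stat().st_size, digest.hexdigest()


def is_current(path, expected_size, expected_hash):
    """True if the local file exists and matches the recorded size and hash."""
    path = Path(path)
    if not path.exists():
        return False
    return file_stats(path) == (expected_size, expected_hash)


# Usage: record the stats of a known-good file, then detect a changed copy
with tempfile.TemporaryDirectory() as d:
    sample = Path(d) / "sample.bin"
    sample.write_bytes(b"hello")
    size, sha = file_stats(sample)
    unchanged = is_current(sample, size, sha)    # file matches its recorded stats
    sample.write_bytes(b"hello world")
    changed = not is_current(sample, size, sha)  # size and hash now differ
```

A mismatch in either value is the signal that a local copy is stale or corrupt and should be fetched again.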

Warnings

Install

Imports

Quickstart

This quickstart uses `FastDownload` to download a remote archive and extract it into a temporary directory, then shows how to fetch the archive alone without extracting it. The `base` parameter directs all downloaded and extracted content to the specified location instead of the default `~/.fastdownload`.

import tempfile
import shutil
from pathlib import Path
from fastdownload import FastDownload

# Create a temporary directory for demonstration
tmp_dir = Path(tempfile.mkdtemp())

try:
    fd = FastDownload(base=tmp_dir)
    # URL for a small, publicly available archive (e.g., MNIST tiny dataset from fast.ai)
    test_url = 'https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz'
    
    print(f"Downloading and extracting from {test_url} to {tmp_dir}...")
    extracted_path = fd.get(test_url)
    
    print(f"Successfully downloaded and extracted to: {extracted_path}")
    print(f"Contents of the extracted directory: {list(extracted_path.iterdir())}")
    
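    # Example of fetching only the archive, without extraction.
    # Assumption: get() above already saved the archive under tmp_dir / 'archive',
    # so download() is expected to reuse that file rather than fetch it again.
    print(f"\nFetching the archive only (no extraction) from {test_url} into {tmp_dir / 'archive'}...")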
    downloaded_file_path = fd.download(test_url)
    print(f"Successfully downloaded to: {downloaded_file_path}")

finally:
    # Clean up the temporary directory
    if tmp_dir.exists():
        print(f"\nCleaning up temporary directory: {tmp_dir}")
        shutil.rmtree(tmp_dir)
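
To make the role of `base` concrete, the following sketch computes where an archive would be expected to land for a given URL. It assumes the layout implied by the quickstart: an `archive` subdirectory under `base`, holding a file named after the last segment of the URL path. The helper `expected_archive_path` is hypothetical and not part of fastdownload.

```python
from pathlib import Path
from urllib.parse import urlparse, unquote


def expected_archive_path(url, base="~/.fastdownload"):
    """Guess where the archive for `url` would be stored under `base`.

    Assumption: archives live in `<base>/archive/<last URL path segment>`.
    """
    name = Path(unquote(urlparse(url).path)).name
    if not name:
        raise ValueError(f"URL has no file name component: {url!r}")
    return Path(base).expanduser() / "archive" / name


url = "https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz"
# "/tmp/fd_demo" is an arbitrary example location, not a fastdownload default
archive_path = expected_archive_path(url, base="/tmp/fd_demo")
```

Comparing such a path with the value returned by `fd.download(url)` is a quick way to confirm that a custom `base` is being honored.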
