cached-path
cached-path is a Python file utility library that provides a unified, simple interface for accessing both local and remote files. It automatically downloads and caches remote resources and returns them as local file paths. The current version is 1.8.10, and the project ships frequent patch and minor releases that fix compatibility issues and add features.
Warnings
- breaking cached-path versions before 1.7.2 are incompatible with `boto3`/`botocore` >= 1.37.34. Upgrading `boto3` or `botocore` without also upgrading `cached-path` can cause errors when accessing S3 resources.
- gotcha Starting with v1.8.10, download progress bars are written to `stderr` instead of `stdout`. Scripts that redirect or capture only `stdout` no longer receive the progress output.
- gotcha Version 1.8.9 added an `os.fdatasync` call before a temporary download file replaces its permanent cache entry. Earlier versions skip this sync, so a power failure or system crash during a download or replacement could leave a truncated or corrupted file in the cache.
- gotcha Several releases (1.6.7, 1.7.3, 1.7.4) added support for newer `huggingface-hub` versions. Pairing an older `cached-path` with a much newer `huggingface-hub` can cause errors or unexpected behavior with the `hf://` scheme, so keep `cached-path` up to date whenever you upgrade `huggingface-hub`.
- gotcha With `extract_archive=True`, `cached_path()` extracts the entire archive and returns the path to the extraction directory. To get the path of a specific file or sub-directory inside the archive, append `!` and the member's path to the archive location and call with `extract_archive=True` (e.g., `.../cached_path-0.1.0.tar.gz!cached_path-0.1.0/README.md`). The archive is still extracted in full; only the returned path changes.
- gotcha The default cache directory is `~/.cache/cached_path/`. You can override it for the whole process with the `CACHED_PATH_CACHE_ROOT` environment variable or with `set_cache_dir()`, or for a single call with the `cache_dir` argument to `cached_path()`. Mixing these settings can put files in unexpected locations; `get_cache_dir()` returns the directory currently in effect.
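The 1.8.9 fix concerns the common pattern of writing a download to a temporary file and then renaming it into place. The sketch below is a stdlib-only illustration of that pattern, not cached-path's actual code; the helper name `atomic_write_bytes` is hypothetical. It syncs the temporary file before `os.replace`, so a crash leaves either the old file or the complete new one rather than a partial write.

```python
import os
import tempfile


def atomic_write_bytes(dest: str, data: bytes) -> None:
    """Hypothetical helper: write `data` to `dest` through a synced temp file."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()  # move Python's buffer into the OS
            # os.fdatasync is missing on some platforms (e.g. macOS); fall back to fsync.
            sync = getattr(os, "fdatasync", os.fsync)
            sync(tmp.fileno())  # force the file's data to stable storage
        # Swap the fully written file into place in a single atomic step.
        os.replace(tmp_path, dest)
    except BaseException:
        # On any failure, delete the partial temp file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Syncing the parent directory after `os.replace` would also make the rename itself durable; this sketch omits that step for brevity.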
Install
pip install cached-path
Imports
- cached_path
from cached_path import cached_path
- get_cache_dir
from cached_path import get_cache_dir
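`get_cache_dir()` reports the cache directory currently in effect. The sketch below models how the settings listed under Warnings could combine. It reflects our assumed precedence (per-call argument, then a programmatic override, then the environment variable, then the default), not the library's code. `resolve_cache_dir` is hypothetical, and its `override` parameter stands in for a value set through `set_cache_dir()`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

PathLike = Union[str, "os.PathLike[str]"]


def resolve_cache_dir(
    cache_dir: Optional[PathLike] = None,
    override: Optional[PathLike] = None,
    environ: Mapping[str, str] = os.environ,
) -> Path:
    """Hypothetical model of the assumed precedence; not cached-path's implementation."""
    if cache_dir is not None:  # per-call argument applies to this call only
        return Path(cache_dir).expanduser()
    if override is not None:  # stands in for a set_cache_dir() value
        return Path(override).expanduser()
    env_value = environ.get("CACHED_PATH_CACHE_ROOT")
    if env_value:  # process-wide environment setting
        return Path(env_value).expanduser()
    return Path.home() / ".cache" / "cached_path"  # documented default
```

Taking `environ` as a parameter keeps the function pure, so each layer can be checked without modifying the real environment.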
Quickstart
import os
from cached_path import cached_path
# Download and cache a remote file
remote_url = "https://raw.githubusercontent.com/allenai/cached_path/main/README.md"
local_path = cached_path(remote_url)
print(f"Cached file path: {local_path}")
assert os.path.exists(local_path)
# Example with an archive, extracting it
archive_url = "https://github.com/allenai/cached_path/releases/download/v0.1.0/cached_path-0.1.0.tar.gz"
extracted_dir = cached_path(archive_url, extract_archive=True)
print(f"Extracted archive directory: {extracted_dir}")
assert os.path.isdir(extracted_dir)
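# Clean up (optional, for demonstration). Remove only the entries created above:
# os.path.dirname(local_path) is typically the shared cache directory, so deleting
# it would wipe every cached resource.
# import shutil
# os.remove(local_path)
# shutil.rmtree(extracted_dir)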
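The `!` member syntax mentioned under Warnings can be illustrated with a stdlib-only sketch. It splits the location at the first `!`, extracts the whole archive, and returns the path of the requested member, raising `FileNotFoundError` if that member is missing. `extract_member` and its `<name>-extracted` directory layout are hypothetical and only mimic the described behavior. This is not cached-path's implementation, and it handles local tar archives only.

```python
import tarfile
from pathlib import Path


def extract_member(location: str, extract_root: str) -> Path:
    """Hypothetical helper: resolve 'archive.tar.gz!inner/path' to an extracted path."""
    archive_path, sep, inner = location.partition("!")
    if not sep:
        raise ValueError("expected 'archive!member' syntax")
    target_dir = Path(extract_root) / (Path(archive_path).name + "-extracted")
    if not target_dir.is_dir():
        # Extract the entire archive once; later calls reuse the directory.
        with tarfile.open(archive_path, "r:*") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: reject absolute paths and '..' traversal.
                tar.extractall(target_dir, filter="data")
            else:
                tar.extractall(target_dir)
    member = target_dir / inner
    if not member.exists():
        raise FileNotFoundError(f"{inner!r} not found in {archive_path}")
    return member
```

Usage, mirroring the Quickstart archive: `extract_member("cached_path-0.1.0.tar.gz!cached_path-0.1.0/README.md", "cache")`, run against a locally downloaded copy of the tarball.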