{"id":5141,"library":"cached-path","title":"cached-path","description":"cached-path is a Python file utility library that provides a unified, simple interface for accessing both local and remote files. It automatically downloads and caches remote resources, making them available as local file paths. Currently at version 1.8.10, the library maintains an active development pace with frequent patch and minor releases to address compatibility and add new features.","status":"active","version":"1.8.10","language":"en","source_language":"en","source_url":"https://github.com/allenai/cached_path","tags":["file-management","caching","remote-files","cloud-storage"],"install":[{"cmd":"pip install cached-path","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Required for accessing resources from AWS S3 (s3:// scheme).","package":"boto3","optional":true},{"reason":"Required for accessing resources from Google Cloud Storage (gs:// scheme).","package":"google-cloud-storage","optional":true},{"reason":"Required for accessing resources from HuggingFace Hub (hf:// scheme).","package":"huggingface-hub","optional":true},{"reason":"Required for accessing resources from Beaker (beaker:// scheme).","package":"beaker-py","optional":true}],"imports":[{"symbol":"cached_path","correct":"from cached_path import cached_path"},{"symbol":"get_cache_dir","correct":"from cached_path import get_cache_dir"}],"quickstart":{"code":"import os\nfrom cached_path import cached_path\n\n# Download and cache a remote file\nremote_url = \"https://raw.githubusercontent.com/allenai/cached_path/main/README.md\"\nlocal_path = cached_path(remote_url)\nprint(f\"Cached file path: {local_path}\")\nassert os.path.exists(local_path)\n\n# Example with an archive, extracting it\narchive_url = \"https://github.com/allenai/cached_path/releases/download/v0.1.0/cached_path-0.1.0.tar.gz\"\nextracted_dir = cached_path(archive_url, extract_archive=True)\nprint(f\"Extracted archive directory: 
{extracted_dir}\")\nassert os.path.isdir(extracted_dir)\n\n# Clean up (optional, for demonstration)\n# import shutil\n# shutil.rmtree(os.path.dirname(local_path))\n# shutil.rmtree(extracted_dir)","lang":"python","description":"This quickstart demonstrates how to use `cached_path()` to download and cache a remote file, and how to extract an archive. It verifies that the returned paths exist locally."},"warnings":[{"fix":"Upgrade cached-path to version 1.7.2 or later to ensure compatibility with recent `boto3` and `botocore` releases.","message":"Older versions of cached-path (pre-1.7.2) had incompatibility issues with `boto3/botocore >=1.37.34`. Upgrading `boto3` or `botocore` without updating `cached-path` could lead to errors when accessing S3 resources.","severity":"breaking","affected_versions":"<1.7.2"},{"fix":"If you rely on capturing download progress, redirect `stderr` instead of `stdout` or configure a custom progress display.","message":"As of v1.8.10, the default progress bar output for downloads was changed to write to `stderr` instead of `stdout`. Scripts redirecting `stdout` might no longer capture progress information.","severity":"gotcha","affected_versions":">=1.8.10"},{"fix":"Upgrade to version 1.8.9 or later to benefit from improved data integrity during file caching operations.","message":"Version 1.8.9 introduced a fix to ensure filesystem sync (`os.fdatasync`) when replacing temporary files with permanent ones. While a data integrity improvement, it implies that prior versions could be susceptible to data loss or corruption in case of power failure or system crash during a file download/replacement operation.","severity":"gotcha","affected_versions":"<1.8.9"},{"fix":"Keep `cached-path` updated to the latest version, especially when updating `huggingface-hub`, to ensure full compatibility with the `hf://` scheme.","message":"Multiple versions (1.6.7, 1.7.3, 1.7.4) added support for newer `HuggingFace-Hub` versions. 
Using an older `cached-path` with a very new `huggingface-hub` might lead to unexpected behavior or errors when using the `hf://` scheme.","severity":"gotcha","affected_versions":"<1.7.4"},{"fix":"Understand the distinction: `extract_archive=True` on a plain archive URL returns the extracted directory, while `url!path/to/file` returns the path to a specific entry within the archive.","message":"When using `extract_archive=True`, `cached_path` extracts the entire archive and returns the path to the extracted directory. To get the path to a specific file or sub-directory within an archive, append `!path/to/file` to the archive URL (the form `url!path/to/file`, with no leading slash after the `!`).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be explicit about your desired cache directory and understand the precedence: `cache_dir` argument > `set_cache_dir()` > environment variable.","message":"The default cache directory is `~/.cache/cached_path/`. This can be overridden globally via the `CACHED_PATH_CACHE_ROOT` environment variable, programmatically with `set_cache_dir()`, or per call using the `cache_dir` argument to `cached_path()`. Conflicting settings might lead to unexpected cache locations.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}