{"library":"pooch","title":"Pooch: A friend to fetch your data files","description":"Pooch is a Python library that simplifies managing and fetching data files. It downloads files from remote servers (HTTP, FTP, Zenodo, and Figshare are supported) only when they are needed, stores them in a local cache, and verifies their integrity with SHA256 hash checks. This makes it well suited to Python libraries that distribute sample datasets and to scientists managing research data. The current version is 1.9.0; minor releases are typically several months to a year apart.","status":"active","version":"1.9.0","language":"en","source_language":"en","source_url":"https://github.com/fatiando/pooch","tags":["data management","caching","download","sample data","integrity check"],"install":[{"cmd":"pip install pooch","lang":"bash","label":"PyPI"},{"cmd":"conda install conda-forge::pooch","lang":"bash","label":"Conda"}],"dependencies":[],"imports":[{"note":"The primary entry point for configuring and using Pooch is the `create` function, which returns a Pooch instance.","symbol":"create","correct":"import pooch\nmy_pooch = pooch.create(...)"},{"note":"Returns the default OS-specific cache directory for your application.","symbol":"os_cache","correct":"import pooch\ncache_path = pooch.os_cache('my_project_name')"}],"quickstart":{"code":"import pooch\nimport os\n\n# URL of the sample file. 
In a real scenario, this would point to your hosted data.\n# Shown for reference only: Pooch builds this same URL from `base_url` and the file name below.\ndata_url = \"https://github.com/fatiando/pooch/raw/v1.9.0/pooch/tests/data/tiny-data.txt\"\ndata_hash = \"sha256:d48d4841b5d197607a9b0c7a522533c095311e3895e5330a9e25d2c510800b50\"\n\n# Choose a cache directory. This example uses a temporary directory (unless\n# POOCH_TEST_CACHE is set) so that it stays self-contained and leaves the real cache untouched.\n# In a real library, use pooch.os_cache(\"your_app_name\") to get the system default cache dir.\ncache_dir = os.environ.get('POOCH_TEST_CACHE')\nif not cache_dir:\n    import tempfile\n    temp_dir = tempfile.TemporaryDirectory()\n    cache_dir = temp_dir.name\nelse:\n    temp_dir = None  # Cache location was supplied externally; nothing to clean up\n\n# Map each file name to its expected hash\nregistry = {\"tiny-data.txt\": data_hash}\n\n# Configure a new Pooch instance\nmy_pooch = pooch.create(\n    path=cache_dir,\n    base_url=\"https://github.com/fatiando/pooch/raw/{version}/pooch/tests/data/\",\n    version=\"v1.9.0\",  # Substituted for {version} in base_url; match the data version you want\n    registry=registry,\n)\n\n# Fetch the data file (downloaded on first use, then served from the cache)\nfile_path = my_pooch.fetch(\"tiny-data.txt\")\n\nprint(f\"Data file available at: {file_path}\")\n\nwith open(file_path, \"r\") as f:\n    content = f.read()\n    print(f\"Content of the file: {content.strip()}\")\n\n# Clean up the temporary directory if it was created\nif temp_dir is not None:\n    temp_dir.cleanup()\n","lang":"python","description":"This quickstart sets up a `Pooch` instance, registers a data file by name with its SHA256 hash, and fetches it. `fetch` downloads the file only if it is missing from the cache or its local hash does not match the registry, and in both cases returns the path to the cached file. 
The example uses a temporary directory for the cache for demonstration purposes; in a production library, `pooch.os_cache('your_library_name')` is recommended for persistent caching."},"warnings":[{"fix":"Compute the SHA256 hash from a trusted local copy of the file you host. Pooch's `pooch.file_hash()` utility computes the hash of a local file (SHA256 by default), so you can verify or generate it before adding it to the registry.","message":"An incorrect SHA256 hash in the registry triggers re-downloads or errors. Always ensure the hash in your registry exactly matches the file's content.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Structure your remote data, for example on GitHub, so that specific versions (e.g., tags like `v1.0.0`) correspond to paths used in `base_url`. The `{version}` placeholder in `base_url` is crucial for this.","message":"When using `base_url` with versioning, ensure your remote data repository structure reflects the version string provided to `pooch.create()`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify the integrity of the remote file and update the hash in your `pooch.create()` registry if the file has legitimately changed. If the file is corrupted, re-upload a correct version.","message":"If `Pooch` downloads a file but its hash doesn't match the registry, it will raise an exception (usually `ValueError`), indicating possible data corruption or an outdated hash.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Document clearly for users any environment variables that can override the cache location. 
Ensure the target directory is writable by the user running the application.","message":"Overriding the cache `path` through an environment variable (named via the `env` argument of `pooch.create()`, e.g., `MYPACKAGE_DATA_DIR`) can cause unexpected behavior if the variable points to a non-existent or inaccessible directory.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-05T00:00:00.000Z","next_check":"2026-07-04T00:00:00.000Z"}