CacheControl
CacheControl provides an HTTP caching layer for the popular `requests` library, porting the caching algorithms found in `httplib2`. It wraps a `requests` session so that responses are stored and reused according to their HTTP caching headers (such as `Cache-Control`), and it is designed to be thread-safe, which `httplib2`'s caching was not. The library is actively maintained, with regular releases addressing Python version compatibility, bug fixes, and serialization improvements.
Common errors
- ModuleNotFoundError: No module named 'pip._vendor.cachecontrol'
  Cause: `pip` ships its own vendored copy of `cachecontrol`, so this error usually means the `pip` installation itself is corrupted or incomplete and the vendored package cannot be found. It is rarely a problem with the standalone `cachecontrol` package.
  Fix: Reinstall or upgrade `pip`, for example with `python -m pip install --upgrade --force-reinstall pip`. If `pip` cannot run at all, `python -m ensurepip --upgrade` may restore a working copy.
- AttributeError: 'dict' object has no attribute 'cache_control'
  Cause: Code is reading a `.cache_control` attribute from a plain `dict`, typically because caching information is being looked up on the wrong object (a dictionary of headers or data rather than a response).
  Fix: Read caching directives from the HTTP headers instead, e.g. `response.headers.get('Cache-Control')` on a `requests` response. To check whether a response from a `CacheControl`-wrapped session was served from the cache, use its `from_cache` attribute.
- CacheControl not caching (requests are not being cached as expected)
  Cause: By default, `CacheControl` uses an in-memory cache (`DictCache`), so the cache is discarded when the program exits. If you expect responses to be reused across runs of your application, you must configure a persistent backend such as `FileCache`. Independently of the backend, a response is only stored when its HTTP headers permit it (for example, `Cache-Control: no-store` prevents storage).
  Fix: Initialize `CacheControl` with a persistent cache backend such as `FileCache` (requires the `filecache` extra):
```python
import requests
from cachecontrol import CacheControl
from cachecontrol.caches import FileCache

sess = requests.Session()
# Responses are written to the '.web_cache' directory and survive restarts
cached_sess = CacheControl(sess, cache=FileCache('.web_cache'))
response = cached_sess.get('https://example.com')
```
Warnings
- breaking Python 3.8 support was dropped in v0.14.3. Python versions older than 3.10 are no longer officially supported as of v0.14.4. Ensure your environment meets the `>=3.10` requirement.
- breaking Serialization format changes: Version `0.13.1` removed support for older serialization formats (v1 and v2). Caches created with very old versions of `cachecontrol` (before `msgpack` was introduced around v0.12.0) will be unreadable after upgrading.
- gotcha The `msgpack` dependency has a version constraint (`<2.0.0`) since `v0.14.0`. If other libraries in your project require `msgpack >= 2.0.0`, you might encounter dependency conflicts.
- gotcha Older versions of `cachecontrol` (pre-`v0.12.13`/`v0.13.0`) might have compatibility issues with `requests` sessions using `urllib3 2.0+`, leading to `IncompleteRead` errors.
- gotcha A race condition when overwriting cache entries was fixed in `v0.14.2`. Concurrent writes to the same cache file could lead to corruption in earlier versions.
- gotcha Memory usage with `DictCache` or older `FileCache` implementations can be excessive for large binary responses. `SeparateBodyFileCache` was introduced for better memory efficiency by streaming large bodies.
Install
- Core library: pip install cachecontrol
- With `FileCache` support (adds `filelock`): pip install "cachecontrol[filecache]"
Imports
- CacheControl
from cachecontrol import CacheControl
- FileCache
from cachecontrol.caches.file_cache import FileCache
- CacheControlAdapter
from cachecontrol.adapter import CacheControlAdapter
Quickstart
```python
import requests
from cachecontrol import CacheControl
from cachecontrol.caches.file_cache import FileCache

# Create a standard requests session
sess = requests.Session()
# Wrap the session with CacheControl using a FileCache for persistent storage
# (requires the `filecache` extra); replace '.web_cache' with your desired cache directory
cached_sess = CacheControl(sess, cache=FileCache('.web_cache'))

# Make a request; the response is cached only if its HTTP headers allow it.
# This assumes an empty cache directory: rerunning the script within 60 seconds
# will already be served from the persisted cache.
response = cached_sess.get('https://httpbin.org/cache/60')
print(f"First request status: {response.status_code}")
print(f"From cache (should be False): {getattr(response, 'from_cache', False)}")

# Make the same request again; it should now be served from the cache
response = cached_sess.get('https://httpbin.org/cache/60')
print(f"Second request status: {response.status_code}")
print(f"From cache (should be True): {getattr(response, 'from_cache', False)}")

# Optionally remove the cache directory once you no longer need it
# import shutil
# shutil.rmtree('.web_cache', ignore_errors=True)
```