Asyncio DataLoader for Python
Asyncio DataLoader is a Python port of the JavaScript DataLoader, a generic utility for efficient data fetching. It provides a consistent API over various data sources, using batching to coalesce individual load requests made within a single event-loop tick into one operation, and per-request caching to avoid redundant loads. At the time of writing, the current release is 0.4.3; releases ship a few times a year with bug fixes and minor features.
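The batching mechanic can be sketched with a minimal stand-in loader. This is a simplified model for illustration only, not the library's actual implementation: loads collected during one event-loop tick are dispatched as a single batch afterward.

```python
import asyncio


class MiniLoader:
    """Simplified model of DataLoader-style batching; not the real library."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self._queue = []  # (key, future) pairs collected during this tick

    def load(self, key):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        if not self._queue:
            # First load of this tick: schedule one batch dispatch to run
            # after the current event-loop iteration.
            loop.call_soon(lambda: asyncio.ensure_future(self._dispatch()))
        self._queue.append((key, fut))
        return fut

    async def _dispatch(self):
        queue, self._queue = self._queue, []
        values = await self.batch_fn([key for key, _ in queue])
        for (_, fut), value in zip(queue, values):
            fut.set_result(value)


async def demo():
    calls = []

    async def batch_fn(keys):
        calls.append(list(keys))  # record each batch invocation
        return [k * 10 for k in keys]

    loader = MiniLoader(batch_fn)
    # Two individual loads in the same tick -> one batch call
    a, b = await asyncio.gather(loader.load(1), loader.load(2))
    return calls, a, b


calls, a, b = asyncio.run(demo())
print(calls)  # [[1, 2]]: both loads coalesced into a single batch call
```

The real library adds caching, error propagation, and `max_batch_size` on top of this scheduling idea.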
Warnings
- breaking: Python 3.6 support was dropped; `v0.3.0` and `v0.4.0` require Python 3.7 or newer.
- breaking: In `v0.4.0`, the `key` argument to `DataLoader.load()` no longer has a default value of `None`. Code calling `load()` without an explicit key now raises a `TypeError`.
- gotcha: `aiodataloader` implements per-request, in-memory caching, not an application-wide shared cache. Sharing a single, long-lived `DataLoader` instance across distinct requests or users can serve incorrect data (stale entries, cross-user data leaks). Create instances per web request or per GraphQL execution context.
- gotcha: The `batch_load_fn` must return a list of values that corresponds one-to-one, in the same order, to the list of keys it received. If a key cannot be resolved to a value, return `None` at that key's position in the list.
- gotcha: After a data mutation or update, any cached values for the modified keys become stale. To ensure fresh data is loaded, explicitly call `loader.clear(key)` for specific keys or `loader.clear_all()` to invalidate the entire loader's cache.
- gotcha: If `DataLoader` is instantiated with `cache=False` (disabling memoization caching), the `batch_load_fn` may receive duplicate keys. In this scenario, the batch function must return a value for *each instance* of a requested key, not just for unique keys.
Install
```shell
pip install aiodataloader
```
Imports
- `DataLoader`

```python
from aiodataloader import DataLoader
```
Quickstart
```python
from __future__ import annotations  # allows dict | None annotations on 3.7+

import asyncio

from aiodataloader import DataLoader


# A mock batch loading function for demonstration
async def fetch_users_from_db(user_ids: list[int]) -> list[dict | None]:
    print(f"Fetching users with IDs: {user_ids}")
    # Simulate an async database call
    await asyncio.sleep(0.01)
    # In a real scenario, this would query a database (e.g., ORM, API)
    users_data = {
        1: {"id": 1, "name": "Alice"},
        2: {"id": 2, "name": "Bob"},
        3: {"id": 3, "name": "Charlie"},
    }
    # Important: return values in the same order as keys, with None for missing
    return [users_data.get(uid) for uid in user_ids]


class UserLoader(DataLoader):
    async def batch_load_fn(self, keys: list[int]) -> list[dict | None]:
        return await fetch_users_from_db(keys)


async def main():
    user_loader = UserLoader()

    # Load individual users concurrently. These three loads are coalesced
    # into a single call to fetch_users_from_db; the second load(1) is
    # served from the memoization cache.
    user1_task = user_loader.load(1)
    user2_task = user_loader.load(2)
    user3_task = user_loader.load(1)
    user1, user2, user3 = await asyncio.gather(user1_task, user2_task, user3_task)

    print(f"User 1 (from first load): {user1}")
    print(f"User 2: {user2}")
    print(f"User 3 (from cache): {user3}")

    # Load several keys at once; ID 4 is missing, so its slot is None
    users_many = await user_loader.load_many([2, 3, 4])
    print(f"Users (many): {users_many}")


if __name__ == "__main__":
    asyncio.run(main())
```