Dask Cloud Provider
Dask Cloud Provider (dask-cloudprovider) is a Python library that enables native cloud integration for Dask. It provides classes for constructing and managing ephemeral Dask clusters on various cloud platforms, including AWS, GCP, Azure, DigitalOcean, Hetzner, IBM Cloud, OpenStack, and Nebius. It also includes plugins that make Dask components cloud-aware. The library aims to simplify the deployment and operation of Dask clusters on the cloud. As of its latest version 2025.9.0, released in September 2025, it is actively maintained with releases published automatically when tags are pushed to GitHub.
Common errors
- `ModuleNotFoundError: No module named 'dask_cloudprovider.aws'`
  - Cause: The cloud-specific dependencies for AWS were not installed. The base `dask-cloudprovider` package only installs the common components.
  - Fix: Install `dask-cloudprovider` with the `aws` extra: `pip install dask-cloudprovider[aws]`.
- `KeyError: 'Could not find a Dask configuration value at cloudprovider.aws.region'`
  - Cause: The required AWS region (or the equivalent configuration for another provider) was not found in environment variables or the Dask config.
  - Fix: Set the region via an environment variable (e.g., `export DASK_CLOUDPROVIDER__AWS__REGION='us-east-1'`), in a Dask configuration file, or pass it directly to the cluster manager constructor (e.g., `EC2Cluster(region='us-east-1')`).
- `TimeoutError: ... Failed to connect to scheduler`
  - Cause: The Dask client could not reach the scheduler. This is often due to network issues (firewalls or security groups blocking ports 8786/8787), incorrect IP addresses, or the scheduler failing to start.
  - Fix: Verify that your cloud provider's security groups allow inbound TCP traffic on ports 8786 (scheduler) and 8787 (dashboard) from your client's IP. Check cloud logs for scheduler startup errors, and confirm that `Client(cluster)` resolves the correct scheduler address.
- `An error occurred during resource creation: ... InsufficientInstanceCapacity ...`
  - Cause: The cloud provider could not provision the requested instances, usually because the requested region/zone lacks capacity or a service limit was hit.
  - Fix: Try a different instance type, availability zone, or region; request a service limit increase from your cloud provider; or reduce the number of requested workers.
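The configuration-key error above can usually be avoided by exporting provider settings before constructing a cluster. Dask maps environment variables onto nested config keys by replacing double underscores with dots; a minimal sketch (the region value is only an example):

```python
import os

# Dask translates DASK_CLOUDPROVIDER__AWS__REGION into the nested
# config key cloudprovider.aws.region (double underscore -> dot).
os.environ["DASK_CLOUDPROVIDER__AWS__REGION"] = "us-east-1"

# The equivalent explicit form, once dask is installed, would be:
#   import dask.config
#   dask.config.set({"cloudprovider.aws.region": "us-east-1"})

print(os.environ["DASK_CLOUDPROVIDER__AWS__REGION"])
```

Set such variables before the cluster manager is constructed; they are read when the config is first resolved.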
Warnings
- gotcha Failing to explicitly close cluster resources can lead to unexpected cloud costs. dask-cloudprovider attempts to clean up clusters when they are garbage collected, but this cleanup is not guaranteed to run.
- gotcha Cloud provider-specific dependencies (e.g., `boto3` for AWS, `google-cloud-sdk` for GCP) are not installed by default with `pip install dask-cloudprovider`. Attempting to use a cluster manager for an uninstalled provider will result in a `ModuleNotFoundError` or similar import error.
- gotcha By default, many cluster managers expose the Dask scheduler and dashboard to the internet via a public IP address for ease of use. This can pose a security risk in production environments.
- gotcha Authentication credentials for your chosen cloud provider (e.g., AWS access keys, GCP project ID/service accounts, Azure service principals) must be pre-configured in your environment or via Dask's configuration system for cluster creation to succeed.
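The first gotcha above (leaked cloud resources) is easiest to avoid with a try/finally block or a context manager. The sketch below uses a stand-in class rather than a real cluster manager so the pattern can run without cloud credentials; with dask-cloudprovider you would apply the same shape to e.g. `EC2Cluster` or `FargateCluster`:

```python
class FakeCluster:
    """Stand-in for a dask-cloudprovider cluster manager (no cloud calls)."""
    def __init__(self):
        self.closed = False
    def close(self):
        # A real cluster manager tears down VMs/tasks/networks here.
        self.closed = True
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close()

# Pattern 1: explicit try/finally guarantees cleanup even if the work fails.
cluster = FakeCluster()
try:
    pass  # submit Dask work here
finally:
    cluster.close()
assert cluster.closed

# Pattern 2: context manager, as used in the Quickstart below.
with FakeCluster() as cluster2:
    pass  # submit Dask work here
assert cluster2.closed
```

Either pattern ensures `close()` runs on the error paths listed under Common errors, not just on success.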
Install
- `pip install dask-cloudprovider`
- `pip install dask-cloudprovider[all]`
- `pip install dask-cloudprovider[aws]`
Imports
- EC2Cluster
from dask_cloudprovider.aws import EC2Cluster
- FargateCluster
from dask_cloudprovider.aws import FargateCluster
- GCPCluster
from dask_cloudprovider.gcp import GCPCluster
- AzureVMCluster
from dask_cloudprovider.azure import AzureVMCluster
- Client
from dask.distributed import Client
Quickstart
import os
from dask_cloudprovider.aws import FargateCluster
from dask.distributed import Client
# Ensure AWS credentials are configured (e.g., via AWS CLI or env vars)
# For a real deployment, consider setting DASK_CLOUDPROVIDER__AWS__REGION
# and other specifics via environment variables or a Dask config file.
# Example: os.environ['AWS_ACCESS_KEY_ID'] = 'YOUR_ACCESS_KEY'
# os.environ['AWS_SECRET_ACCESS_KEY'] = 'YOUR_SECRET_KEY'
# os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'
try:
    # Create a Dask cluster on AWS Fargate; this provisions cloud resources.
    # Using a context manager ensures the resources are released automatically.
    # worker_cpu=1024 is 1 vCPU; worker_mem is in MB.
    with FargateCluster(n_workers=1, worker_cpu=1024, worker_mem=2048) as cluster:
        print(f"Dask dashboard link: {cluster.dashboard_link}")
        # Connect a Dask client to the cluster
        with Client(cluster) as client:
            print("Dask client connected.")
            # Perform a simple distributed computation
            futures = client.map(lambda x: x * x, range(10))
            results = client.gather(futures)
            print(f"Computation results: {results}")
        print("Dask client closed.")
    print("Cluster resources released by the context manager.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Check that your AWS credentials are configured and that you have sufficient permissions.")