{"id":8926,"library":"dask-cloudprovider","title":"Dask Cloud Provider","description":"Dask Cloud Provider (dask-cloudprovider) is a Python library that enables native cloud integration for Dask. It provides classes for constructing and managing ephemeral Dask clusters on various cloud platforms, including AWS, GCP, Azure, DigitalOcean, Hetzner, IBM Cloud, OpenStack, and Nebius. It also includes plugins that make Dask components cloud-aware. The library aims to simplify the deployment and operation of Dask clusters on the cloud. The latest version is 2025.9.0, released in September 2025. The project is actively maintained, and releases are published automatically when tags are pushed to GitHub.","status":"active","version":"2025.9.0","language":"en","source_language":"en","source_url":"https://github.com/dask/dask-cloudprovider","tags":["dask","cloud","distributed computing","aws","gcp","azure","digitalocean","hetzner","ibm","openstack","nebius"],"install":[{"cmd":"pip install dask-cloudprovider","lang":"bash","label":"Basic Install"},{"cmd":"pip install dask-cloudprovider[all]","lang":"bash","label":"All Cloud Providers"},{"cmd":"pip install dask-cloudprovider[aws]","lang":"bash","label":"Specific Provider (e.g., AWS)"}],"dependencies":[{"reason":"Core dependency providing the Dask parallel computing framework.","package":"dask"},{"reason":"Core dependency for Dask's distributed scheduler and workers.","package":"distributed"},{"reason":"Required for AWS cluster managers (e.g., EC2Cluster, FargateCluster), which use asynchronous AWS API calls.","package":"aiobotocore","optional":true},{"reason":"Required for Google Cloud Platform cluster managers (e.g., GCPCluster), together with google-auth.","package":"google-api-python-client","optional":true},{"reason":"Required for Azure cluster managers (e.g., AzureVMCluster).","package":"azure-mgmt-compute","optional":true},{"reason":"Required for DigitalOcean cluster managers (e.g., DropletCluster); imported as `digitalocean`.","package":"python-digitalocean","optional":true}],"imports":[{"symbol":"EC2Cluster","correct":"from dask_cloudprovider.aws import 
EC2Cluster"},{"symbol":"FargateCluster","correct":"from dask_cloudprovider.aws import FargateCluster"},{"symbol":"GCPCluster","correct":"from dask_cloudprovider.gcp import GCPCluster"},{"symbol":"AzureVMCluster","correct":"from dask_cloudprovider.azure import AzureVMCluster"},{"symbol":"Client","correct":"from dask.distributed import Client"}],"quickstart":{"code":"import os\nfrom dask_cloudprovider.aws import FargateCluster\nfrom dask.distributed import Client\n\n# Ensure AWS credentials are configured (e.g., via `aws configure` or env vars).\n# For a real deployment, also set the AWS region and other specifics via\n# environment variables or a Dask config file.\n# Example: os.environ['AWS_ACCESS_KEY_ID'] = 'YOUR_ACCESS_KEY'\n# os.environ['AWS_SECRET_ACCESS_KEY'] = 'YOUR_SECRET_KEY'\n# os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n\ntry:\n    # Create a Dask cluster on AWS Fargate; this provisions cloud resources.\n    # worker_cpu is in ECS CPU units (1024 = 1 vCPU); worker_mem is in MiB.\n    # The context manager closes the cluster and releases resources on exit.\n    with FargateCluster(n_workers=1, worker_cpu=1024, worker_mem=2048) as cluster:\n        print(f\"Dask Dashboard link: {cluster.dashboard_link}\")\n\n        # Connect a Dask client; it is closed automatically when the block exits\n        with Client(cluster) as client:\n            print(\"Dask Client connected.\")\n\n            # Square 0..9 on the remote workers and collect the results\n            futures = client.map(lambda x: x * x, range(10))\n            results = client.gather(futures)\n            print(f\"Computation results: {results}\")\n        print(\"Dask Client closed.\")\n    print(\"Dask Cluster resources closed (context manager exited).\")\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Please ensure your AWS credentials are configured and that you have sufficient permissions.\")","lang":"python","description":"This quickstart demonstrates creating an ephemeral Dask cluster on AWS Fargate, connecting a client, running a simple 
computation, and releasing the cloud resources afterwards by using context managers. Your cloud provider credentials must be configured (e.g., by running `aws configure` with the AWS CLI) for the cluster to provision successfully."},"warnings":[{"fix":"Always call `cluster.close()` when you are done with the cluster, or use the cluster object in a `with` statement (context manager) to ensure automatic cleanup.","message":"Failing to explicitly close cluster resources can lead to unexpected cloud costs. dask-cloudprovider attempts to clean up resources when the cluster object is garbage-collected, but this cleanup is not guaranteed.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install `dask-cloudprovider` with the appropriate extras, e.g., `pip install dask-cloudprovider[aws]` for AWS, `pip install dask-cloudprovider[gcp]` for Google Cloud, or `pip install dask-cloudprovider[all]` for all providers.","message":"Cloud provider-specific dependencies (e.g., `aiobotocore` for AWS, `google-api-python-client` for GCP) are not installed by default with `pip install dask-cloudprovider`. Attempting to use a cluster manager for an uninstalled provider will raise an `ImportError` (or `ModuleNotFoundError`).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review the security section of the dask-cloudprovider documentation for your specific cluster manager. Configure security groups to restrict access (e.g., to a specific VPC or IP range), or disable the public IP when running within a trusted network.","message":"By default, many cluster managers expose the Dask scheduler and dashboard to the internet via a public IP address for ease of use. This is a security risk in production environments.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Before running dask-cloudprovider code, ensure your cloud credentials are set up. 
This typically involves using the cloud provider's CLI tools (e.g., `aws configure`, `gcloud auth login`), setting environment variables (e.g., `AWS_ACCESS_KEY_ID`), or configuring a Dask `cloudprovider.yaml` file.","message":"Authentication credentials for your chosen cloud provider (e.g., AWS access keys, GCP project ID/service accounts, Azure service principals) must be pre-configured in your environment or via Dask's configuration system for cluster creation to succeed.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install `dask-cloudprovider` with the `aws` extra: `pip install dask-cloudprovider[aws]`.","cause":"The cloud-specific dependencies for AWS were not installed. The base `dask-cloudprovider` package only installs common components.","error":"ModuleNotFoundError: No module named 'dask_cloudprovider.aws'"},{"fix":"Set the region via an environment variable (e.g., `export DASK_CLOUDPROVIDER__AWS__REGION='us-east-1'`), in a Dask configuration file, or pass it directly to the cluster manager constructor (e.g., `EC2Cluster(region='us-east-1')`).","cause":"The required region for AWS (or similar configuration for other providers) was not specified or found in environment variables/Dask config.","error":"KeyError: 'Could not find a Dask configuration value at cloudprovider.aws.region'"},{"fix":"Verify that your cloud provider's security groups allow inbound TCP traffic on ports 8786 (scheduler) and 8787 (dashboard) from your client's IP. Check cloud logs for scheduler startup errors. Ensure the `Client(cluster)` call correctly resolves the scheduler address.","cause":"The Dask client could not connect to the scheduler. This is often due to network issues (firewall/security groups blocking ports 8786/8787), incorrect IP addresses, or the scheduler failing to start.","error":"TimeoutError: ... 
Failed to connect to scheduler"},{"fix":"Try a different instance type, a different availability zone/region, or request a service limit increase from your cloud provider. Reduce the number of requested workers.","cause":"The cloud provider could not provision the requested instances, often due to a lack of available resources in the specified region/zone, or hitting service limits.","error":"An error occurred during resource creation: ... InsufficientInstanceCapacity ..."}]}