Coiled Python Client
Coiled is a Python client library that simplifies scaling Python code and Dask clusters on the cloud (AWS, GCP, Azure). It handles cloud resource management, networking, and software environments so that data engineers and scientists can focus on their code. The library is actively maintained on a continuous release cadence with frequent minor updates; its current version is 1.134.0.
Warnings
- gotcha Environment synchronization can be tricky. By default, Coiled attempts to replicate your local Python environment on the remote VMs. This can sometimes lead to discrepancies if local packages are not perfectly available or compatible in the cloud environment. Explicitly defining a software environment (e.g., using `software='my-env'` or a `container='my-docker-image'`) is often more robust, especially for production or complex setups.
- gotcha Authentication and cloud setup are prerequisites. Users must run `coiled login` and `coiled setup <aws|gcp|azure>` from their CLI before using the Python client to provision resources. Skipping these steps will result in authentication or permission errors.
- gotcha Resource shutdown is not fully automatic. Coiled clusters and functions have idle timeouts, but explicitly closing `client` and `cluster` objects in your script (`client.close()`, `cluster.close()`) releases resources promptly and avoids unexpected cloud costs, especially in interactive sessions or long-running scripts.
- gotcha Specifying cloud regions is important for data locality and cost. Not setting a specific region might lead to clusters being provisioned in a default region that is geographically distant from your data sources, incurring higher data transfer costs and increased latency.
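The region gotcha above can be handled with a tiny helper that reads the region from an environment variable and falls back to an explicit default, the same `COILED_REGION` pattern the Quickstart uses inline. A minimal sketch (the helper name is illustrative, not part of Coiled's API):

```python
import os

# Illustrative helper: resolve the cluster region from an environment
# variable, falling back to an explicit default so clusters are never
# provisioned in a surprise default region far from your data.
def pick_region(default: str = "us-east-1") -> str:
    return os.environ.get("COILED_REGION", default)

# e.g. cluster = coiled.Cluster(n_workers=5, region=pick_region())
```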
Install
- pip install "coiled[dask]"
- pip install coiled
Imports
- Cluster
from coiled import Cluster
- function
from coiled import function
Quickstart
import coiled
import dask.dataframe as dd
import os
# Ensure you are logged into Coiled via 'coiled login' in your terminal
# and have connected your cloud account via 'coiled setup <aws|gcp|azure>'.
# Quickstart for Dask Cluster
try:
    cluster = coiled.Cluster(
        n_workers=5,
        software="dask-2023.12.0",  # Specify a software environment or Coiled will try to sync your local env
        region=os.environ.get("COILED_REGION", "us-east-1"),  # Use an environment variable for region
    )
    client = cluster.get_client()
    print(f"Dask Dashboard link: {client.dashboard_link}")

    # Example Dask computation
    df = dd.read_csv("s3://dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv", assume_missing=True)
    result = df.groupby("passenger_count").tip_amount.mean().compute()
    print("Dask Cluster computation result:")
    print(result)

    client.close()
    cluster.close()
    print("Dask Cluster closed.")
except Exception as e:
    print(f"Error with Dask Cluster quickstart: {e}")
    print("Please ensure you have run 'coiled login' and 'coiled setup <cloud>' and set COILED_REGION if needed.")
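The explicit `close()` calls above are skipped if the computation raises. Wrapping the resource in `contextlib.closing` guarantees cleanup runs on any exit path. A minimal local sketch with a stand-in resource (`FakeCluster` is illustrative; any object with a `.close()` method, such as a Coiled cluster or Dask client, works the same way):

```python
from contextlib import closing

# Stand-in for a coiled.Cluster / Dask client: anything with .close()
class FakeCluster:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

resource = FakeCluster()
try:
    with closing(resource):
        raise RuntimeError("computation failed mid-flight")
except RuntimeError:
    pass

print(resource.closed)  # True: close() ran despite the error
```

Dask-style cluster objects also tend to support the `with` statement directly (e.g. `with coiled.Cluster(...) as cluster:`), which should give the same guarantee without the wrapper.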
# Quickstart for Serverless Function
# For serverless functions, you can also specify memory, cpu, region, etc.
# e.g., @coiled.function(memory='512 GB', cpu=128, region='us-east-2')
@coiled.function()
def my_serverless_function(x):
    import time
    time.sleep(2)  # Simulate work
    return x * 2

try:
    print("\nRunning serverless function...")
    future = my_serverless_function.submit(10)
    serverless_result = future.result()
    print(f"Serverless function result: {serverless_result}")
except Exception as e:
    print(f"Error with Serverless Function quickstart: {e}")
    print("Serverless functions also require 'coiled login' and cloud setup.")
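The submit/result call shape above mirrors Python's standard futures protocol, so the logic can be prototyped locally before paying for cloud time. A local sketch of the same pattern using a thread pool (no Coiled involved; the function body matches the serverless example, minus the sleep):

```python
from concurrent.futures import ThreadPoolExecutor

def my_serverless_function(x):
    return x * 2  # same body as the Coiled-decorated function, minus the sleep

with ThreadPoolExecutor() as pool:
    future = pool.submit(my_serverless_function, 10)  # analogous to my_serverless_function.submit(10)
    result = future.result()

print(result)  # 20
```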