SkyPilot
SkyPilot is an open-source framework that allows users to run AI workloads on any cloud, simplifying multi-cloud deployments, optimizing costs, and improving performance. It provides a unified interface for launching, managing, and scaling GPU clusters and jobs across AWS, GCP, Azure, Kubernetes, and more. SkyPilot is actively developed, with frequent patch releases and major feature updates every few weeks to months. The current stable version is 0.12.0.
Warnings
- breaking After upgrading the SkyPilot CLI/SDK (e.g., `pip install --upgrade skypilot[...]`), you almost always need to upgrade your SkyPilot API server. New features, stability fixes, and bug resolutions often require the API server to be in sync. Neglecting this can lead to 'database is locked' errors, unresponsive API servers, or unexpected behavior.
- gotcha Installing SkyPilot without specifying cloud provider extras (e.g., `[aws,gcp]`) will result in a minimal installation that lacks the necessary cloud-specific drivers and integrations to launch resources on any cloud. This means core functionality will not work.
- gotcha SkyPilot leverages existing cloud provider CLIs (e.g., AWS CLI, `gcloud` for GCP, Azure CLI) for authentication and resource management. These CLIs must be installed, configured, and authenticated with valid credentials on the machine where SkyPilot CLI/SDK commands are executed.
- gotcha SkyPilot can have extensive and complex dependency trees due to its integration with many cloud providers and ML tools. This can occasionally lead to dependency conflicts, as seen with past issues like specific `uvicorn` versions.
Install
-
pip install 'skypilot[aws,gcp]' # Add other clouds as needed: azure, kubernetes, oci, lambda, gcp_tpu, slurm -
pip install 'skypilot[all]' # Install with all cloud providers
Imports
- sky
import sky
- Task
from skypilot import Task
Quickstart
import sky
# Define a simple task to run on a cluster
# This task will echo a message and try to list a GPU (if available)
task = sky.Task(
'echo "Hello from SkyPilot on $(hostname) with GPU $(nvidia-smi -L)"',
workdir='.',
num_gpus=1, # Request 1 GPU. SkyPilot will find a cloud with a GPU instance.
setup='pip install pandas' # Example setup; replace with your actual dependencies
)
# Launch a cluster named 'my-quickstart-cluster' and run the task.
# SkyPilot will provision resources on a chosen cloud (e.g., AWS, GCP)
# based on your configured cloud credentials and resource availability.
# Ensure your cloud provider CLIs (e.g., AWS CLI, gcloud CLI) are installed and authenticated.
print("Launching SkyPilot cluster (this may take a few minutes)...")
sky.launch(task, cluster_name='my-quickstart-cluster')
print("Cluster launched and task executed. Run `sky status` to check.")
# To tear down the cluster after use:
# print("Tearing down cluster...")
# sky.down('my-quickstart-cluster')
# print("Cluster torn down.")