SkyPilot Nightly
SkyPilot is a system for running, managing, and scaling AI workloads on any AI infrastructure. It offers a unified interface to reserved GPUs, Kubernetes clusters, Slurm clusters, and more than 20 cloud providers, so users do not have to manage each backend separately. The `skypilot-nightly` package ships the latest features, bug fixes, and development builds. SkyPilot focuses on maximizing cost savings and GPU availability and on providing managed execution for AI tasks.
Common errors
- sqlite3.OperationalError: database is locked
cause The SQLite database used by the local SkyPilot API server is under high contention or is locked by another process or thread.
fix Restart the SkyPilot API server (`sky api stop; sky api start`). If the issue persists under heavy usage, configure SkyPilot to use an external PostgreSQL database for the API server. As a temporary local workaround, reduce the number of concurrent `sky` commands.
- AttributeError: 'Config' object has no attribute 'setup_event_loop'
cause A breaking change in `uvicorn` (version 0.36.0 and higher) removed and replaced `Config.setup_event_loop`. SkyPilot v0.10.3.post1 explicitly pinned `uvicorn` to mitigate this.
fix Upgrade your `skypilot-nightly` installation to the latest version, which should include the necessary `uvicorn` dependency pinning or compatibility fixes. If it does not, manually pin `uvicorn<0.36.0`.
- Permission denied (publickey)
cause SSH authentication failed, often because of incorrect file permissions on SSH keys or on the `.ssh` directory, either on the local machine or on the remote host.
fix Set appropriate permissions: `chmod 700 ~/.ssh/` for the directory and `chmod 600 ~/.ssh/id_rsa` for your private key. Ensure the authorized keys file on the remote server (`~/.ssh/authorized_keys`) also has correct permissions (`chmod 600 ~/.ssh/authorized_keys`).
- Error from server (Forbidden): roles.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:..." cannot list resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "..." (or similar 403 Kubernetes API errors)
cause SkyPilot's Kubernetes integration received a 403 Forbidden response from the Kubernetes API, typically because the service account in use has insufficient RBAC permissions. The error can be transient or persistent.
fix Ensure that the Kubernetes service account SkyPilot uses (or your `kubeconfig` context if running locally) has the `Role` and `RoleBinding` permissions needed to manage resources (pods, services, deployments, etc.) in the target namespace. For transient errors, SkyPilot v0.10.3.post2 and later include retries and fallbacks. Verify that `kubectl get nodes` works without errors.
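The `uvicorn` fix above depends on whether the installed `uvicorn` is at or above 0.36.0. The following is a minimal sketch of that check using only the standard library. The helper names (`parse_release`, `uvicorn_needs_pin`, `installed_uvicorn_version`) are hypothetical and not part of SkyPilot or `uvicorn`, and the version parser is deliberately naive: it reads only the leading numeric components and ignores pre-release suffixes.

```python
from importlib import metadata

# Per the text above: Config.setup_event_loop is gone from this release on.
BREAKING_VERSION = (0, 36, 0)


def parse_release(version):
    """Parse leading numeric components, e.g. '0.35.1' -> (0, 35, 1).

    Naive by design: stops at the first non-numeric piece and pads to
    three components so that '0.36' compares equal to '0.36.0'.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def uvicorn_needs_pin(version):
    """Return True if this uvicorn version is at or above the breaking release."""
    return parse_release(version) >= BREAKING_VERSION


def installed_uvicorn_version():
    """Return the installed uvicorn version string, or None if it is absent."""
    try:
        return metadata.version("uvicorn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_uvicorn_version()
    if current is None:
        print("uvicorn is not installed in this environment.")
    elif uvicorn_needs_pin(current):
        print(f"uvicorn {current} found: upgrade SkyPilot or pin uvicorn<0.36.0.")
    else:
        print(f"uvicorn {current} predates the breaking change.")
```

If the script reports a version at or above 0.36.0 and upgrading `skypilot-nightly` does not resolve the error, the manual pin described above is the fallback.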
Warnings
- breaking SkyPilot v0.12.0 and newer releases (including nightly) require Python 3.9 or higher. Python 3.7 and 3.8 are no longer supported.
- breaking The `sky.jobs.queue(version=1)` API is deprecated in v0.12.0 and will be removed in v0.13. It returns simpler metadata.
- gotcha When using a remote SkyPilot API server, upgrading the client library (skypilot-nightly) often requires upgrading the API server deployment as well to ensure compatibility and access new features/fixes.
- gotcha Nightly builds (`skypilot-nightly`) are development versions and may contain instabilities, bugs, or undocumented breaking changes. They are not recommended for production environments.
- gotcha SkyPilot's local API server uses SQLite by default, which can lead to `sqlite3.OperationalError: database is locked` under high concurrency or many simultaneous operations.
- gotcha Incorrect file permissions for SSH keys can cause `Permission denied (publickey)` errors when SkyPilot tries to SSH into remote clusters.
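The SSH permission gotcha can be checked programmatically. The sketch below audits a `.ssh` directory against the modes given in the Common errors section (700 for the directory, 600 for the private key and `authorized_keys`) and can reset them. The function names and the default key name `id_rsa` are illustrative assumptions, and the check is stricter than SSH itself, since it flags any mode that differs from those exact values.

```python
import os
import stat
from pathlib import Path


def permission_problems(ssh_dir, key_name="id_rsa"):
    """Return (path, actual_mode, expected_mode) for entries with unexpected modes.

    Entries that do not exist are skipped rather than reported.
    """
    ssh_dir = Path(ssh_dir)
    targets = [
        (ssh_dir, 0o700),                      # chmod 700 ~/.ssh/
        (ssh_dir / key_name, 0o600),           # chmod 600 ~/.ssh/id_rsa
        (ssh_dir / "authorized_keys", 0o600),  # chmod 600 ~/.ssh/authorized_keys
    ]
    problems = []
    for path, expected in targets:
        if not path.exists():
            continue
        actual = stat.S_IMODE(path.stat().st_mode)
        if actual != expected:
            problems.append((str(path), actual, expected))
    return problems


def fix_permissions(ssh_dir, key_name="id_rsa"):
    """Apply the expected mode to every entry reported by permission_problems."""
    for path, _actual, expected in permission_problems(ssh_dir, key_name):
        os.chmod(path, expected)


if __name__ == "__main__":
    for path, actual, expected in permission_problems(Path.home() / ".ssh"):
        print(f"{path}: mode {oct(actual)}, expected {oct(expected)}")
```

Running the script only reports mismatches; calling `fix_permissions` is a separate, explicit step so that nothing is changed without intent.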
Install
- pip install skypilot-nightly
- pip install "skypilot-nightly[aws,gcp,azure,kubernetes]"
- uv pip install "skypilot-nightly[kubernetes]"
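After installing, it can help to confirm that the interpreter meets the Python 3.9 minimum stated in the Warnings section and to see which SkyPilot distribution is present. This is a minimal sketch using only the standard library; the function names are hypothetical, and the check looks for the `skypilot-nightly` distribution first and the stable `skypilot` distribution second.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # minimum stated in the Warnings section for v0.12.0 and newer


def python_is_supported(version_info=None):
    """Return True if the given (or current) interpreter meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_skypilot(candidates=("skypilot-nightly", "skypilot")):
    """Return (distribution_name, version) for the first installed candidate, else None."""
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    found = installed_skypilot()
    if found is None:
        print("No SkyPilot distribution found in this environment.")
    else:
        print(f"Found {found[0]} {found[1]}")
```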
Imports
- sky
import sky
- Task
from sky import Task
- Resources
from sky import Resources
Quickstart
import sky
task = sky.Task(
    run='echo "Hello, SkyPilot!"',
    resources=sky.Resources(cloud=sky.AWS()),
)
# Launch the task on AWS. SkyPilot will provision resources and run the command.
# For multi-cloud optimization, remove 'cloud=sky.AWS()' and let SkyPilot choose.
cluster_name = "my-first-sky-cluster"
request_id = sky.launch(task, cluster_name=cluster_name)
print(f"Submitted launch for cluster '{cluster_name}' with request ID: {request_id}")
print(f"To view logs: sky logs {cluster_name}")
print(f"To stop and delete: sky down {cluster_name}")
# Example of checking status asynchronously
# from sky.client import sdk_async as sdk
# import asyncio
# async def get_status():
#     status = await sdk.status()
#     print(status)
# asyncio.run(get_status())