{"id":7723,"library":"skypilot-nightly","title":"SkyPilot Nightly","description":"SkyPilot is a system for running, managing, and scaling AI workloads on any AI infrastructure. It offers a unified interface to reserved GPUs, Kubernetes clusters, Slurm clusters, and over 20 cloud providers, abstracting away infrastructure management. SkyPilot focuses on maximizing cost savings and GPU availability and on providing managed execution for AI tasks. The `skypilot-nightly` package ships development builds with the very latest features and bug fixes.","status":"active","version":"1.0.0.dev20260415","language":"en","source_language":"en","source_url":"https://github.com/skypilot-org/skypilot","tags":["cloud","mlops","gpu","multi-cloud","ai","llm","orchestration","nightly","kubernetes","slurm"],"install":[{"cmd":"pip install skypilot-nightly","lang":"bash","label":"Basic install"},{"cmd":"pip install \"skypilot-nightly[aws,gcp,azure,kubernetes]\"","lang":"bash","label":"Install with common cloud/orchestration extras"},{"cmd":"uv pip install \"skypilot-nightly[kubernetes]\"","lang":"bash","label":"Install with uv (Python 3.9-3.13 supported)"}],"dependencies":[{"reason":"Enables support for AWS cloud resources.","package":"skypilot-nightly[aws]","optional":true},{"reason":"Enables support for Google Cloud Platform resources.","package":"skypilot-nightly[gcp]","optional":true},{"reason":"Enables support for Microsoft Azure cloud resources.","package":"skypilot-nightly[azure]","optional":true},{"reason":"Enables support for Kubernetes clusters.","package":"skypilot-nightly[kubernetes]","optional":true},{"reason":"Installs dependencies for all supported cloud providers and orchestration systems. 
Use with caution due to the large number of dependencies.","package":"skypilot-nightly[all]","optional":true}],"imports":[{"note":"The primary entry point for SkyPilot's Python SDK, used to access tasks, resources, and launch functions.","symbol":"sky","correct":"import sky"},{"note":"Represents a SkyPilot task, defining run commands, setup, and dependencies.","symbol":"Task","correct":"from sky import Task"},{"note":"Used to specify the cloud resources (e.g., accelerators, instance types) required for a Task.","symbol":"Resources","correct":"from sky import Resources"}],"quickstart":{"code":"import sky\n\n# Define a task that runs a single shell command.\ntask = sky.Task(run='echo \"Hello, SkyPilot!\"')\n# Pin the task to AWS. Remove 'cloud=sky.AWS()' to let SkyPilot's optimizer\n# pick the cheapest available option across all enabled clouds.\ntask.set_resources(sky.Resources(cloud=sky.AWS()))\n\ncluster_name = \"my-first-sky-cluster\"\n# sky.launch() is asynchronous: it submits a request to the SkyPilot API server\n# and immediately returns a request ID.\nrequest_id = sky.launch(task, cluster_name=cluster_name)\nprint(f\"Submitted launch for cluster '{cluster_name}' (request ID: {request_id})\")\n\n# Block until provisioning and the run command finish, streaming the logs.\nsky.stream_and_get(request_id)\n\nprint(f\"To view logs: sky logs {cluster_name}\")\nprint(f\"To tear down (terminate) the cluster: sky down {cluster_name}\")\n\n# Example of checking status asynchronously\n# from sky.client import sdk_async as sdk\n# import asyncio\n# async def get_status():\n#     status = await sdk.status()\n#     print(status)\n# asyncio.run(get_status())","lang":"python","description":"This quickstart defines a simple task with the SkyPilot Python SDK and launches it on AWS. The task runs a basic 'Hello, SkyPilot!' command, and SkyPilot handles resource provisioning, setup, and execution. `sky.launch()` returns a request ID instead of blocking; `sky.stream_and_get()` waits for that request to finish and streams its logs. Omit `cloud=sky.AWS()` to let SkyPilot's optimizer choose the cheapest cloud and region with available capacity."},"warnings":[{"fix":"Upgrade your Python environment to 3.9, 3.10, 3.11, 3.12, or 3.13. 
Create a new virtual environment if necessary.","message":"SkyPilot v0.12.0 and newer releases (including nightly) require Python 3.9 or higher. Python 3.7 and 3.8 are no longer supported.","severity":"breaking","affected_versions":"v0.12.0+"},{"fix":"Migrate to `sky.jobs.queue(version=2)` to receive richer job metadata as dictionaries.","message":"The `sky.jobs.queue(version=1)` API, which returns less detailed job metadata, is deprecated in v0.12.0 and will be removed in v0.13.","severity":"breaking","affected_versions":"v0.12.0+"},{"fix":"Follow the API server upgrade instructions provided in the release notes or documentation for your specific deployment method (e.g., Helm upgrade).","message":"When using a remote SkyPilot API server, upgrading the client library (skypilot-nightly) often requires upgrading the API server deployment as well, both for compatibility and to pick up new features and fixes.","severity":"gotcha","affected_versions":"All versions with API server deployments"},{"fix":"For stability, use the stable `skypilot` package (`pip install skypilot`). If using nightly, check GitHub issues and releases frequently for updates and potential workarounds.","message":"Nightly builds (`skypilot-nightly`) are development versions and may contain instabilities, bugs, or undocumented breaking changes. They are not recommended for production environments.","severity":"gotcha","affected_versions":"All nightly builds"},{"fix":"Reduce concurrent SkyPilot operations. For team deployments or high-concurrency use cases, consider configuring SkyPilot to use an external PostgreSQL database instead of SQLite.","message":"SkyPilot's local API server uses SQLite by default, which can raise `sqlite3.OperationalError: database is locked` when many operations run concurrently. 
","severity":"gotcha","affected_versions":"All versions using default SQLite for local API server"},{"fix":"Ensure your private SSH key (`~/.ssh/id_rsa` or similar) has `600` permissions (`chmod 600 ~/.ssh/id_rsa`) and the `.ssh` directory has `700` permissions (`chmod 700 ~/.ssh`).","message":"Incorrect file permissions for SSH keys can cause `Permission denied (publickey)` errors when SkyPilot tries to SSH into remote clusters.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Restart the SkyPilot API server (`sky api stop; sky api start`). If the issue persists with heavy usage, configure SkyPilot to use an external PostgreSQL database for the API server. For temporary local fixes, reduce concurrent `sky` commands.","cause":"The SQLite database used by the local SkyPilot API server is experiencing high contention or is locked by another process/thread.","error":"sqlite3.OperationalError: database is locked"},{"fix":"Upgrade your `skypilot-nightly` installation to the latest version, which should include the necessary `uvicorn` dependency pinning or compatibility fixes. If not, manually pin `uvicorn<0.36.0`.","cause":"This error occurs due to a breaking change in `uvicorn` (version 0.36.0 and higher) where `Config.setup_event_loop` was removed and replaced. SkyPilot v0.10.3.post1 explicitly pinned `uvicorn` to mitigate this.","error":"AttributeError: 'Config' object has no attribute 'setup_event_loop'"},{"fix":"Set appropriate permissions: `chmod 700 ~/.ssh/` for the directory and `chmod 600 ~/.ssh/id_rsa` for your private key. 
On the remote host, also restrict `~/.ssh/authorized_keys` (the file listing the accepted public keys) with `chmod 600 ~/.ssh/authorized_keys`.","cause":"SSH authentication failed, often due to incorrect file permissions on SSH keys or the `.ssh` directory on the local machine or remote host.","error":"Permission denied (publickey)"},{"fix":"Ensure that the Kubernetes service account SkyPilot uses (or your `kubeconfig` context if running locally) has the necessary `Role` and `RoleBinding` permissions to manage resources (pods, services, deployments, etc.) in the target namespace. For transient errors, SkyPilot v0.10.3.post2 and later include retries and fallbacks. Verify that `kubectl get nodes` works without errors.","cause":"SkyPilot's Kubernetes integration received a 403 Forbidden response from the Kubernetes API, typically because the service account in use lacks the required RBAC permissions. The error can be transient or persistent.","error":"Error from server (Forbidden): roles.rbac.authorization.k8s.io is forbidden: User \"system:serviceaccount:...\" cannot list resource \"roles\" in API group \"rbac.authorization.k8s.io\" in the namespace \"...\" (or similar 403 Kubernetes API errors)"}]}