Databricks SDK for Python
The Databricks SDK for Python (Beta) provides a client for interacting with the Databricks Lakehouse. It covers all public Databricks REST API operations, and its internal HTTP client handles retries automatically. Although the SDK is in Beta, it is supported for production use cases; future releases are expected to introduce some interface changes. The library is actively developed, with frequent releases.
Warnings
- breaking: The SDK is in Beta, and future releases are expected to introduce interface changes. Databricks recommends pinning dependencies to a specific minor version (e.g., `databricks-sdk==0.102.*`) to avoid unexpected breaking changes during upgrades.
- gotcha: Authentication errors often surface as 'Error: Unable to parse response'. They typically indicate a misconfigured Databricks host, insufficient permissions for the API operation, or network/firewall problems (e.g., a private link redirecting to a login page).
- gotcha: In Databricks notebooks, `from databricks.sdk.runtime import *` can pollute the namespace and overwrite important objects such as `spark` (the SparkSession).
- gotcha: In Databricks notebooks, Databricks Runtime initializes the SparkSession object automatically. Explicitly initializing a `SparkSession` (e.g., `SparkSession.builder.getOrCreate()`) is redundant.
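The namespace-pollution warning above can be illustrated without a Databricks environment. The sketch below is not the real `databricks.sdk.runtime` module: `fake_runtime` is a hypothetical stand-in built with `types.ModuleType`, and `exec` is used only so the two import styles can be compared side by side in isolated namespaces. It shows that a wildcard import rebinds an existing `spark` name, while an explicit import leaves it alone.

```python
import sys
import types

# Hypothetical stand-in module; NOT the real databricks.sdk.runtime.
fake_runtime = types.ModuleType("fake_runtime")
fake_runtime.spark = "spark from fake_runtime"
fake_runtime.dbutils = "dbutils from fake_runtime"
sys.modules["fake_runtime"] = fake_runtime

# Each dict plays the role of a notebook's global namespace
# that already holds a `spark` object.
wildcard_namespace = {"spark": "original spark session"}
explicit_namespace = {"spark": "original spark session"}

# A wildcard import rebinds every public name the module exports.
exec("from fake_runtime import *", wildcard_namespace)

# An explicit import binds only the names that are listed.
exec("from fake_runtime import dbutils", explicit_namespace)

overwritten = wildcard_namespace["spark"]
preserved = explicit_namespace["spark"]

print(f"after wildcard import: {overwritten}")
print(f"after explicit import: {preserved}")
```

In a real notebook, the same reasoning suggests importing only the names you need rather than using `*`.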
Install
-
pip install databricks-sdk
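Since the warnings recommend pinning to a minor version, it can help to confirm at startup that the installed release matches the pin. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper names `matches_minor_pin` and `check_sdk_pin` are illustrative and not part of the SDK, and the `0.102` value is simply the example version quoted in the warnings.

```python
from importlib.metadata import PackageNotFoundError, version


def matches_minor_pin(installed: str, pinned_minor: str) -> bool:
    """Return True if `installed` falls under a pin such as `0.102.*`."""
    return installed.split(".")[:2] == pinned_minor.split(".")[:2]


def check_sdk_pin(pinned_minor: str, package: str = "databricks-sdk") -> str:
    """Describe whether the installed package matches the pinned minor version."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if matches_minor_pin(installed, pinned_minor):
        return f"{package} {installed} matches pin {pinned_minor}.*"
    return f"{package} {installed} does NOT match pin {pinned_minor}.*"


# "0.102" is the example pin from the warnings, not a recommendation.
print(check_sdk_pin("0.102"))
```

The comparison is purely textual on the first two version components, so it assumes plain `major.minor.patch` version strings.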
Imports
- WorkspaceClient
from databricks.sdk import WorkspaceClient
- AccountClient
from databricks.sdk import AccountClient
- Service-specific classes
from databricks.sdk.service.compute import ClusterInfo
Quickstart
import os
from databricks.sdk import WorkspaceClient
# Databricks SDK uses Databricks unified authentication.
# It prioritizes environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN)
# or a .databrickscfg file. For this example to run outside Databricks,
# ensure these are set.
# Example: export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
# Example: export DATABRICKS_TOKEN=dapi********************************
host = os.environ.get('DATABRICKS_HOST', 'https://your-workspace.cloud.databricks.com')
token = os.environ.get('DATABRICKS_TOKEN', 'dapi_your_token_here')
try:
    # Initialize WorkspaceClient, which will pick up credentials automatically.
    # For explicit config:
    # w = WorkspaceClient(host=host, token=token)
    w = WorkspaceClient()
    print(f"Listing clusters in Databricks workspace: {w.config.host}")
    for c in w.clusters.list():
        print(f"  - {c.cluster_name} (ID: {c.cluster_id})")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set ")
    print("or a valid .databrickscfg file exists with proper authentication.")
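When the quickstart fails, a quick pre-flight check of the environment can narrow down the cause before any API call is made. The sketch below uses only the standard library and checks just the two variables named in the quickstart; the helper `missing_auth_vars` is hypothetical, not an SDK function, and an empty result does not guarantee authentication will succeed. Conversely, a missing variable is not necessarily a problem if a `.databrickscfg` file supplies the credentials.

```python
import os
from typing import Mapping

# The two environment variables the quickstart relies on.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_auth_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_auth_vars(os.environ)
if missing:
    print(f"Not set in the environment: {', '.join(missing)}")
else:
    print("DATABRICKS_HOST and DATABRICKS_TOKEN are both set.")
```

Passing the environment as a mapping rather than reading `os.environ` inside the function keeps the check easy to exercise with test values.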