Databricks API Wrapper
The `databricksapi` library provides a Python wrapper for the Databricks REST API, simplifying interactions with Databricks workspaces, clusters, jobs, and more. It leverages the `requests` module for HTTP communication. Currently at version 1.1.8, its release cadence is moderate, with updates typically addressing new Databricks API features or bug fixes.
Common errors
- requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: ...
  cause: The provided Databricks API token is missing, invalid, or lacks the permissions needed for the requested operation.
  fix: Ensure `DATABRICKS_TOKEN` is set correctly and has 'Account Admin', 'Workspace Admin', or the specific feature permissions (e.g., 'clusters/list', 'jobs/run') that the API call requires.
- requests.exceptions.ConnectionError: HTTPSConnectionPool(host='your-databricks-host.cloud.databricks.com', port=443): Max retries exceeded with url: ... (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at ...>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
  cause: The Databricks host URL is incorrect, misspelled, or unreachable because of network issues (e.g., proxy, firewall, DNS).
  fix: Verify that the `DATABRICKS_HOST` environment variable or parameter is correct, includes `https://`, and is reachable from your execution environment. Check the hostname for typos.
- AttributeError: 'DatabricksAPI' object has no attribute 'some_nonexistent_api_group'
  cause: The code accesses a non-existent or misspelled Databricks API group (e.g., `databricks.workspaces` instead of `databricks.workspace`, or `databricks.clusters_api` instead of `databricks.clusters`). Group names do not follow a single pluralization rule (`clusters` and `jobs` are plural, `workspace` is singular), so they are easy to get wrong.
  fix: Consult the `databricksapi` GitHub repository's `databricks_api/api.py` or the quickstart examples to confirm the correct top-level API object names (e.g., `databricks.clusters`, `databricks.jobs`, `databricks.workspace`).
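The `ConnectionError` entry above usually traces back to a malformed host value, so it can help to validate the host before constructing the client. The following is a minimal sketch using only the standard library; `normalize_host` is a hypothetical helper and not part of `databricksapi`, and dropping any path or trailing slash is an assumption about what the client expects.

```python
from urllib.parse import urlparse


def normalize_host(host: str) -> str:
    """Return the host as 'https://<netloc>' or raise ValueError.

    Hypothetical helper for illustration; not part of databricksapi.
    """
    host = (host or "").strip()
    if not host:
        raise ValueError("Databricks host is empty")

    # Add the scheme when the caller passed a bare hostname.
    if "://" not in host:
        host = "https://" + host

    parsed = urlparse(host)
    if parsed.scheme != "https":
        raise ValueError(f"Expected an https:// URL, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError(f"No hostname found in {host!r}")

    # Assumption: the client wants only scheme and host, without a path.
    return f"https://{parsed.netloc}"
```

Calling `normalize_host(os.environ.get('DATABRICKS_HOST', ''))` before creating the client turns a late DNS failure into an early, readable `ValueError`. It does not check that the host is actually reachable.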
Warnings
- gotcha Authentication failures are common. Ensure your `host` parameter includes the `https://` prefix (e.g., `https://dbc-xxxx.cloud.databricks.com`) and your `token` has the necessary permissions for the specific API calls you're making. Different API endpoints require different token scopes.
- gotcha The library's internal structure mirrors Databricks API groups (e.g., `databricks.clusters`, `databricks.jobs`). While generally stable, breaking changes in the underlying Databricks API or internal refactoring in `databricksapi` could lead to `AttributeError` if an API endpoint path changes.
- gotcha Error responses from the Databricks API are wrapped in `requests.exceptions.HTTPError`. These errors typically contain detailed JSON messages in their response content. Generic `try-except` blocks might obscure specific API errors.
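To keep API error details visible instead of swallowing them in a generic handler, the response body can be pulled out of the exception. This is a rough sketch: `describe_api_error` is a hypothetical helper, not part of `databricksapi`, and the `error_code` and `message` keys are an assumption about the shape of the JSON error body. It relies only on the exception carrying a `response` attribute, as `requests.exceptions.HTTPError` does.

```python
def describe_api_error(exc: Exception) -> str:
    """Build a readable message from an exception that may carry an HTTP response.

    Hypothetical helper for illustration; not part of databricksapi.
    """
    response = getattr(exc, "response", None)
    if response is None:
        # Not an HTTP error (or no response was received): fall back to str().
        return str(exc)

    status = getattr(response, "status_code", "unknown")
    try:
        body = response.json()
    except ValueError:
        # The body was not valid JSON (e.g., an HTML error page from a proxy).
        return f"HTTP {status}: {getattr(response, 'text', '')}"

    # Assumption: error bodies look like {"error_code": ..., "message": ...}.
    if isinstance(body, dict) and "message" in body:
        return f"HTTP {status} {body.get('error_code', 'UNKNOWN')}: {body['message']}"
    return f"HTTP {status}: {body}"
```

In practice this would sit inside an `except requests.exceptions.HTTPError as e:` clause placed before any broader `except Exception`, so that API errors are reported with their details and unrelated failures are handled separately.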
Install
- pip install databricksapi
Imports
- DatabricksAPI
from databricks_api import DatabricksAPI
Quickstart
import os
from databricks_api import DatabricksAPI

# Read connection settings from the environment; the defaults are placeholders.
databricks_host = os.environ.get('DATABRICKS_HOST', 'https://your-databricks-host.cloud.databricks.com')  # e.g., 'https://dbc-xxxx.cloud.databricks.com'
databricks_token = os.environ.get('DATABRICKS_TOKEN', 'dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')

if not databricks_host or 'your-databricks-host' in databricks_host:
    print("Error: Please set DATABRICKS_HOST environment variable or replace placeholder in code.")
# The placeholder token itself contains 'dapi', so also reject the 'XXXX' filler.
elif not databricks_token or 'dapi' not in databricks_token or 'XXXX' in databricks_token:
    print("Error: Please set DATABRICKS_TOKEN environment variable or replace placeholder in code.")
else:
    try:
        # Initialize the Databricks API client
        databricks = DatabricksAPI(host=databricks_host, token=databricks_token)

        # Example: List active clusters
        # Requires 'clusters/list' permission on the token
        clusters_response = databricks.clusters.list_all_clusters()
        clusters = clusters_response.get('clusters', [])
        print(f"Found {len(clusters)} clusters.")
        if clusters:
            print(f"First cluster name: {clusters[0]['cluster_name']}")

        # Example: List jobs (uncomment to run, requires 'jobs/list' permission)
        # jobs_response = databricks.jobs.list_jobs()
        # jobs = jobs_response.get('jobs', [])
        # print(f"Found {len(jobs)} jobs.")
    except Exception as e:
        print(f"An error occurred: {e}")
        # HTTP errors carry the API's response; its body may not be valid JSON.
        if hasattr(e, 'response') and hasattr(e.response, 'json'):
            try:
                print(f"API Error Details: {e.response.json()}")
            except ValueError:
                print(f"API Error Details: {e.response.text}")