SQLAlchemy Dialect for Databricks
SQLAlchemy-Databricks is a SQLAlchemy dialect for connecting to Databricks. It enables Python applications to interact with Databricks SQL Endpoints and Unity Catalog using the familiar SQLAlchemy ORM or Core API. The current version is 0.2.0, supporting Python 3.8 and above. The release cadence appears to be infrequent, with two major releases to date.
Warnings
- gotcha Incorrect `server_hostname` or `http_path` format in the connection string is a frequent cause of connection failures. The dialect expects these to be correctly extracted from your Databricks SQL Endpoint details.
- gotcha Version 0.2.0 fixed handling of `server_hostname` and `http_path` in the connection string. Users upgrading from 0.1.0 who had workarounds for previously incorrect parsing might experience connection issues if their connection string no longer aligns with the corrected parsing logic.
- gotcha While `sqlalchemy-databricks` pins a compatible range, breaking changes or specific issues in the underlying `databricks-sql-connector` can impact connections or query execution.
- gotcha Connections will fail if the target Databricks SQL Endpoint is not running. This is an external dependency issue.
Install
-
pip install sqlalchemy-databricks
Imports
- create_engine
from sqlalchemy import create_engine
- text
from sqlalchemy import text
Quickstart
from sqlalchemy import create_engine, text
import os
# Databricks connection details
# These are typically found in the Databricks SQL Endpoint connection details.
# server_hostname is like 'dbc-xxxxxxxx-yyyy.cloud.databricks.com'
# http_path is like '/sql/1.0/endpoints/zzzzzzzzzzzzzzzz'
# token is a Databricks Personal Access Token or Azure AD Token
DATABRICKS_SERVER_HOSTNAME = os.environ.get('DATABRICKS_SERVER_HOSTNAME', 'your_databricks_hostname')
DATABRICKS_HTTP_PATH = os.environ.get('DATABRICKS_HTTP_PATH', 'your_http_path')
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN', 'your_databricks_token')
# Construct the connection string
connection_string = f"databricks://token:{DATABRICKS_TOKEN}@{DATABRICKS_SERVER_HOSTNAME}/{DATABRICKS_HTTP_PATH}"
# Create the engine
try:
engine = create_engine(connection_string)
# Establish a connection and execute a simple query
with engine.connect() as connection:
result = connection.execute(text("SELECT 1")) # Simple test query
print("Connection successful! Query result:", result.scalar())
print("\nSuccessfully connected to Databricks and executed a query.")
except Exception as e:
print(f"Error connecting to Databricks: {e}")
print("Please ensure DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN environment variables are set correctly.")