Databricks SQLAlchemy plugin for Python

2.0.9 · active · verified Sun Mar 29

Databricks SQLAlchemy is a Python library that provides a SQLAlchemy dialect for connecting to Databricks SQL warehouses and clusters. It serves as a bridge between SQLAlchemy's ORM capabilities and the Databricks SQL Python driver, enabling Python applications to interact with Databricks data using standard SQLAlchemy patterns. The library is currently at version 2.0.9 and regularly receives updates, including recent support for complex data types like VARIANT, ARRAY, and MAP.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to establish a connection to Databricks using the `databricks-sqlalchemy` dialect and `create_engine`. It utilizes environment variables for secure authentication and connection details, then executes a simple SELECT query. It also includes an example of a parameterized query, highlighting the named paramstyle required by the dialect.

import os
from sqlalchemy import create_engine, text

# Ensure these environment variables are set:
# DATABRICKS_SERVER_HOSTNAME: Hostname of your Databricks workspace or SQL warehouse
# DATABRICKS_HTTP_PATH: HTTP path of your SQL warehouse or cluster
# DATABRICKS_TOKEN: Your Databricks Personal Access Token
# DATABRICKS_CATALOG (optional): Initial Unity Catalog catalog
# DATABRICKS_SCHEMA (optional): Initial Unity Catalog schema

host = os.environ.get('DATABRICKS_SERVER_HOSTNAME', 'your_databricks_host.cloud.databricks.com')
http_path = os.environ.get('DATABRICKS_HTTP_PATH', '/sql/1.0/endpoints/your_http_path')
access_token = os.environ.get('DATABRICKS_TOKEN', 'dapi...')
catalog = os.environ.get('DATABRICKS_CATALOG', 'default')
schema = os.environ.get('DATABRICKS_SCHEMA', 'default')

try:
    connection_string = (
        f"databricks://token:{access_token}@{host}?"
        f"http_path={http_path}&catalog={catalog}&schema={schema}"
    )
    engine = create_engine(connection_string)

    with engine.connect() as connection:
        # Example: Execute a simple query
        result = connection.execute(text("SELECT 1 AS one, 'hello' AS two"))
        for row in result:
            print(f"Row: {row.one}, {row.two}")

        # Example with parameterized query (requires DBR 14.2+)
        # Note: Databricks dialect uses 'named' paramstyle ':param'
        param_value = "world"
        result = connection.execute(text("SELECT :val AS greeting"), {'val': param_value})
        for row in result:
            print(f"Greeting: {row.greeting}")

    print("Successfully connected and executed queries.")
except Exception as e:
    print(f"Error connecting to Databricks: {e}")
    print("Please ensure environment variables are correctly set and Databricks resources are accessible.")

view raw JSON →