Databricks SQLAlchemy plugin for Python
Databricks SQLAlchemy is a Python library that provides a SQLAlchemy dialect for connecting to Databricks SQL warehouses and clusters. It serves as a bridge between SQLAlchemy's ORM capabilities and the Databricks SQL Python driver, enabling Python applications to interact with Databricks data using standard SQLAlchemy patterns. The library is currently at version 2.0.9 and regularly receives updates, including recent support for complex data types like VARIANT, ARRAY, and MAP.
Warnings
- breaking The `databricks-sqlalchemy` library has distinct major versions for SQLAlchemy 1.x and SQLAlchemy 2.x: version `1.x` of `databricks-sqlalchemy` supports SQLAlchemy 1.x, while `2.x` supports SQLAlchemy 2.x. Upgrading `databricks-sqlalchemy` to `2.x` requires a corresponding upgrade of SQLAlchemy to `2.x`, and vice versa.
- gotcha There are two similarly named, but distinct, Python packages: `databricks-sqlalchemy` (the official dialect, this package) and `sqlalchemy-databricks` (an older, separate community-driven package). Always ensure you install and use `databricks-sqlalchemy` for official and up-to-date support.
- gotcha The SQLAlchemy 2.0 dialect for Databricks *always* uses native parameterized queries with the 'named' paramstyle (`:param`) and requires Databricks Runtime 14.2 or above. Using a different paramstyle or older DBR versions will result in query failures.
- gotcha SQLAlchemy's `DateTime()` type is mapped to Databricks' timezone-agnostic `TIMESTAMP_NTZ()`. If you declare `DateTime(timezone=True)`, the `timezone` argument will be ignored by the Databricks dialect. For timezone-aware timestamps, you must explicitly import and use `databricks.sqlalchemy.TIMESTAMP()`.
- gotcha Older versions of `databricks-sqlalchemy` (e.g., 1.0.5) had issues with unquoted identifiers in Unity Catalog, particularly for catalog or schema names containing hyphens. This could lead to `[INVALID_IDENTIFIER]` errors during metadata operations.
- gotcha The `autoincrement` argument on SQLAlchemy type declarations (e.g., `Integer(autoincrement=True)`) is currently ignored by the Databricks dialect. To create an auto-incrementing field, you must explicitly use `sqlalchemy.Identity()` and combine it with the `databricks.sqlalchemy.BigInteger()` type, as only BIGINT fields support auto-increment in Databricks Runtime.
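To make the timestamp and auto-increment gotchas concrete, here is a sketch of a table definition. The table name (`events`) and its columns are hypothetical, and the snippet deliberately uses SQLAlchemy's generic types so it runs without a Databricks connection; in a real Databricks project you would import `TIMESTAMP` (timezone-aware) and `BigInteger` from `databricks.sqlalchemy` as described above.

```python
from sqlalchemy import (
    BigInteger, Column, DateTime, Identity, MetaData, String, Table,
)

metadata = MetaData()

# Hypothetical table illustrating the warnings above.
events = Table(
    "events",
    metadata,
    # Auto-increment requires Identity() on a BIGINT column; the
    # `autoincrement` type argument alone is ignored by the dialect.
    Column("id", BigInteger, Identity(), primary_key=True),
    Column("name", String(100)),
    # DateTime() maps to TIMESTAMP_NTZ, and DateTime(timezone=True) is
    # ignored -- use databricks.sqlalchemy.TIMESTAMP for tz-aware values.
    Column("created_at", DateTime()),
)
```

Passing `metadata.create_all(engine)` against a Databricks engine would then emit the corresponding `CREATE TABLE` with a `BIGINT GENERATED BY DEFAULT AS IDENTITY` column.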
Install
- pip
pip install databricks-sqlalchemy
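Because the dialect's major version must match SQLAlchemy's (see the breaking-change warning above), pinning both together is a reasonable precaution. A sketch of a `requirements.txt` fragment, assuming you target SQLAlchemy 2.x:

```text
databricks-sqlalchemy>=2.0,<3.0
sqlalchemy>=2.0,<3.0
```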
Imports
- create_engine
from sqlalchemy import create_engine
- DatabricksDialect (registered implicitly via the `databricks://` URL scheme)
engine = create_engine('databricks://...')
- TIMESTAMP
from databricks.sqlalchemy import TIMESTAMP
- TIMESTAMP_NTZ
from databricks.sqlalchemy import TIMESTAMP_NTZ
- TINYINT
from databricks.sqlalchemy import TINYINT
Quickstart
import os
from sqlalchemy import create_engine, text
# Ensure these environment variables are set:
# DATABRICKS_SERVER_HOSTNAME: Hostname of your Databricks workspace or SQL warehouse
# DATABRICKS_HTTP_PATH: HTTP path of your SQL warehouse or cluster
# DATABRICKS_TOKEN: Your Databricks Personal Access Token
# DATABRICKS_CATALOG (optional): Initial Unity Catalog catalog
# DATABRICKS_SCHEMA (optional): Initial Unity Catalog schema
host = os.environ.get('DATABRICKS_SERVER_HOSTNAME', 'your_databricks_host.cloud.databricks.com')
http_path = os.environ.get('DATABRICKS_HTTP_PATH', '/sql/1.0/endpoints/your_http_path')
access_token = os.environ.get('DATABRICKS_TOKEN', 'dapi...')
catalog = os.environ.get('DATABRICKS_CATALOG', 'default')
schema = os.environ.get('DATABRICKS_SCHEMA', 'default')
try:
    connection_string = (
        f"databricks://token:{access_token}@{host}?"
        f"http_path={http_path}&catalog={catalog}&schema={schema}"
    )
    engine = create_engine(connection_string)
    with engine.connect() as connection:
        # Example: execute a simple query
        result = connection.execute(text("SELECT 1 AS one, 'hello' AS two"))
        for row in result:
            print(f"Row: {row.one}, {row.two}")

        # Example: parameterized query (requires DBR 14.2+)
        # Note: the Databricks dialect uses the 'named' paramstyle (:param)
        param_value = "world"
        result = connection.execute(text("SELECT :val AS greeting"), {"val": param_value})
        for row in result:
            print(f"Greeting: {row.greeting}")
    print("Successfully connected and executed queries.")
except Exception as e:
    print(f"Error connecting to Databricks: {e}")
    print("Please ensure environment variables are correctly set and Databricks resources are accessible.")
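Interpolating the token directly into the URL string (as above) works, but tokens or paths containing special characters must be URL-escaped. A safer sketch uses SQLAlchemy's `URL.create`, which escapes credentials for you; the host, token, and `http_path` values below are placeholders:

```python
from sqlalchemy.engine import URL

# Placeholder values -- substitute your own workspace details.
url = URL.create(
    drivername="databricks",
    username="token",
    password="dapi-example-token",   # URL.create escapes special characters
    host="example.cloud.databricks.com",
    query={
        "http_path": "/sql/1.0/warehouses/abc123",
        "catalog": "main",
        "schema": "default",
    },
)

# Render the full URL (including the password) to inspect its shape;
# in real code, pass the URL object itself to create_engine(url).
print(url.render_as_string(hide_password=False))
```

Passing the `URL` object to `create_engine()` avoids manual f-string assembly entirely.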