Databricks SQL Connector for Python

4.2.5 · verified Tue May 12 · auth: no · python install: draft · quickstart: stale

The Databricks SQL Connector for Python is a Python library that enables running SQL commands on Databricks clusters and SQL warehouses. It is a Thrift-based client, conforms to the Python DB API 2.0 specification, and uses Apache Arrow for efficient data exchange. The library is actively maintained with frequent releases, often multiple times a month.

pip install databricks-sql-connector
error ModuleNotFoundError: No module named 'databricks'
cause This error occurs when you import `databricks.sql` but the `databricks-sql-connector` package is not installed. It can also occur if you use an outdated import path, or if a different Databricks package such as `databricks-connect` (which has a different import structure) is installed instead.
fix
Ensure `databricks-sql-connector` is installed (`pip install databricks-sql-connector`). The correct import for the SQL connector is `from databricks.sql import connect` (or `from databricks import sql`).
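A quick way to tell the two failure modes apart is to probe the module specs rather than letting the import crash. This is an illustrative helper (the function name is ours, not part of the connector):

```python
import importlib.util

def diagnose_databricks_import():
    """Suggest a fix for: ModuleNotFoundError: No module named 'databricks'."""
    if importlib.util.find_spec("databricks") is None:
        # Nothing provides the namespace at all.
        return "install the connector: pip install databricks-sql-connector"
    if importlib.util.find_spec("databricks.sql") is None:
        # The namespace exists but the SQL connector submodule does not,
        # e.g. a different Databricks package is installed instead.
        return ("'databricks' is importable but 'databricks.sql' is not; "
                "another package (e.g. databricks-connect) may be installed "
                "instead of databricks-sql-connector")
    return "ok: from databricks.sql import connect should work"

print(diagnose_databricks_import())
```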
error databricks.sql.exc.RequestError: Error during request to server
cause This is a generic connection error, often indicating issues with the provided `server_hostname`, `http_path`, or an invalid or expired `access_token` (Personal Access Token).
fix
Double-check that server_hostname, http_path, and access_token are correct and have the necessary permissions. Ensure there are no typos, leading/trailing spaces, or incorrect environment variable configurations. Also, confirm network connectivity to the Databricks workspace.
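To separate credential problems from network problems, a plain TCP check against the workspace hostname is often enough. A minimal sketch (the helper name is ours; port 443 assumes the default HTTPS endpoint):

```python
import socket

def can_reach(host, port=443, timeout=5):
    """Return True if a TCP connection to host:port succeeds.

    A False result points at DNS/network/firewall issues rather than a
    bad access_token or http_path.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `can_reach()` returns True but `sql.connect()` still fails, the credentials or `http_path` are the likelier culprits.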
error AttributeError: 'NoneType' object has no attribute 'startswith'
cause This error typically occurs when one of the required connection parameters (like `server_hostname`, `http_path`, or `access_token`) is passed as `None` to the `sql.connect()` function, often because an environment variable was not set or retrieved correctly.
fix
Verify that all connection parameters are explicitly set and are not None. For example, if using environment variables, ensure they are defined in your execution environment: os.getenv("DATABRICKS_SERVER_HOSTNAME") must return a string, not None.
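A fail-fast check on the environment turns the cryptic `NoneType` error into a readable message. A sketch, assuming the three standard variable names used elsewhere in this document:

```python
import os

REQUIRED = ("DATABRICKS_SERVER_HOSTNAME", "DATABRICKS_HTTP_PATH", "DATABRICKS_TOKEN")

def check_connection_env():
    """Raise a clear error if any required variable is unset or empty,
    instead of letting sql.connect() fail later on a None parameter."""
    missing = [name for name in REQUIRED if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}
```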
error ModuleNotFoundError: No module named 'packaging'
cause This error means that a required dependency, the `packaging` module, is missing from your Python environment, which `databricks-sql-connector` relies on.
fix
Install the missing packaging library: pip install packaging.
error AttributeError: 'Cursor' object has no attribute 'active_op_handle'
cause This `AttributeError` indicates an incompatibility, typically with older versions of libraries that depend on `databricks-sql-connector` (e.g., `dbt-databricks`). These dependent libraries might be trying to access the `active_op_handle` attribute on the `Cursor` object, which was present in earlier versions of `databricks-sql-connector` but has since been removed or changed.
fix
Pin the databricks-sql-connector version to an earlier compatible version (e.g., pip install databricks-sql-connector==4.0.5) or update the dependent library (like dbt-databricks) to a version that is compatible with your current databricks-sql-connector version.
breaking PyArrow is no longer a default dependency since version 4.0.0. Users must explicitly install `databricks-sql-connector[pyarrow]` or `pip install pyarrow` to enable Arrow-based features like CloudFetch and `fetchmany_arrow`. Failing to do so may impact performance for large datasets.
fix Install with `pip install databricks-sql-connector[pyarrow]` or `pip install pyarrow` separately.
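Since the extra is optional, code that wants the Arrow fetch paths can probe for it at startup. An illustrative check (the function name is ours):

```python
def arrow_available():
    """Since connector 4.0.0, pyarrow is an optional extra; Arrow-based
    features (CloudFetch, fetchmany_arrow) need it installed."""
    try:
        import pyarrow  # noqa: F401
        return True
    except ImportError:
        return False

print("pyarrow available:", arrow_available())
```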
breaking The SQLAlchemy dialect for Databricks was split into a separate `databricks-sqlalchemy` package in version 4.0.0. Users leveraging SQLAlchemy must now explicitly install `databricks-sqlalchemy` alongside the core connector.
fix Install the separate SQLAlchemy dialect: `pip install databricks-sqlalchemy`.
gotcha When using `pandas.read_sql` directly with a `databricks-sql-connector` connection object (v3.0.0+), a `UserWarning` regarding DB API 2.0 support might appear. While this warning can generally be ignored due to PyArrow's efficiency, using `databricks-sqlalchemy` as the engine provides a warning-free experience and broader compatibility.
fix Optionally, use the `databricks-sqlalchemy` engine for `pandas.read_sql` to suppress the warning or if broader SQLAlchemy compatibility is needed.
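The URL format below follows the `databricks-sqlalchemy` documentation; the helper name and the `catalog`/`schema` defaults are illustrative:

```python
def databricks_sqlalchemy_url(host, http_path, token,
                              catalog="main", schema="default"):
    """Build a connection URL for the separate databricks-sqlalchemy dialect."""
    return (
        f"databricks://token:{token}@{host}"
        f"?http_path={http_path}&catalog={catalog}&schema={schema}"
    )
```

The resulting URL is passed to `sqlalchemy.create_engine(...)`, and the engine to `pandas.read_sql(query, engine)`, which avoids the DB API 2.0 `UserWarning`.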
breaking The `pandas.DataFrame.to_sql()` method effectively broke for inserts exceeding 255 bound values (total parameter markers, not rows) into Delta tables with `databricks-sql-connector` versions 3.0.0 and above, due to the introduction of native parameters and a server-side parameter limit.
fix Consider using alternative methods for large inserts, or a dedicated fix package like `pandas-tosql-dbx-fix` which compiles the SQL query before sending it.
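One workaround is sizing `to_sql`'s `chunksize` so that each multi-row INSERT stays under the parameter limit. A hedged sketch (the helper name and the 255 ceiling are taken from the note above; `df`, `engine`, and `method="multi"` in the usage comment are assumptions):

```python
def safe_chunksize(n_columns, max_params=255):
    """Choose a to_sql chunksize so that rows-per-statement * columns
    stays under the server-side bound-parameter limit."""
    return max(1, max_params // n_columns)

# Illustrative usage with a multi-row insert:
# df.to_sql("my_table", engine, if_exists="append",
#           method="multi", chunksize=safe_chunksize(len(df.columns)))
```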
breaking Version 4.2.0 changed how the `autocommit` property on the connection object is handled. Setting `connection.autocommit = False` (e.g., in an Airflow hook) can now raise `TransactionError: [CONFIG_NOT_AVAILABLE] Configuration AUTOCOMMIT is not available. SQLSTATE: 42K0I` when the endpoint does not expose the AUTOCOMMIT configuration.
fix On endpoints that support multi-statement transactions, set `autocommit=False` (at connection time or on the connection object) and control transactions with `connection.commit()` and `connection.rollback()`. If setting `autocommit` raises the `CONFIG_NOT_AVAILABLE` error, leave it at its default and rely on auto-committed statements.
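On endpoints that do support it, explicit transaction control follows the standard DB API shape. A sketch with a hypothetical helper (the connection object and statements are assumptions):

```python
def run_in_transaction(connection, statements):
    """Execute statements atomically: commit on success, roll back on error.

    Only valid on endpoints where setting autocommit does not raise
    CONFIG_NOT_AVAILABLE (see the breaking note above).
    """
    connection.autocommit = False
    cursor = connection.cursor()
    try:
        for stmt in statements:
            cursor.execute(stmt)
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()
```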
gotcha While PyPI metadata states Python >=3.8.0 is supported, the official GitHub README for the latest versions (4.x.x) recommends Python 3.9 or above for full compatibility and access to the latest features and stability.
fix Ensure your environment uses Python 3.9 or newer when working with `databricks-sql-connector` versions 4.x.x.
breaking Installation fails with `error: command 'gcc' failed` when building the `lz4` dependency (e.g., in minimal environments like Alpine Linux). This occurs because `lz4` requires C compilation, and build tools are not present.
fix Ensure C build tools (e.g., `gcc`, `build-base` for Alpine) are installed in your environment before installing `databricks-sql-connector`.
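For containerized installs, the build tools go in before pip runs. An illustrative Dockerfile fragment (base image tag is an example; note that per the compatibility table below, pyarrow wheels may still be unavailable on musl):

```dockerfile
# Alpine (musl): add the C toolchain before pip so lz4 can compile from source
FROM python:3.12-alpine
RUN apk add --no-cache build-base
RUN pip install databricks-sql-connector
```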
pip install databricks-sql-connector[pyarrow]
python  os / libc       variant                    status  wheel  install  import  disk
3.9     alpine (musl)   databricks-sql-connector           -      -        -       -
3.9     alpine (musl)   pyarrow                            -      -        -       -
3.9     slim (glibc)    databricks-sql-connector           -      -        0.03s   186M
3.9     slim (glibc)    pyarrow                            -      -        0.03s   325M
3.10    alpine (musl)   databricks-sql-connector           -      -        -       -
3.10    alpine (musl)   pyarrow                            -      -        -       -
3.10    slim (glibc)    databricks-sql-connector           -      -        0.02s   177M
3.10    slim (glibc)    pyarrow                            -      -        0.02s   324M
3.11    alpine (musl)   databricks-sql-connector           -      -        -       -
3.11    alpine (musl)   pyarrow                            -      -        -       -
3.11    slim (glibc)    databricks-sql-connector           -      -        0.06s   199M
3.11    slim (glibc)    pyarrow                            -      -        0.05s   348M
3.12    alpine (musl)   databricks-sql-connector           -      -        -       -
3.12    alpine (musl)   pyarrow                            -      -        -       -
3.12    slim (glibc)    databricks-sql-connector           -      -        0.04s   182M
3.12    slim (glibc)    pyarrow                            -      -        0.04s   330M
3.13    alpine (musl)   databricks-sql-connector           -      -        -       -
3.13    alpine (musl)   pyarrow                            -      -        -       -
3.13    slim (glibc)    databricks-sql-connector           -      -        0.05s   181M
3.13    slim (glibc)    pyarrow                            -      -        0.04s   329M

This quickstart demonstrates how to establish a connection to a Databricks SQL warehouse using a Personal Access Token (PAT) and execute a simple query. Ensure that `DATABRICKS_SERVER_HOSTNAME`, `DATABRICKS_HTTP_PATH`, and `DATABRICKS_TOKEN` environment variables are set with your Databricks connection details.

import os
from databricks import sql

# Ensure these environment variables are set:
# DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN

# Read connection details from the environment. No placeholder fallbacks:
# with defaults, the missing-variable check below could never trigger.
host = os.environ.get('DATABRICKS_SERVER_HOSTNAME')
http_path = os.environ.get('DATABRICKS_HTTP_PATH')
access_token = os.environ.get('DATABRICKS_TOKEN')

if not all([host, http_path, access_token]):
    print("Please set DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN environment variables.")
else:
    try:
        with sql.connect(
            server_hostname=host,
            http_path=http_path,
            access_token=access_token
        ) as connection:
            with connection.cursor() as cursor:
                cursor.execute("SELECT 1 as id, 'hello' as message")
                result = cursor.fetchall()
                for row in result:
                    print(row)
    except Exception as e:
        print(f"An error occurred: {e}")