Databricks SQL
Databricks SQL is a Python framework designed for easy interaction with Databricks SQL Endpoints. It provides a fluent API for building and executing SQL queries, simplifying data operations for Python developers. The library recently reached version 1.0.0, indicating a stable API after rapid initial development.
Common errors
- ModuleNotFoundError: No module named 'databricks_sql'
  cause: The `databricks-sql` package is not installed in your current Python environment.
  fix: Run `pip install databricks-sql` to install the library.
- AttributeError: 'SelectCommandBuilder' object has no attribute 'id' (or any column name)
  cause: You are attempting to read data attributes from a query builder object instead of from the query results. This happens when you forget to call an execution method such as `.fetch_all()`.
  fix: End the query chain with a method that executes the query and retrieves data, for example `result = db_sql.select(...).from_table(...).fetch_all()`. Then iterate over `result` to access rows and columns.
- databricks_api.auth.auth.DatabricksAuthException: Could not authenticate with Databricks. Please check your credentials.
  cause: The provided Databricks authentication details (server hostname, HTTP path, or access token) are incorrect or missing, preventing a successful connection to the Databricks SQL Endpoint.
  fix: Double-check the values for `DATABRICKS_SERVER_HOSTNAME`, `DATABRICKS_HTTP_PATH`, and `DATABRICKS_ACCESS_TOKEN` in your environment variables or in the `DatabricksSQL` constructor. Ensure the access token is valid and has sufficient permissions.
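Authentication failures are often easier to diagnose when missing settings are caught before the client is constructed. The following is a minimal sketch, assuming the three environment variable names used in this document; `missing_settings` is a hypothetical helper written for illustration and is not part of `databricks-sql`. It only detects unset or blank values and cannot tell whether a token is valid.

```python
import os

# Environment variable names as used in this document's quickstart.
REQUIRED_VARS = (
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_ACCESS_TOKEN",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; not part of databricks-sql.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing connection settings: " + ", ".join(missing))
    else:
        print("All connection settings are present.")
```

Running a check like this before creating the client separates "nothing was configured" from "the configured credentials were rejected", which the authentication error alone does not distinguish.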
Warnings
- breaking The internal module name changed in version 0.0.1, so import paths written for version 0.0.0 no longer resolve. Given how new the library is, few users are likely to be affected.
- gotcha Failing to provide correct Databricks connection parameters (server hostname, HTTP path, access token) will lead to authentication errors from the underlying `databricks-api` library. This is the most common cause of initial connection failures.
- gotcha The query builder methods (e.g., `select`, `from_table`, `where`) return builder objects. You must call a terminal execution method like `.fetch_all()`, `.fetch_one()`, `.fetch_dataframe()`, or `.execute()` to run the query and retrieve results.
- gotcha This library relies on `databricks-api`. Specific error messages or behaviors related to connectivity, authentication, or underlying API calls might originate from `databricks-api` itself. Consult its documentation for deeper troubleshooting if `databricks-sql` errors are unclear.
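The builder-versus-result distinction in the warnings above can be illustrated without a Databricks connection. The sketch below uses a toy stand-in class, `ToySelectBuilder`, which is entirely hypothetical and is not the `databricks-sql` implementation. The dictionary row format is also an assumption, since this document does not state what row type the real library returns.

```python
class ToySelectBuilder:
    """Stand-in for a query builder; NOT the databricks-sql implementation."""

    def __init__(self, rows):
        self._rows = rows      # pretend table contents
        self._columns = ()
        self._limit = None

    def select(self, *columns):
        self._columns = columns
        return self            # returns the builder, not data

    def limit(self, n):
        self._limit = n
        return self            # still the builder

    def fetch_all(self):
        # Terminal step: only here is the "query" actually evaluated.
        rows = self._rows if self._limit is None else self._rows[: self._limit]
        return [{c: row[c] for c in self._columns} for row in rows]


table = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

pending = ToySelectBuilder(table).select("id", "name").limit(1)
print(hasattr(pending, "id"))   # False: still a builder, no row data yet

result = pending.fetch_all()
print(result)                   # [{'id': 1, 'name': 'Alice'}]
```

Accessing `pending.id` here would raise an `AttributeError` for the same reason as the `SelectCommandBuilder` error described earlier: the chain has not yet been terminated by an execution method.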
Install
- pip install databricks-sql
Imports
- DatabricksSQL
from databricks_sql import DatabricksSQL
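Note that the package is installed as `databricks-sql` (hyphen) but imported as `databricks_sql` (underscore). A minimal sketch for checking whether the module is visible to the current interpreter, using only the standard library; `is_importable` is a hypothetical helper name:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("databricks_sql"):
        print("databricks_sql is available.")
    else:
        print("databricks_sql not found; try: pip install databricks-sql")
```

If this reports the module as missing right after a successful install, the package was probably installed into a different Python environment than the one running the script.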
Quickstart
import os

from databricks_sql import DatabricksSQL

# Ensure these environment variables are set for authentication:
#   DATABRICKS_SERVER_HOSTNAME (e.g., 'dbc-xxxx.cloud.databricks.com')
#   DATABRICKS_HTTP_PATH (e.g., '/sql/1.0/endpoints/xxxx')
#   DATABRICKS_ACCESS_TOKEN (Databricks personal access token)

# Initialize the DatabricksSQL client
db_sql = DatabricksSQL(
    server_hostname=os.environ.get("DATABRICKS_SERVER_HOSTNAME", ""),
    http_path=os.environ.get("DATABRICKS_HTTP_PATH", ""),
    access_token=os.environ.get("DATABRICKS_ACCESS_TOKEN", ""),
)

try:
    # Example: Select data from a table named 'users'
    # Replace 'your_schema.users' with an actual table in your Databricks workspace
    result = db_sql.select("id", "name").from_table("your_schema.users").limit(5).fetch_all()
    print("Fetched data:")
    for row in result:
        print(row)

    # Example: Insert data (if table allows)
    # db_sql.insert().into_table("your_schema.new_users").columns("id", "name").values(1, "Alice").execute()
    # print("Data inserted.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your Databricks connection details (server_hostname, http_path, access_token) are correctly configured.")