Lightweight Stateless SQL for Databricks
Databricks-labs-lsql is a Python library that provides lightweight, stateless SQL execution for Databricks, built on top of the Databricks SDK. It is designed for serverless or containerized short-lived applications where minimal dependencies and fast startup are critical, fetching results in JSON format through presigned external links. The current version is 0.17.0, and the project appears to have an active release cadence with regular updates.
Warnings
- breaking Resource name validation was introduced and refined in versions v0.15.0, v0.15.1, and v0.16.0. Resource names are now restricted to alphanumeric characters, hyphens, and underscores. Older code generating resource names with other special characters may break.
- gotcha The `databricks-labs-lsql` library is explicitly designed for lightweight, stateless SQL execution via REST APIs, best suited for serverless functions or short-lived applications. For traditional SQL Python APIs, cursors, efficient large data transfers (Apache Arrow), and low latency, use the `Databricks SQL Connector for Python` instead.
- deprecated The downstream `ucx` check was disabled in v0.17.0 due to compatibility issues with the latest Databricks SDK. While this resolves immediate failures, users relying on these checks should be aware they are no longer active.
- breaking SQL name escaping functions (`escape_name`, `escape_full_name`) were added in v0.13.0. If your application dynamically constructs SQL queries with identifiers that might contain special characters, older versions of the library would not have provided built-in escaping, potentially leading to SQL injection vulnerabilities or syntax errors.
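The escaping behavior can be sketched as follows. The helper bodies below are an illustrative re-implementation of the documented behavior (backtick-quoting each identifier part), not the library's actual source; in real code, import `escape_name` and `escape_full_name` from `databricks.labs.lsql.escapes` instead:

```python
# Illustrative sketch only: these local stand-ins assume the documented
# behavior (wrap each identifier in backticks, double embedded backticks).
# Use databricks.labs.lsql.escapes in real applications.

def escape_name(name: str) -> str:
    # Double any embedded backticks, then wrap the identifier in backticks.
    return "`" + name.replace("`", "``") + "`"

def escape_full_name(full_name: str) -> str:
    # Escape each dot-separated part of a catalog.schema.table name.
    return ".".join(escape_name(part) for part in full_name.split("."))

print(escape_name("table-with-hyphens"))        # `table-with-hyphens`
print(escape_full_name("main.my-schema.trips"))  # `main`.`my-schema`.`trips`
```

Note that this sketch splits on every dot, so identifiers that themselves contain dots are not handled here.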
Install
- pip install databricks-labs-lsql
Imports
- StatementExecutionExt
from databricks.labs.lsql.core import StatementExecutionExt
- escape_name
from databricks.labs.lsql.escapes import escape_name
- escape_full_name
from databricks.labs.lsql.escapes import escape_full_name
Quickstart
import os
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.core import StatementExecutionExt
# Ensure DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set
# For local testing, you might need to export them (e.g., in bash:)
# export DATABRICKS_HOST="https://<your-databricks-workspace-url>"
# export DATABRICKS_TOKEN="<your-personal-access-token>"
w = WorkspaceClient()
# The warehouse_id is optional if one is implicitly configured or the code runs inside DBR
see = StatementExecutionExt(w, warehouse_id=os.environ.get("TEST_DEFAULT_WAREHOUSE_ID"))
# Execute a query and iterate over results
print("Fetching results:")
for pickup_zip, dropoff_zip in see.fetch_all("SELECT pickup_zip, dropoff_zip FROM samples.nyctaxi.trips LIMIT 5"):
    print(f"  pickup_zip={pickup_zip}, dropoff_zip={dropoff_zip}")
# Execute a DDL/DML statement (e.g., create a temporary view)
print("\nExecuting a DDL statement:")
see.execute("CREATE OR REPLACE TEMPORARY VIEW my_temp_view AS SELECT 'hello' as greeting, 1 as number")
# Fetch from the temporary view
print("\nFetching from temporary view:")
for greeting, number in see.fetch_all("SELECT greeting, number FROM my_temp_view"):
    print(f"  greeting='{greeting}', number={number}")