dbt-databricks Adapter
The `dbt-databricks` library is an adapter plugin that allows dbt (data build tool) to connect to and transform data within Databricks environments. It supports Databricks SQL warehouses (formerly SQL endpoints) and all-purpose clusters, enabling users to apply dbt's transformation workflow to Delta Lake tables and Unity Catalog. It is currently at version 1.11.6, and its release cycle typically tracks `dbt-core` releases to maintain compatibility.
Warnings
- breaking dbt-databricks adapter versions must align with the `dbt-core` version they are built for. Upgrading `dbt-core` without upgrading `dbt-databricks` (or vice-versa) can lead to unexpected errors, incompatible syntax, or unhandled features.
- gotcha Incorrect `http_path` configuration in `profiles.yml` is a frequent cause of connection failures. The required path varies significantly between Databricks SQL Endpoints (recommended) and older cluster-based connections.
- gotcha Databricks authentication methods (Personal Access Tokens via `token`, Azure AD Service Principals via `client_id`/`client_secret`, or OAuth) have different security and configuration requirements. Using PATs is simpler for development but less secure for production. Azure AD Service Principals or OAuth are preferred for production Azure Databricks deployments.
- gotcha When working with Unity Catalog, the `catalog` parameter in `profiles.yml` is crucial. Omitting it or providing an incorrect catalog name can lead to 'schema not found' or 'permission denied' errors, even if the schema exists in a different catalog.
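The `http_path` gotcha above can be sanity-checked before running `dbt debug`. A minimal sketch: the regex patterns below encode the common path shapes (SQL warehouse vs. all-purpose cluster) as assumed here, and are not an official Databricks validation.

```python
import re

def classify_http_path(http_path: str) -> str:
    """Best-effort classification of a Databricks http_path (illustrative patterns only)."""
    # SQL warehouse paths typically look like /sql/1.0/warehouses/<hex-id>
    # (older workspaces may still show /sql/1.0/endpoints/<hex-id>)
    if re.fullmatch(r"/sql/1\.0/(warehouses|endpoints)/[0-9a-f]+", http_path):
        return "sql-warehouse"
    # All-purpose cluster paths typically look like /sql/protocolv1/o/<org-id>/<cluster-id>
    if http_path.startswith("/sql/protocolv1/o/"):
        return "cluster"
    return "unknown"

print(classify_http_path("/sql/1.0/warehouses/abc123def456"))
```

If this reports "unknown" for the value in your `profiles.yml`, re-copy the path from the warehouse's Connection Details page in the Databricks UI.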
Install
- pip install dbt-databricks
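Given the version-alignment warning above, it can help to compare the installed versions after install. A heuristic sketch (since dbt 1.8 the adapter versions independently of `dbt-core`, so treat a major.minor mismatch as a prompt to check release notes, not a hard error):

```python
from importlib import metadata

def minor_version(pkg: str) -> tuple:
    """Return (major, minor) for an installed package, or () if not installed."""
    try:
        return tuple(metadata.version(pkg).split(".")[:2])
    except metadata.PackageNotFoundError:
        return ()

core = minor_version("dbt-core")
adapter = minor_version("dbt-databricks")
if core and adapter and core != adapter:
    print(f"NOTE: dbt-core {core} vs dbt-databricks {adapter}; check compatibility notes")
```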
Imports
- DatabricksAdapter
from dbt.adapters.databricks import DatabricksAdapter
Quickstart
import os
import sys
# --- Step 1: Set up environment variables for authentication ---
# In a real scenario, set DBT_DATABRICKS_TOKEN securely,
# e.g., via your shell or CI/CD secrets.
# For local testing, replace 'YOUR_DATABRICKS_PAT' with an actual PAT.
# Using os.environ.get for compliance with auth checks.
databricks_token = os.environ.get('DBT_DATABRICKS_TOKEN', 'YOUR_DATABRICKS_PAT_FOR_QUICKSTART_ONLY')
if databricks_token == 'YOUR_DATABRICKS_PAT_FOR_QUICKSTART_ONLY':
    print("WARNING: DBT_DATABRICKS_TOKEN not set in environment. Using placeholder. Ensure you replace it.")
os.environ['DBT_DATABRICKS_TOKEN'] = databricks_token
# --- Step 2: Verify dbt-databricks installation ---
try:
    import dbt.adapters.databricks  # Check if the package is findable
    print("dbt-databricks is installed and accessible.")
except ImportError:
    print("ERROR: dbt-databricks not found. Please run 'pip install dbt-databricks'")
    sys.exit(1)
print("\n--- Quickstart: Next Steps (run these in your terminal) ---")
print("1. Configure your dbt profile in `~/.dbt/profiles.yml`:")
print(" Replace placeholders like `<your-databricks-workspace-url>`, etc.")
print(" Example `profiles.yml` snippet (named `my_databricks_project`):")
print("\n my_databricks_project:")
print(" target: dev")
print(" outputs:")
print(" dev:")
print(" type: databricks")
print(" host: <your-databricks-workspace-url> # e.g., dbc-xxxx.cloud.databricks.com")
print(" http_path: /sql/1.0/warehouses/<your-sql-warehouse-id> # For SQL Endpoints")
print('      token: "{{ env_var(\'DBT_DATABRICKS_TOKEN\') }}" # Uses the env var set above')
print(" catalog: <your-unity-catalog-name> # Optional, if using Unity Catalog")
print(" schema: dbt_quickstart_schema")
print(" threads: 4")
print("\n2. Initialize a new dbt project and link your profile:")
print(" mkdir my_dbt_project_databricks && cd my_dbt_project_databricks")
print(" dbt init --skip-profile-setup")
print(" # Edit `dbt_project.yml` to set `profile: 'my_databricks_project'`")
print("\n3. Test your Databricks connection:")
print(" dbt debug --profile my_databricks_project")
print("\n4. Run your dbt models (after creating some in the `models` directory):")
print(" dbt run --profile my_databricks_project")
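The profile snippet printed above can also be generated programmatically. A sketch using a hypothetical helper (`render_profile` is not part of `dbt-databricks`) that fills the quickstart's placeholders:

```python
# Hypothetical helper (not part of dbt-databricks): renders the quickstart
# profile as a profiles.yml string. Replace host/http_path with real values.
def render_profile(host: str, http_path: str, schema: str = "dbt_quickstart_schema") -> str:
    return (
        "my_databricks_project:\n"
        "  target: dev\n"
        "  outputs:\n"
        "    dev:\n"
        "      type: databricks\n"
        f"      host: {host}\n"
        f"      http_path: {http_path}\n"
        "      token: \"{{ env_var('DBT_DATABRICKS_TOKEN') }}\"\n"
        f"      schema: {schema}\n"
        "      threads: 4\n"
    )

print(render_profile("dbc-xxxx.cloud.databricks.com", "/sql/1.0/warehouses/your-warehouse-id"))
```

The sketch only prints the YAML; write it to `~/.dbt/profiles.yml` yourself so you do not clobber an existing profiles file.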