dbt-databricks Adapter

1.11.6 · active · verified Thu Apr 09

`dbt-databricks` is an adapter plugin that lets dbt (data build tool) connect to Databricks and transform data there. It works with Databricks SQL warehouses (formerly called SQL endpoints) and with clusters, so dbt models can build Delta Lake tables and use Unity Catalog. The current version is 1.11.6, and releases typically track `dbt-core` releases closely to stay compatible.
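Because adapter releases tend to follow `dbt-core`, it can be useful to compare the two installed versions. The sketch below is a rough check only: the helper names (`installed_version`, `major_minor`, `same_minor`) are hypothetical, and treating a matching major.minor pair as "aligned" is an assumption made for illustration, not a compatibility rule stated by the adapter.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_minor(version_string):
    """Parse '1.11.6' into (1, 11); anything after the second dot is ignored."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def same_minor(adapter_version, core_version):
    """Assumed heuristic: versions are 'aligned' when major and minor match."""
    return major_minor(adapter_version) == major_minor(core_version)


if __name__ == "__main__":
    adapter = installed_version("dbt-databricks")
    core = installed_version("dbt-core")
    if adapter is None or core is None:
        print("dbt-databricks and/or dbt-core is not installed.")
    elif same_minor(adapter, core):
        print(f"dbt-databricks {adapter} and dbt-core {core} share a minor version.")
    else:
        print(f"Minor versions differ: dbt-databricks {adapter}, dbt-core {core}.")
```

A mismatch does not necessarily mean the combination is broken; it is only a prompt to check the adapter's release notes.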

Warnings

Install

pip install dbt-databricks

Imports

Quickstart

This quickstart reads the `DBT_DATABRICKS_TOKEN` environment variable used for Personal Access Token authentication, checks that the adapter is installed, and then prints the `profiles.yml` configuration and the dbt CLI commands needed to initialize a project, test connectivity, and run models on Databricks. Replace the placeholder values with your actual Databricks workspace URL, SQL warehouse ID, and (if applicable) Unity Catalog name.

import os
import sys

# --- Step 1: Set up environment variables for authentication ---
# In a real scenario, set DBT_DATABRICKS_TOKEN securely,
# e.g., via your shell or CI/CD secrets.
# For local testing, replace the placeholder below with an actual PAT.
# os.environ.get avoids a KeyError when the variable is unset.
databricks_token = os.environ.get('DBT_DATABRICKS_TOKEN', 'YOUR_DATABRICKS_PAT_FOR_QUICKSTART_ONLY')
if databricks_token == 'YOUR_DATABRICKS_PAT_FOR_QUICKSTART_ONLY':
    print("WARNING: DBT_DATABRICKS_TOKEN not set in environment. Using placeholder. Ensure you replace it.")
# Note: this assignment only affects this Python process and its children,
# not the terminal where you later run dbt, so export the variable there too.
os.environ['DBT_DATABRICKS_TOKEN'] = databricks_token

# --- Step 2: Verify dbt-databricks installation ---
try:
    import dbt.adapters.databricks # Check if the package is findable
    print("dbt-databricks is installed and accessible.")
except ImportError:
    print("ERROR: dbt-databricks not found. Please run 'pip install dbt-databricks'")
    sys.exit(1)

print("\n--- Quickstart: Next Steps (run these in your terminal) ---")
print("1. Configure your dbt profile in `~/.dbt/profiles.yml`:")
print("   Replace placeholders like `<your-databricks-workspace-url>`, etc.")
print("   Example `profiles.yml` snippet (named `my_databricks_project`):")
print("\n      my_databricks_project:")
print("        target: dev")
print("        outputs:")
print("          dev:")
print("            type: databricks")
print("            host: <your-databricks-workspace-url> # e.g., dbc-xxxx.cloud.databricks.com")
print("            http_path: /sql/1.0/warehouses/<your-sql-warehouse-id> # For SQL warehouses")
print("            token: \"{{ env_var('DBT_DATABRICKS_TOKEN') }}\" # Read from the environment when dbt runs")
print("            catalog: <your-unity-catalog-name> # Optional, if using Unity Catalog")
print("            schema: dbt_quickstart_schema")
print("            threads: 4")
print("\n2. Initialize a new dbt project and link your profile:")
print("   dbt init my_dbt_project_databricks --skip-profile-setup")
print("   cd my_dbt_project_databricks")
print("   # Edit `dbt_project.yml` to set `profile: 'my_databricks_project'`")
print("\n3. Test your Databricks connection:")
print("   dbt debug --profile my_databricks_project")
print("\n4. Run your dbt models (after creating some in the `models` directory):")
print("   dbt run --profile my_databricks_project")
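The `profiles.yml` snippet printed in step 1 can also be built as a string, which makes it easier to fill in the placeholders consistently. This is a minimal sketch, not part of the adapter: `render_profile` and its parameters are hypothetical names, and it only covers the fields shown in the quickstart (token authentication against a SQL warehouse, with an optional Unity Catalog name).

```python
def render_profile(profile_name, host, warehouse_id, schema, catalog=None, threads=4):
    """Return profiles.yml text for a single Databricks target named 'dev'.

    Hypothetical helper for illustration; the token is never embedded,
    it stays a reference to the DBT_DATABRICKS_TOKEN environment variable.
    """
    lines = [
        f"{profile_name}:",
        "  target: dev",
        "  outputs:",
        "    dev:",
        "      type: databricks",
        f"      host: {host}",
        f"      http_path: /sql/1.0/warehouses/{warehouse_id}",
        # Plain (non-f) string so the Jinja braces are kept literally;
        # the value is quoted so YAML does not parse it as a mapping.
        "      token: \"{{ env_var('DBT_DATABRICKS_TOKEN') }}\"",
    ]
    if catalog:
        # Only emitted when a Unity Catalog name is supplied.
        lines.append(f"      catalog: {catalog}")
    lines.append(f"      schema: {schema}")
    lines.append(f"      threads: {threads}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_profile(
        "my_databricks_project",
        "<your-databricks-workspace-url>",
        "<your-sql-warehouse-id>",
        "dbt_quickstart_schema",
    ))
```

Review the output before saving it to `~/.dbt/profiles.yml`, since writing the file directly would overwrite any profiles already defined there.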
