dbt-spark

1.10.1 · active · verified Thu Apr 09

dbt-spark is the Apache Spark adapter plugin for dbt (data build tool), enabling data analysts and engineers to transform data in Apache Spark using SQL while leveraging Spark's distributed computing for transformations at scale. The current version is 1.10.1, and new versions are typically released in alignment with `dbt-core`'s major and minor releases.

Install

Quickstart

This quickstart outlines the `profiles.yml` configuration for connecting dbt to a local Spark Thrift server, often set up via docker-compose (as demonstrated in the dbt-spark repository README). It also suggests a basic SQL model for validation.

import os

# This quickstart demonstrates configuring dbt-spark with a local Spark Thrift server.
# First, ensure you have Docker installed and the dbt-spark local environment set up.
# From the dbt-adapters/dbt-spark directory, run:
# docker-compose up -d

# Create a profiles.yml file in your dbt project's ~/.dbt/ directory or project root
profiles_content = '''
spark_local_dev:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: 127.0.0.1
      port: 10000
      user: dbt
      schema: analytics
      connect_retries: 5
      connect_timeout: 60
      retry_all: true
'''

# dbt looks for profiles.yml in ~/.dbt/ by default; recent dbt versions also
# pick up a profiles.yml placed in the project directory itself.
profile_path = os.path.expanduser('~/.dbt/profiles.yml')

# Ensure ~/.dbt/ exists before writing
os.makedirs(os.path.dirname(profile_path), exist_ok=True)

# Don't clobber an existing profiles.yml; merge the profile in by hand instead
if os.path.exists(profile_path):
    print(f"{profile_path} already exists; add the 'spark_local_dev' profile to it manually.")
else:
    with open(profile_path, 'w') as f:
        f.write(profiles_content)
    print(f"profiles.yml created at {profile_path}")
print("Next, initialize a dbt project: dbt init my_spark_project")
print("Select 'spark_local_dev' as your profile when prompted.")
print("Then, create a model, e.g., models/my_model.sql:")
print("---\nSELECT 1 AS id, 'hello dbt-spark' AS message\n---")
print("Run your dbt models: dbt run --profile spark_local_dev")
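Before running dbt, it can be worth sanity-checking that the profile parses as valid YAML and carries the fields the thrift method expects. The sketch below uses PyYAML (installed as a dependency of dbt-core); the specific field checks are illustrative, not an exhaustive validation of dbt's profile schema:

```python
import yaml  # PyYAML, installed as a dependency of dbt-core

profiles_content = """
spark_local_dev:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: 127.0.0.1
      port: 10000
      user: dbt
      schema: analytics
"""

profile = yaml.safe_load(profiles_content)
target_name = profile["spark_local_dev"]["target"]
output = profile["spark_local_dev"]["outputs"][target_name]

# Basic structural checks for a thrift-based Spark target
assert output["type"] == "spark"
assert output["method"] == "thrift"
assert isinstance(output["port"], int), "port should be a bare integer, not a string"
assert output["schema"], "schema must be set; dbt builds relations under it"

print("profile parses and has the expected thrift fields")
```

A check like this catches the most common profile mistakes (indentation errors, quoted ports, a missing schema) before they surface as connection failures in `dbt debug`.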
