dbt-fabricspark
dbt-fabricspark is a Microsoft Fabric Spark adapter plugin for dbt (data build tool), enabling data analysts and engineers to transform data within Microsoft Fabric Lakehouses. It connects to Fabric Lakehouses via Livy endpoints, supports both schema-enabled and non-schema configurations, and includes Livy session management. The library is actively maintained by Microsoft and its current version is 1.9.5, typically aligning with dbt-core release cycles.
Common errors
- "Authentication has expired. Please log in again." or HTTP 401 Unauthorized errors.
  Cause: The Azure CLI token used for authentication has expired, or the service principal credentials are incorrect or expired.
  Fix: Run `az login` in your terminal to refresh your Azure CLI session. For service principal authentication, ensure your client ID, tenant ID, and client secret/certificate are valid and correctly configured in `profiles.yml` or environment variables.
- Compilation Error: 'None' has no attribute 'table' or 'str object' has no attribute 'COLUMN_NAME'
  Cause: A Jinja macro or variable reference is incorrect, or returns an unexpected `None` or string where an object/dictionary is expected. This can happen when a macro does not return a value or a variable is referenced improperly (e.g., `{{ var.value }}` instead of `{{ var }}`).
  Fix: Inspect the macro or variable in the specified file; check its return type and how it is accessed. Ensure all package macros are available by running `dbt deps`.
- Database Error: SQL compilation error: syntax error line X at position Y unexpected 'keyword', or an invalid data type cast.
  Cause: dbt's compile phase processes Jinja but does not fully validate the underlying SQL syntax or data type compatibility until runtime; this error means Spark cannot execute the generated SQL. Common causes include typos, SQL functions Spark does not support, and incompatible data types in operations.
  Fix: Review the compiled SQL in `target/compiled/<project_name>/models/...` and `target/run/<project_name>/models/...`. Run the compiled SQL directly in a Spark SQL client to debug the syntax. Use Spark-compatible SQL and handle data types explicitly (e.g., with `CAST`).
- Error creating relation 'my_table'. Table 'my_table' already exists (dbt tries to create a table that already exists).
  Cause: This usually occurs with incremental models where `incremental_strategy` or `unique_key` is misconfigured, leading dbt to attempt a full table creation instead of an incremental update or merge, or where the `is_incremental()` logic is missing or flawed.
  Fix: Verify the `materialized='incremental'` configuration. Ensure `incremental_strategy` is set correctly (`merge` or `insert_overwrite`) and that `unique_key` and `partition_by` (if applicable) are defined in your model config. Validate the `is_incremental()` logic in your SQL for proper conditional execution.
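The merge-based incremental fix above can be sketched as a model file. This is a hedged illustration: the model name, source, and column names (`orders_incremental`, `raw_orders`, `order_id`, `updated_at`) are assumptions, not part of the adapter.

```sql
-- models/orders_incremental.sql (illustrative names)
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
) }}

SELECT order_id, customer_id, updated_at
FROM {{ source('lakehouse', 'raw_orders') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what the model already holds
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

With `unique_key` set, the `merge` strategy updates matching rows instead of recreating the table or appending duplicates.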
Warnings
- Breaking change in dbt-fabricspark v1.8: prior to version 1.8, installing `dbt-fabricspark` automatically installed `dbt-core` and its dependencies. From version 1.8 onwards, `dbt-core` must be installed separately alongside the adapter.
- Gotcha: do not use the 'database' configuration. Microsoft Fabric Spark uses the terms 'schema' and 'database' interchangeably, but dbt treats 'database' as a higher level of the namespace. With `dbt-fabricspark`, never set `database` as a node configuration or in your `profiles.yml` target profile.
- Gotcha: the default incremental strategy (`append`) can cause duplicates. `append` inserts new records without updating or overwriting existing data, which can lead to duplicate records in your models if not managed carefully.
- Gotcha: Livy session management for local development. In Fabric mode (`livy_mode: fabric`, the default), enabling `reuse_session: true` persists the Livy session ID to a local file, letting dbt reuse an existing Spark session on subsequent runs instead of creating a new one. This is key for faster development workflows but requires proper configuration of `session_id_file`.
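The session-reuse setup above can be sketched as a profile fragment. Only `livy_mode`, `reuse_session`, and `session_id_file` come from the note above; the file path shown is a hypothetical placeholder.

```yaml
# profiles.yml target output fragment (sketch; path is an assumption)
dev:
  type: fabricspark
  method: livy
  livy_mode: fabric        # default
  reuse_session: true      # reuse the existing Livy session across runs
  session_id_file: ~/.dbt/.fabricspark-session-id   # hypothetical path
```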
Install
- pip install dbt-core dbt-fabricspark
- az login
Quickstart
# 1. Ensure dbt-core and dbt-fabricspark are installed and Azure CLI is logged in.
# pip install dbt-core dbt-fabricspark
# az login
# 2. Configure your ~/.dbt/profiles.yml file (replace placeholders):
# fabricspark-dev:
# target: dev
# outputs:
# dev:
# type: fabricspark
# method: livy
# authentication: CLI
# endpoint: https://api.fabric.microsoft.com/v1
# workspaceid: <your-workspace-guid>
# lakehouseid: <your-lakehouse-guid>
# lakehouse: <your-lakehouse-name>
# schema: <your-schema-name> # Optional, defaults to target schema, or lakehouse name for non-schema lakehouses
# threads: 1
# connect_retries: 2
# connect_timeout: 10
# retry_all: true # Recommended for production
# 3. Create a dbt project (if you don't have one):
# dbt init my_fabric_project
# 4. In your dbt project's dbt_project.yml, set the profile:
# name: 'my_fabric_project'
# profile: 'fabricspark-dev'
# 5. Create a sample model (e.g., models/my_first_model.sql):
# -- models/my_first_model.sql
# {{ config(materialized='table') }}
# SELECT
# 1 as id,
# 'dbt fabricspark test' as message
# 6. Run your dbt models:
# dbt run
# This command will connect to your Fabric Lakehouse via Spark Livy and execute the SQL.
# Note: authentication relies on an active Azure CLI login session (`az login`).
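As a sanity check before running dbt, a small standalone script can confirm a target output contains the connection fields used in the quickstart profile. This is an illustrative sketch, not part of the adapter; the choice of required keys is an assumption based on the profile above (`schema` is optional and omitted).

```python
# Assumed required keys, taken from the quickstart profile above (not an official list).
REQUIRED_KEYS = {
    "type", "method", "authentication", "endpoint",
    "workspaceid", "lakehouseid", "lakehouse",
}

def missing_profile_keys(output: dict) -> set:
    """Return the assumed-required fabricspark keys absent from a target output."""
    return REQUIRED_KEYS - output.keys()

# Example target output mirroring the quickstart (placeholder GUIDs/names).
profile = {
    "type": "fabricspark",
    "method": "livy",
    "authentication": "CLI",
    "endpoint": "https://api.fabric.microsoft.com/v1",
    "workspaceid": "00000000-0000-0000-0000-000000000000",
    "lakehouseid": "00000000-0000-0000-0000-000000000000",
    "lakehouse": "my_lakehouse",
}

print(missing_profile_keys(profile))  # → set() (nothing missing)
```

Running this before `dbt run` catches a profile with a missing field (e.g., a forgotten `lakehouseid`) earlier than a failed Livy connection would.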