Airflow dbt Python

Version 3.5.0, verified Tue Apr 14, auth: no, language: python

airflow-dbt-python is a Python library providing Airflow operators, hooks, and utilities to execute dbt commands. Unlike solutions wrapping the dbt CLI, it directly interfaces with dbt-core, enabling features like using Airflow connections as dbt targets and pushing dbt artifacts to XCom. The library is currently at version 3.5.0 and actively maintained, with a focus on supporting recent versions of Airflow and dbt.

pip install airflow-dbt-python
error ModuleNotFoundError: No module named 'dbt'
cause The 'dbt' module is not installed or not available in the Python environment.
fix
Ensure that 'dbt-core' is installed in your environment by running 'pip install dbt-core'.
error ModuleNotFoundError: No module named 'airflow'
cause The 'apache-airflow' package is not installed or not available in the Python environment.
fix
Install Apache Airflow by running 'pip install apache-airflow'.
error ModuleNotFoundError: No module named 'google'
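cause A package in the 'google' namespace (for example 'google.protobuf' or 'google.cloud'), required by dbt or one of its adapters, is not installed in the Python environment.
fix
Install the package that provides the missing submodule rather than the unrelated 'google' package on PyPI: for example 'pip install protobuf' for 'google.protobuf', or 'pip install dbt-bigquery' when targeting BigQuery.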
error AttributeError: module 'dbt.flags' has no attribute 'PROFILES_DIR'
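cause Code that reads 'dbt.flags.PROFILES_DIR' as a module-level attribute is running against a newer dbt-core in which the flags module was refactored. This usually points to a version mismatch between dbt-core and the package calling it.
fix
Upgrade 'airflow-dbt-python' to a release that supports your installed dbt-core, or pin dbt-core to a version the calling package supports. In your own code, read flags through 'dbt.flags.get_flags()' on dbt-core 1.5 and later instead of module-level attributes.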
error AttributeError: 'SeedNode' object has no attribute 'depends_on'
cause The 'SeedNode' class in dbt no longer has a 'depends_on' attribute in recent versions.
fix
Keep dbt-core and the adapter on matching versions. To downgrade both, run 'pip install "dbt-core<=1.3.1" "dbt-databricks<=1.3.1"'. The quotes stop the shell from treating '<' as a redirect.
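The ModuleNotFoundError entries above can be checked before a DAG is deployed. The following is a minimal sketch that looks up each top-level module without importing it. The helper name `find_missing_modules` and the module-to-package mapping are illustrative assumptions, not part of airflow-dbt-python.

```python
import importlib.util

# Top-level module name -> pip package that usually provides it.
# This mapping is an assumption based on the fixes listed above.
REQUIRED_MODULES = {
    "dbt": "dbt-core",
    "airflow": "apache-airflow",
    "airflow_dbt_python": "airflow-dbt-python",
}


def find_missing_modules(required=None):
    """Return {module: suggested pip package} for modules that cannot be found."""
    required = REQUIRED_MODULES if required is None else required
    missing = {}
    for module, package in required.items():
        try:
            spec = importlib.util.find_spec(module)
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially installed packages.
            spec = None
        if spec is None:
            missing[module] = package
    return missing


if __name__ == "__main__":
    for module, package in find_missing_modules().items():
        print(f"Missing module '{module}': try `pip install {package}`")
```

Run it with the same interpreter that the Airflow workers use. A check that passes in a different virtual environment says nothing about the workers.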
breaking With dbt-core v1.0.0 and later, the way dbt is installed changed significantly. Instead of `pip install dbt`, you now install `dbt-core` and then specific database adapters (e.g., `dbt-redshift`, `dbt-snowflake`). The same release renamed dbt's test types: "data" tests became "singular" tests and "schema" tests became "generic" tests, which changed the arguments that `DbtTestOperator` accepts.
fix Install `dbt-core` and each adapter you need as separate packages. For `DbtTestOperator`, use the `singular` or `generic` arguments instead of `data` or `schema` with dbt-core v1.0.0+.
gotcha In multi-machine or cloud Airflow installations (e.g., AWS MWAA, GCP Cloud Composer), workers may not have a shared local filesystem. Storing dbt project files directly on the worker is unreliable. `airflow-dbt-python` requires dbt project files to be accessible.
fix Store dbt projects in remote storage (e.g., S3, GCS, Git repositories) and use `project_dir` and `profiles_dir` with URL schemes (e.g., `s3://bucket/project/`) or configure Airflow Connections for dbt targets. `airflow-dbt-python` will download files to a temporary directory for execution.
breaking New versions of Apache Airflow and dbt-core may introduce breaking changes. The `airflow-dbt-python` library aims to keep up with the latest releases, but compatibility issues can arise.
fix Always test new versions of Airflow, dbt-core, and `airflow-dbt-python` in a staging environment before upgrading production systems. Report any issues to the library maintainers.
gotcha When omitting `profiles_dir` in operators, `airflow-dbt-python` will first check if the `project_dir` URL includes a `profiles.yml`. If not found, it will attempt to find an Airflow Connection using the `target` argument.
fix Explicitly set `profiles_dir` or ensure your dbt project remote URL includes `profiles.yml`. Alternatively, define an Airflow Connection with the ID matching your dbt `target` name.
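The lookup order described in the last gotcha can be sketched as a small decision function. This is only an illustration of the documented behavior, not the library's implementation. The function name, its parameters, the bucket URL, and the connection ID are all hypothetical.

```python
def describe_profile_source(
    profiles_dir=None,
    project_has_profiles=False,
    target=None,
    connection_ids=(),
):
    """Illustrative only: mirrors the lookup order described above."""
    if profiles_dir is not None:
        # An explicit profiles_dir always wins.
        return f"profiles.yml from profiles_dir={profiles_dir}"
    if project_has_profiles:
        # Otherwise, a profiles.yml stored with the project is used.
        return "profiles.yml found alongside the dbt project"
    if target is not None and target in connection_ids:
        # Finally, an Airflow Connection whose ID matches the target.
        return f"Airflow Connection '{target}'"
    return "no profile source found"


# Hypothetical operator arguments for a project kept in remote storage,
# with the dbt target supplied by an Airflow Connection of the same name.
operator_kwargs = {
    "project_dir": "s3://my-bucket/dbt/project/",
    "target": "my_warehouse_conn",
}

if __name__ == "__main__":
    print(
        describe_profile_source(
            target=operator_kwargs["target"],
            connection_ids=["my_warehouse_conn"],
        )
    )
```

Setting `profiles_dir` explicitly makes the outcome independent of what the remote project happens to contain, which is why the fix above recommends it.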
pip install "airflow-dbt-python[redshift]"
pip install "airflow-dbt-python[snowflake]"

This example DAG demonstrates a basic dbt workflow using airflow-dbt-python operators. It includes seeding data, running dbt models with specific tags, and executing tests. Replace `/path/to/my/dbt/project/` and `~/.dbt/` with your actual dbt project and profiles directories, or configure remote storage as needed for multi-machine/cloud environments. Ensure your Airflow connections for dbt targets are configured if not using `profiles.yml`.

import datetime as dt

from airflow import DAG
from airflow_dbt_python.operators.dbt import (
    DbtRunOperator,
    DbtSeedOperator,
    DbtTestOperator,
)

default_args = {
    "owner": "airflow",
    # A fixed start date replaces days_ago(), which is deprecated in
    # Airflow 2 and removed in Airflow 3.
    "start_date": dt.datetime(2024, 1, 1),
    "depends_on_past": False,
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
}

with DAG(
    dag_id="example_dbt_workflow",
    schedule="0 0 * * *",  # "schedule" replaces "schedule_interval" (Airflow 2.4+)
    catchup=False,
    dagrun_timeout=dt.timedelta(minutes=60),
    default_args=default_args,
    tags=["dbt", "example"],
) as dag:
    dbt_seed = DbtSeedOperator(
        task_id="dbt_seed_task",
        project_dir="/path/to/my/dbt/project/",
        profiles_dir="~/.dbt/",
        target="production",
        profile="my-project",
    )

    dbt_run = DbtRunOperator(
        task_id="dbt_run_task",
        project_dir="/path/to/my/dbt/project/",
        profiles_dir="~/.dbt/",
        target="production",
        profile="my-project",
        select=["+tag:daily"],
        exclude=["tag:deprecated"],
        full_refresh=False,
    )

    dbt_test = DbtTestOperator(
        task_id="dbt_test_task",
        project_dir="/path/to/my/dbt/project/",
        profiles_dir="~/.dbt/",
        target="production",
        profile="my-project",
        singular=True,  # Run only singular tests ("data" tests before dbt-core v1.0.0)
    )

    dbt_seed >> dbt_run >> dbt_test
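
The introduction mentions pushing dbt artifacts to XCom. Assuming an operator is configured to push `run_results.json` (the library's documentation describes a `do_xcom_push_artifacts` argument for this; verify the exact name for your version), a downstream task could summarize the parsed artifact as sketched below. The helper name `summarize_run_results` and the sample data are illustrative.

```python
from collections import Counter


def summarize_run_results(run_results):
    """Count node statuses in a parsed run_results.json document.

    Returns a (status_counts, failed_ids) tuple.
    """
    results = run_results.get("results", [])
    counts = Counter(r.get("status", "unknown") for r in results)
    # "error" is reported for failed models, "fail" for failed tests.
    failed = [
        r.get("unique_id", "<unknown>")
        for r in results
        if r.get("status") in {"error", "fail"}
    ]
    return dict(counts), failed


if __name__ == "__main__":
    # Hand-written sample in the shape of run_results.json.
    sample = {
        "results": [
            {"unique_id": "model.my_project.orders", "status": "success"},
            {"unique_id": "test.my_project.not_null_orders_id", "status": "fail"},
        ]
    }
    counts, failed = summarize_run_results(sample)
    print(counts, failed)
```

In a DAG, the dictionary would come from an XCom pull in a downstream task. The function itself needs neither Airflow nor dbt, so it can be unit tested on its own.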