Apache Airflow Tableau Provider
The Apache Airflow Tableau Provider integrates Apache Airflow with Tableau Server and Tableau Online. It provides operators and hooks that automate Tableau tasks, such as refreshing data sources, publishing workbooks, and managing server content, directly from Airflow DAGs. The provider is maintained as part of the Apache Airflow project and is released on Airflow's provider release cycle. The current version is 5.3.4.
Warnings
- deprecated: Authentication by personal access token (PAT) has been deprecated since provider version 2.1.0 because of concurrency issues. Tableau invalidates a PAT when multiple parallel connections use the same token, which causes job failures.
- breaking: Provider version 4.0.0 removed the deprecated class paths `tableau_job_status` and `tableau_refresh_workbook`.
- breaking: Provider version 3.0.0 requires Apache Airflow 2.2.0 or newer. Installing this provider version alongside an older Airflow may cause pip to upgrade Airflow automatically.
- gotcha: Omitting the correct `site_id` for Tableau Online or for a non-default Tableau Server site can cause `NotSignedInError: Missing site ID`, even when the other credentials are correct.
- gotcha: Self-signed SSL certificates can cause `SSLCertVerificationError` even when `openssl s_client` validates the certificate, which suggests a deeper issue with how `tableauserverclient` handles custom CA bundles inside Airflow.
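The `site_id` and certificate gotchas above both come down to how the Tableau connection is defined. The following is a minimal sketch, not the provider's own tooling. It builds a connection in Airflow's JSON connection format using only the standard library and exports it as an environment variable. It assumes the connection extras `site_id` and `verify` (a path to a CA bundle) as described in the provider's connection documentation. The helper name, host, account, site, and file path are all hypothetical placeholders, and pointing `verify` at a CA bundle may not resolve every certificate error.

```python
import json
import os


def build_tableau_connection(host, login, password, site_id="", ca_bundle=None):
    """Serialize a Tableau connection in Airflow's JSON connection format.

    Hypothetical helper for illustration; it is not part of the provider.
    """
    # An empty site_id targets the default site; Tableau Online and
    # non-default Server sites need their site ID set explicitly.
    extra = {"site_id": site_id}
    if ca_bundle:
        # Assumption: the "verify" extra accepts a path to a CA bundle.
        extra["verify"] = ca_bundle
    return json.dumps(
        {
            "conn_type": "tableau",
            "host": host,
            "login": login,
            "password": password,
            "extra": extra,
        }
    )


# Placeholder values; username/password is used because PAT auth is deprecated.
conn_json = build_tableau_connection(
    host="https://tableau.example.com",
    login="airflow_svc",
    password="change-me",
    site_id="analytics",
    ca_bundle="/etc/ssl/certs/internal-ca.pem",
)

# Airflow resolves the connection ID "tableau_default" from this variable.
os.environ["AIRFLOW_CONN_TABLEAU_DEFAULT"] = conn_json
```

Defining the connection through an environment variable keeps credentials out of DAG code. In practice the password would come from a secrets backend rather than a literal.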
Install
pip install apache-airflow-providers-tableau
Imports
- TableauOperator
from airflow.providers.tableau.operators.tableau import TableauOperator
- TableauHook
from airflow.providers.tableau.hooks.tableau import TableauHook
- TableauJobStatusSensor
from airflow.providers.tableau.sensors.tableau import TableauJobStatusSensor
Quickstart
import os
from datetime import datetime

from airflow import DAG
from airflow.providers.tableau.operators.tableau import TableauOperator

with DAG(
    dag_id='example_tableau_refresh_workbook',
    start_date=datetime(2023, 1, 1),
    schedule=None,  # Trigger manually; `schedule` replaces the deprecated `schedule_interval`
    catchup=False,
    tags=['tableau', 'example'],
    # The workbook ID is read from the environment, with a placeholder fallback
    params={'workbook_id': os.environ.get('AIRFLOW_WORKBOOK_ID', 'your_workbook_id')},
) as dag:
    refresh_workbook_blocking = TableauOperator(
        task_id='refresh_tableau_workbook_blocking',
        resource='workbooks',
        method='refresh',
        find="{{ params.workbook_id }}",  # Matched against the workbook ID by default
        site_id=os.environ.get('AIRFLOW_TABLEAU_SITE_ID', ''),  # Use '' for default site
        blocking_refresh=True,  # Waits until refresh is complete
        tableau_conn_id=os.environ.get('AIRFLOW_TABLEAU_CONN_ID', 'tableau_default'),
    )
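
The Quickstart holds a worker slot until the refresh finishes. A non-blocking variant could start the refresh and hand the returned job ID to `TableauJobStatusSensor`. The following is a sketch under stated assumptions: the factory function name, DAG ID, and task IDs are hypothetical, and it assumes that the operator pushes the job ID to XCom when `blocking_refresh=False`. The Airflow imports sit inside the function only so the module can be loaded where Airflow is not installed.

```python
from datetime import datetime


def build_nonblocking_refresh_dag(workbook_id, site_id="", conn_id="tableau_default"):
    """Return a DAG that starts a workbook refresh and then polls its job status."""
    # Local imports: Airflow is only required when the DAG is actually built.
    from airflow import DAG
    from airflow.providers.tableau.operators.tableau import TableauOperator
    from airflow.providers.tableau.sensors.tableau import TableauJobStatusSensor

    with DAG(
        dag_id="example_tableau_refresh_nonblocking",  # hypothetical DAG ID
        start_date=datetime(2023, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        # Start the refresh and return immediately with the Tableau job ID.
        start_refresh = TableauOperator(
            task_id="start_workbook_refresh",
            resource="workbooks",
            method="refresh",
            find=workbook_id,
            site_id=site_id,
            blocking_refresh=False,
            tableau_conn_id=conn_id,
        )
        # Poll the job; passing .output also makes the sensor run downstream.
        TableauJobStatusSensor(
            task_id="wait_for_refresh_job",
            job_id=start_refresh.output,
            site_id=site_id,
            tableau_conn_id=conn_id,
        )
    return dag
```

In a DAG file, the result would be assigned to a module-level name so the scheduler can discover it, for example `dag = build_nonblocking_refresh_dag("your_workbook_id")`.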