Airflow PowerBI Plugin
The airflow-powerbi-plugin, currently at version 1.0.1, is an Apache Airflow plugin designed to automate the refresh of Microsoft Power BI datasets. It provides a custom operator that supports Service Principal Name (SPN) authentication and includes logic to check for existing refreshes before triggering new ones. The project maintains an active release cadence, with updates focused on enhancing Power BI integration within Airflow workflows.
Common errors
- Authentication failed: AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '...'.
  cause The client secret used in the Airflow connection's 'Password' field is incorrect, expired, or the client secret ID was used instead of the value.
  fix Verify that the Client Secret in your Azure AD app registration is valid and has not expired. Regenerate a new client secret if needed and update the 'Password' field in your Airflow Generic connection. Ensure you are using the *value* of the secret, not its ID.
- ModuleNotFoundError: No module named 'airflow_powerbi_plugin'
  cause The `airflow-powerbi-plugin` package is not installed in the Airflow environment or the Airflow scheduler/webserver has not been restarted after installation.
  fix Run `pip install airflow-powerbi-plugin` in your Airflow environment. If running in a managed service (e.g., Cloud Composer), ensure the package is listed in environment requirements and the environment has updated. Restart Airflow scheduler and webserver processes after installation.
- PowerBIDatasetRefreshException: Dataset refresh failed to complete.
  cause This generic error indicates that the Power BI service reported a failure during the dataset refresh operation, often due to underlying data source issues, gateway problems, or data model errors within Power BI.
  fix Check the Power BI refresh history for the specific dataset to get more detailed error messages. Common causes include 'Data Source Not Found or Accessible', 'Unable to Refresh Data Due to Gateway Issues', or 'Incorrect Data Type' within Power BI itself. Address the root cause directly in Power BI.
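When chasing the AADSTS7000215 error, it can help to test the Service Principal credentials outside Airflow. The following is a minimal sketch, not part of the plugin: it only builds a standard Azure AD client-credentials token request. The helper name `build_token_request` is hypothetical, and the Power BI scope value is an assumption based on the usual `.default` scope for the Power BI REST API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed scope for the Power BI REST API when using client credentials.
POWERBI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an OAuth2 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            # Must be the secret *value*, not the secret ID.
            "client_secret": client_secret,
            "scope": POWERBI_SCOPE,
        }
    ).encode("utf-8")
    return Request(url, data=body, method="POST")
```

Passing the returned request to `urllib.request.urlopen` sends it. If the secret is wrong, the call raises `urllib.error.HTTPError`, and the response body should carry the same AADSTS code that Airflow logged.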
Warnings
- gotcha The plugin relies on Service Principal (SPN) authentication for Power BI. You must configure the Client ID, Client Secret, and Tenant ID correctly in Azure AD and grant the Service Principal the 'Contributor' role in the Power BI workspace. Incorrect permissions or expired secrets will lead to authentication failures.
- gotcha Due to limitations in Airflow plugins regarding custom connection forms, you must use a 'Generic' connection type in Airflow for Power BI. The Client ID goes into 'Login', Client Secret into 'Password', and the Tenant ID must be passed as JSON in the 'Extra' field like `{"tenantId": "YOUR_TENANT_ID"}`.
- gotcha There are reports of the PowerBIDatasetRefreshOperator (or similar Power BI operators) sometimes failing in Airflow even when the Power BI dataset refresh itself succeeds. This can indicate a race condition or an issue with how Airflow polls for the refresh status.
- gotcha Airflow workers can sometimes cache authentication tokens incorrectly, especially when multiple DAGs use the same connection, leading to intermittent authentication failures for Power BI tasks.
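The Generic connection layout described above (Client ID in 'Login', Client Secret in 'Password', Tenant ID in 'Extra') can also be expressed as a JSON-serialized connection, which Airflow 2.3 and later accept through `AIRFLOW_CONN_<CONN_ID>` environment variables. This is a minimal sketch under that assumption. The helper names are hypothetical, and the `"generic"` connection type string is assumed to match the 'Generic' entry in the UI.

```python
import json


def powerbi_connection_json(client_id, client_secret, tenant_id):
    """Serialize the Power BI connection fields in Airflow's JSON connection format."""
    return json.dumps(
        {
            "conn_type": "generic",             # assumed id of the 'Generic' type
            "login": client_id,                 # Client ID goes into 'Login'
            "password": client_secret,          # Client Secret value goes into 'Password'
            "extra": {"tenantId": tenant_id},   # Tenant ID goes into 'Extra'
        }
    )


def connection_env_var(conn_id):
    """Return the environment variable name Airflow reads for a connection id."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"
```

For example, exporting the value of `powerbi_connection_json(...)` under the name returned by `connection_env_var("powerbi_default_conn")` would define the connection without using the UI. In practice, keep the secret in a secrets backend rather than a plain environment variable.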
Install
- pip install airflow-powerbi-plugin
Imports
- PowerBIDatasetRefreshOperator
from airflow_powerbi_plugin.operators.powerbi import PowerBIDatasetRefreshOperator
- PowerBIHook
from airflow_powerbi_plugin.hooks.powerbi import PowerBIHook
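Before wiring these imports into a DAG, a quick check that the package is visible to the interpreter Airflow uses can rule out the `ModuleNotFoundError` case. This is a small sketch using only the standard library. The helper name `is_importable` is hypothetical and is meant for top-level module names only.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same Python environment as the scheduler and workers.
    print("airflow_powerbi_plugin importable:", is_importable("airflow_powerbi_plugin"))
```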
Quickstart
import os
from datetime import datetime

from airflow import DAG
from airflow_powerbi_plugin.operators.powerbi import PowerBIDatasetRefreshOperator

with DAG(
    dag_id='powerbi_dataset_refresh_example',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
    tags=['powerbi', 'example']
) as dag:
    refresh_powerbi_dataset = PowerBIDatasetRefreshOperator(
        task_id='refresh_powerbi_dataset',
        dataset_id=os.environ.get('POWERBI_DATASET_ID', 'YOUR_DATASET_ID'),
        group_id=os.environ.get('POWERBI_WORKSPACE_ID', 'YOUR_WORKSPACE_ID'),
        powerbi_conn_id='powerbi_default_conn',  # This connection must be configured in Airflow
        wait_for_termination=True,
        timeout=3600,  # 1 hour timeout for refresh
        check_interval=30  # Check status every 30 seconds
    )
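The `wait_for_termination`, `timeout`, and `check_interval` arguments suggest a poll-until-done loop. The sketch below illustrates that general pattern only. It is not the plugin's actual implementation: `wait_for_refresh` and `get_status` are hypothetical names, and the status strings are assumed to follow the Power BI REST API refresh history values, where an in-progress refresh is reported as "Unknown".

```python
import time

# Assumed terminal statuses; "Unknown" is treated as still in progress.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled", "Cancelled"}


def wait_for_refresh(get_status, timeout=3600, check_interval=30,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status is seen or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Refresh still '{status}' after {timeout} seconds")
        sleep(check_interval)
```

Because the status is checked before the deadline on every pass, a refresh that finishes just before the timeout is still reported as finished. The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays.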