Airflow PowerBI Plugin

1.0.1 · active · verified Thu Apr 16

airflow-powerbi-plugin (currently version 1.0.1) is an Apache Airflow plugin that automates the refresh of Microsoft Power BI datasets. It provides a custom operator that supports Service Principal Name (SPN) authentication and checks for an existing refresh before triggering a new one. The project is actively maintained, with updates focused on Power BI integration within Airflow workflows.

Quickstart

This quickstart sets up a DAG that refreshes a Power BI dataset with the `PowerBIDatasetRefreshOperator`. Before running it, make sure you have:

1. **Installed the plugin**: `pip install airflow-powerbi-plugin`
2. **Configured an Airflow connection**: Create a connection with `Conn Id` set to `powerbi_default_conn` and `Conn Type` set to `Generic`. Put your Service Principal's Client ID in `Login`, its Client Secret in `Password`, and `{"tenantId": "YOUR_TENANT_ID"}` in `Extra`. Keep the real values out of source control, for example in environment variables such as `POWERBI_CLIENT_ID`, `POWERBI_CLIENT_SECRET`, and `POWERBI_TENANT_ID`.
3. **Granted Power BI permissions**: The Service Principal needs the 'Contributor' role in the Power BI workspace.
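As an alternative to creating the connection in the UI, Airflow 2.3 and later can also read a connection from an `AIRFLOW_CONN_<CONN_ID>` environment variable holding a JSON document. The sketch below builds that variable for the connection described above. `build_connection_env` is a hypothetical helper written for this example, not part of the plugin, and the `generic` connection type string is an assumption that should be checked against your Airflow version.

```python
import json


def build_connection_env(client_id, client_secret, tenant_id,
                         conn_id="powerbi_default_conn"):
    """Return (env_var_name, json_value) describing an Airflow connection.

    Hypothetical helper for illustration; not part of airflow-powerbi-plugin.
    """
    # Airflow looks for AIRFLOW_CONN_ followed by the upper-cased Conn Id.
    env_name = "AIRFLOW_CONN_" + conn_id.upper()
    payload = {
        "conn_type": "generic",      # assumed identifier of the Generic type
        "login": client_id,          # Service Principal Client ID
        "password": client_secret,   # Service Principal Client Secret
        "extra": {"tenantId": tenant_id},
    }
    return env_name, json.dumps(payload)


env_name, env_value = build_connection_env(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    tenant_id="YOUR_TENANT_ID",
)
# Print a shell line that can be pasted into the scheduler/worker environment.
print(f"export {env_name}='{env_value}'")
```

Connections defined this way do not show up in the Airflow UI, so this approach suits containerized or CI deployments more than interactive setups.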

import os
from datetime import datetime
from airflow import DAG
from airflow_powerbi_plugin.operators.powerbi import PowerBIDatasetRefreshOperator

with DAG(
    dag_id='powerbi_dataset_refresh_example',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # manual triggers only; Airflow 2.4+ prefers `schedule=None`
    catchup=False,
    tags=['powerbi', 'example']
) as dag:
    refresh_powerbi_dataset = PowerBIDatasetRefreshOperator(
        task_id='refresh_powerbi_dataset',
        # IDs are read from the environment, with placeholder fallbacks
        dataset_id=os.environ.get('POWERBI_DATASET_ID', 'YOUR_DATASET_ID'),
        group_id=os.environ.get('POWERBI_WORKSPACE_ID', 'YOUR_WORKSPACE_ID'),
        powerbi_conn_id='powerbi_default_conn',  # must match the Conn Id configured in Airflow
        wait_for_termination=True,  # wait for the refresh to finish
        timeout=3600,  # 1 hour timeout for the refresh
        check_interval=30,  # check the status every 30 seconds
    )
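
To make the roles of `wait_for_termination`, `timeout`, and `check_interval` concrete, here is a minimal sketch of a polling loop with the same semantics. It is not the plugin's implementation: `wait_for_refresh` and `get_status` are hypothetical names, and the status strings are assumed to follow the Power BI refresh-history REST API, where `Unknown` indicates a refresh still in progress.

```python
import time

# Assumed terminal statuses from the Power BI refresh-history API;
# "Unknown" is treated as "still running".
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled", "Cancelled"}


def wait_for_refresh(get_status, timeout=3600, check_interval=30,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status.

    get_status is a hypothetical zero-argument callable returning the
    current refresh status string. Raises TimeoutError once `timeout`
    seconds have elapsed without reaching a terminal status.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"refresh still '{status}' after {timeout}s")
        sleep(check_interval)


# Usage with a fake status source standing in for the Power BI API.
fake_statuses = iter(["Unknown", "Unknown", "Completed"])
print(wait_for_refresh(lambda: next(fake_statuses), timeout=10, check_interval=0))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; a shorter `check_interval` detects completion sooner at the cost of more API calls.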
