Apache Airflow Microsoft Fabric Plugin

1.0.3 verified Wed May 13 auth: no python install: stale

The `apache-airflow-microsoft-fabric-plugin` package provides an operator and a hook for interacting with Microsoft Fabric items, such as Lakehouse, Notebook, Data Factory, and Data Warehouse, directly from Apache Airflow, including running Spark jobs (such as notebooks) within Fabric. The current version is 1.0.3. It is a community-driven project that is updated as needed.

pip install apache-airflow-microsoft-fabric-plugin
error ModuleNotFoundError: No module named 'airflow.providers.microsoft.fabric'
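cause The module 'airflow.providers.microsoft.fabric' cannot be found. Either no package providing it is installed in the active Python environment, or the code imports from the wrong namespace: the example DAG on this page imports the plugin from 'apache_airflow_microsoft_fabric_plugin', not from 'airflow.providers'.
fix
Install the plugin with 'pip install apache-airflow-microsoft-fabric-plugin' in the same environment Airflow runs in, and import from the 'apache_airflow_microsoft_fabric_plugin' package as the example DAG does.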
error ImportError: cannot import name 'MicrosoftFabricHook' from 'airflow.providers.microsoft.fabric.hooks.fabric'
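cause The module was found, but it does not define the name 'MicrosoftFabricHook'. This usually means the class name is wrong for the installed package, or the installed version predates the class.
fix
Check which names the installed module actually exports (for example, import the module and call 'dir()' on it) and import one that exists. If the class was added in a later release, upgrade the package.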
error ModuleNotFoundError: No module named 'pyspark'
cause The 'pyspark' library is not installed in the environment.
fix
Install 'pyspark' using 'pip install pyspark'.
error ModuleNotFoundError: No module named 'azure'
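cause No package that provides the 'azure' namespace is installed in the environment.
fix
Install the specific 'azure-*' distribution that provides the failing import, for example 'pip install azure-identity' for 'azure.identity'. The 'azure' meta-package on PyPI is deprecated, and 'pip install azure' fails by design.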
error ImportError: cannot import name 'SUPERVISOR_COMMS' from 'airflow.sdk.execution_time.task_runner'
cause The 'SUPERVISOR_COMMS' attribute is missing due to an outdated or incompatible Airflow version.
fix
Upgrade Airflow to the latest version using 'pip install --upgrade apache-airflow'.
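Most of the errors above come down to a module missing from the environment Airflow actually runs in. The following is a minimal diagnostic sketch, not part of the plugin: it uses only the standard library, and the list of module names is an assumption taken from the error messages and the example DAG on this page.

```python
import importlib.util

# Top-level module names taken from the errors on this page (an assumption;
# adjust the list to match your own stack).
MODULES = [
    "airflow",
    "apache_airflow_microsoft_fabric_plugin",
    "pyspark",
    "azure",
    "pendulum",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised packages.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(MODULES):
        print(f"missing: {name}")
```

Run it with the same interpreter the Airflow scheduler and workers use. A module that is present in your shell but missing there produces the same `ModuleNotFoundError`.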
gotcha The Airflow Connection must be configured with 'Microsoft Fabric' as the 'Conn Type'. Selecting a generic or incorrect connection type will lead to authentication and API errors.
fix When creating or editing the Airflow Connection for Microsoft Fabric, ensure you select 'Microsoft Fabric' from the 'Conn Type' dropdown in the Airflow UI.
gotcha Authentication details (tenant_id, client_id, client_secret for service principal, or managed_identity_client_id for Managed Identity) must be correctly specified in the 'Extra' JSON field of the Airflow Connection.
fix Verify that your 'Extra' field JSON is valid and contains the correct credentials. For example: `{"tenant_id": "YOUR_TENANT_ID", "client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET"}`.
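Hand-written JSON in the 'Extra' field is easy to get wrong (trailing commas, single quotes, empty values). A small sketch of generating it instead, using the service-principal key names from the example above; the helper name `build_extra` is hypothetical and not part of the plugin.

```python
import json

# Key names copied from the service-principal example on this page.
REQUIRED_KEYS = ("tenant_id", "client_id", "client_secret")


def build_extra(tenant_id, client_id, client_secret):
    """Return a JSON string suitable for pasting into the connection's Extra field."""
    extra = {
        "tenant_id": tenant_id,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    empty = [key for key in REQUIRED_KEYS if not extra[key]]
    if empty:
        raise ValueError(f"Missing values for: {', '.join(empty)}")
    return json.dumps(extra)


if __name__ == "__main__":
    print(build_extra("YOUR_TENANT_ID", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET"))
```

Because the string comes from `json.dumps`, it is valid JSON by construction, and the empty-value check catches a forgotten credential before Airflow ever tries to authenticate.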
gotcha This plugin requires Apache Airflow version 2.4.0 or higher. Older Airflow versions may not support the necessary provider interfaces or features.
fix Ensure your Airflow environment is running version 2.4.0 or newer. Upgrade Airflow if necessary.
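The version requirement can be checked programmatically before deploying. This is a sketch using only the standard library: the 2.4.0 minimum comes from the gotcha above, while the function names and the simplified version parsing (leading digits of the first three components) are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (2, 4, 0)  # minimum Airflow version stated on this page


def parse_release(version_string):
    """Turn '2.9.3' or '2.10.0rc1' into a tuple of three leading integers."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version_string, minimum=MINIMUM):
    """True if the given version string is at least the minimum release."""
    return parse_release(version_string) >= minimum


def installed_airflow_ok():
    """Check the Airflow distribution installed in the current environment."""
    try:
        return meets_minimum(version("apache-airflow"))
    except PackageNotFoundError:
        return False
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10.0" sorts before "2.4.0".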
gotcha The 'pendulum' Python package is a required dependency and must be installed in the environment for the library or its Airflow provider to function correctly.
fix Install the 'pendulum' package in your environment: `pip install pendulum`.
breaking The required Python package 'pendulum' is not found in the environment. This is a core dependency typically installed with Apache Airflow.
fix Install Apache Airflow in the same Python environment as the plugin; 'pendulum' is installed as one of Airflow's dependencies. For example, `pip install apache-airflow apache-airflow-microsoft-fabric-plugin`.
python os / libc status wheel install import disk mem side effects
3.10 alpine (musl) wheel - - 17.9M - broken
3.10 alpine (musl) - - - - - -
3.10 slim (glibc) wheel 1.4s - 18M - broken
3.10 slim (glibc) - - - - - -
3.11 alpine (musl) wheel - - 19.7M - broken
3.11 alpine (musl) - - - - - -
3.11 slim (glibc) wheel 1.5s - 20M - broken
3.11 slim (glibc) - - - - - -
3.12 alpine (musl) wheel - - 11.6M - broken
3.12 alpine (musl) - - - - - -
3.12 slim (glibc) wheel 1.4s - 12M - broken
3.12 slim (glibc) - - - - - -
3.13 alpine (musl) wheel - - 11.3M - broken
3.13 alpine (musl) - - - - - -
3.13 slim (glibc) wheel 1.4s - 12M - broken
3.13 slim (glibc) - - - - - -
3.9 alpine (musl) wheel - - 17.4M - broken
3.9 alpine (musl) - - - - - -
3.9 slim (glibc) wheel 1.7s - 18M - broken
3.9 slim (glibc) - - - - - -

This example DAG demonstrates how to use the `FabricRunSparkJobOperator` to execute a Microsoft Fabric Spark Notebook. Before running, configure an Airflow Connection with `Conn Id: azure_fabric_default` (or your custom ID), `Conn Type: Microsoft Fabric`, and necessary authentication details (e.g., `tenant_id`, `client_id`, `client_secret` for service principal, or `managed_identity_client_id` for Managed Identity) in its `Extra` JSON field. Replace placeholder IDs with your actual Fabric Workspace, Lakehouse (if applicable), and Notebook IDs.

from __future__ import annotations

import os
import pendulum

from airflow.models.dag import DAG
from apache_airflow_microsoft_fabric_plugin.operators.fabric import FabricRunSparkJobOperator

# Configure your Airflow connection 'azure_fabric_default' with type 'Microsoft Fabric'
# and extra fields like {"tenant_id": "...", "client_id": "...", "client_secret": "..."}
# You can use environment variables for sensitive data in real scenarios.

with DAG(
    dag_id="microsoft_fabric_notebook_execution_dag",
    schedule=None,
    start_date=pendulum.datetime(2023, 10, 26, tz="UTC"),
    catchup=False,
    tags=["microsoft_fabric", "notebook"],
) as dag:
    run_spark_job = FabricRunSparkJobOperator(
        task_id="run_spark_notebook_task",
        fabric_conn_id="azure_fabric_default", # Ensure this matches your Airflow connection ID
        workspace_id=os.environ.get("FABRIC_WORKSPACE_ID", "your_fabric_workspace_id"),
        lakehouse_id=os.environ.get("FABRIC_LAKEHOUSE_ID", "your_fabric_lakehouse_id"), # Optional, if notebook interacts with a specific lakehouse
        notebook_id=os.environ.get("FABRIC_NOTEBOOK_ID", "your_fabric_notebook_id"),
        job_parameters={
            "param1": "airflow_run",
            "dag_run_id": "{{ dag_run.run_id }}"
        } # Optional: pass parameters to your notebook
    )
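The `os.environ.get(NAME, "your_fabric_...")` calls in the example fall back to placeholder strings, so an unset variable only surfaces later as a failed Fabric API call. One possible alternative is to fail early with a clear message. The helper below is a hypothetical sketch, not part of the plugin.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage inside the DAG file, replacing the os.environ.get(...) calls:
#   workspace_id=require_env("FABRIC_WORKSPACE_ID"),
#   notebook_id=require_env("FABRIC_NOTEBOOK_ID"),
```

The trade-off is that a missing variable now raises while Airflow parses the DAG file, so it appears as an import error in the UI instead of a task failure at run time.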