Apache Airflow Microsoft Fabric Plugin
The `apache-airflow-microsoft-fabric-plugin` provides an operator and hook for interacting with Microsoft Fabric items, such as Lakehouse, Notebook, Data Factory, and Data Warehouse, directly from Apache Airflow. It enables running Spark jobs (such as notebooks) within Fabric. The current version is 1.0.3; it is a community-driven project updated as needed.
Warnings
- Gotcha: the Airflow connection must be configured with 'Microsoft Fabric' as its 'Conn Type'. Selecting a generic or incorrect connection type will lead to authentication and API errors.
- Gotcha: authentication details must be correctly specified in the connection's 'Extra' JSON field: `tenant_id`, `client_id`, and `client_secret` for a service principal, or `managed_identity_client_id` for Managed Identity.
- Gotcha: this plugin requires Apache Airflow 2.4.0 or higher. Older Airflow versions may not support the necessary provider interfaces or features.
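Airflow connections can also be supplied via `AIRFLOW_CONN_<CONN_ID>` environment variables in JSON form (supported since Airflow 2.3), which avoids storing secrets in the metadata database. A minimal sketch for the service-principal fields listed above; the `conn_type` slug `"microsoft_fabric"` is an assumption here, so check the value your installed plugin actually registers:

```python
import json
import os

# Hypothetical connection definition for 'azure_fabric_default'.
# The "microsoft_fabric" conn_type slug is an assumption -- verify it against
# the type the installed plugin registers before relying on it.
conn = {
    "conn_type": "microsoft_fabric",
    "extra": {
        "tenant_id": os.environ.get("AZURE_TENANT_ID", "<tenant-id>"),
        "client_id": os.environ.get("AZURE_CLIENT_ID", "<client-id>"),
        "client_secret": os.environ.get("AZURE_CLIENT_SECRET", "<client-secret>"),
    },
}

# Airflow resolves AIRFLOW_CONN_<CONN_ID> (upper-cased conn id) at runtime.
os.environ["AIRFLOW_CONN_AZURE_FABRIC_DEFAULT"] = json.dumps(conn)
```

Set the real secret values through your secrets backend or deployment environment rather than hard-coding them.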
Install
```shell
pip install apache-airflow-microsoft-fabric-plugin
```
Imports
- `FabricRunSparkJobOperator`

```python
from apache_airflow_microsoft_fabric_plugin.operators.fabric import FabricRunSparkJobOperator
```

- `FabricHook`

```python
from apache_airflow_microsoft_fabric_plugin.hooks.fabric import FabricHook
```
Quickstart
```python
from __future__ import annotations

import os

import pendulum

from airflow.models.dag import DAG
from apache_airflow_microsoft_fabric_plugin.operators.fabric import FabricRunSparkJobOperator

# Configure your Airflow connection 'azure_fabric_default' with type 'Microsoft Fabric'
# and extra fields like {"tenant_id": "...", "client_id": "...", "client_secret": "..."}.
# Use environment variables for sensitive data in real scenarios.
with DAG(
    dag_id="microsoft_fabric_notebook_execution_dag",
    schedule=None,
    start_date=pendulum.datetime(2023, 10, 26, tz="UTC"),
    catchup=False,
    tags=["microsoft_fabric", "notebook"],
) as dag:
    run_spark_job = FabricRunSparkJobOperator(
        task_id="run_spark_notebook_task",
        fabric_conn_id="azure_fabric_default",  # must match your Airflow connection ID
        workspace_id=os.environ.get("FABRIC_WORKSPACE_ID", "your_fabric_workspace_id"),
        lakehouse_id=os.environ.get("FABRIC_LAKEHOUSE_ID", "your_fabric_lakehouse_id"),  # optional, if the notebook targets a specific lakehouse
        notebook_id=os.environ.get("FABRIC_NOTEBOOK_ID", "your_fabric_notebook_id"),
        job_parameters={
            "param1": "airflow_run",
            "dag_run_id": "{{ dag_run.run_id }}",
        },  # optional: parameters passed to the notebook
    )
```
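Jinja templates such as `"{{ dag_run.run_id }}"` are rendered by Airflow before the task executes, so the notebook receives the concrete run ID. Assuming, as is typical for REST-backed operators, that `job_parameters` must be a JSON-serializable mapping of plain values, it can help to validate the structure outside Airflow first:

```python
import json

# Sketch: the same parameter mapping used in the quickstart above. The
# JSON-serializability requirement is an assumption based on the operator
# submitting parameters over a REST API, not documented plugin behavior.
job_parameters = {
    "param1": "airflow_run",
    "dag_run_id": "{{ dag_run.run_id }}",  # left unrendered outside Airflow
}

# Round-trip through JSON to catch non-serializable values (sets, datetimes,
# custom objects) before they reach the operator at parse time.
serialized = json.dumps(job_parameters)
print(serialized)
```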