{"id":2148,"library":"openlineage-airflow","title":"OpenLineage Airflow Integration","description":"The `openlineage-airflow` library provides an integration for Apache Airflow to emit lineage metadata to an OpenLineage backend. It captures information about DAGs, tasks, and data interactions, contributing to a comprehensive data lineage graph. The latest version is 1.45.0, and new versions are released frequently, often bi-weekly, aligning with the broader OpenLineage project.","status":"active","version":"1.45.0","language":"en","source_language":"en","source_url":"https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow","tags":["airflow","lineage","data-governance","metadata","etl","orchestration"],"install":[{"cmd":"pip install openlineage-airflow","lang":"bash","label":"Install core package"}],"dependencies":[{"reason":"Required for the integration, must be version 2.0.0 or greater.","package":"apache-airflow","optional":false}],"imports":[{"note":"For sending custom OpenLineage events or facets. 
Basic lineage collection is automatic via the Airflow plugin and does not require explicit imports in DAGs.","symbol":"OpenLineageClient","correct":"from openlineage.client import OpenLineageClient"}],"quickstart":{"code":"from __future__ import annotations\n\nimport pendulum\n\nfrom airflow.models.dag import DAG\nfrom airflow.operators.bash import BashOperator\n\n# Ensure OPENLINEAGE_URL is set in your Airflow environment for events to be sent.\n# For example: export OPENLINEAGE_URL=\"http://localhost:5000\"\n# If your OpenLineage backend requires authentication, also set OPENLINEAGE_API_KEY.\n\nwith DAG(\n    dag_id=\"openlineage_example_dag\",\n    start_date=pendulum.datetime(2023, 1, 1, tz=\"UTC\"),\n    schedule=None,\n    catchup=False,\n    tags=[\"openlineage\", \"example\"],\n) as dag:\n    start_task = BashOperator(\n        task_id=\"start_task\",\n        bash_command=\"echo 'Starting OpenLineage example DAG'\",\n    )\n\n    process_data = BashOperator(\n        task_id=\"process_data_task\",\n        bash_command=\"\"\"\n            echo \"Simulating data processing...\"\n            # In a real scenario, this would interact with data sources (e.g., SQL, Spark).\n            # The OpenLineage Airflow plugin automatically captures dataset information\n            # from supported operators and frameworks.\n            # To test this task locally from the CLI:\n            # airflow tasks test <dag_id> process_data_task 2023-01-01\n            # For a more realistic example with SQL:\n            # from airflow.providers.postgres.operators.postgres import PostgresOperator\n            # PostgresOperator(task_id='insert_data', sql='INSERT INTO output_table SELECT * FROM input_table;')\n            sleep 5\n            echo \"Data processed!\"\n        \"\"\",\n    )\n\n    end_task = BashOperator(\n        task_id=\"end_task\",\n        bash_command=\"echo 'OpenLineage example DAG finished'\",\n    )\n\n    start_task >> process_data >> 
end_task\n","lang":"python","description":"This quickstart defines a basic Airflow DAG. The `openlineage-airflow` plugin, once installed and configured with `OPENLINEAGE_URL` and optionally `OPENLINEAGE_API_KEY` in the Airflow environment, will automatically capture and emit lineage events for this DAG's runs and tasks. No explicit OpenLineage imports are needed within the DAG file for basic functionality."},"warnings":[{"fix":"Upgrade to `openlineage-airflow==1.40.1` or any newer version to restore access to the `__version__` attribute.","message":"Version 1.40.0 temporarily removed `__version__` attributes from top-level modules, which was fixed in 1.40.1. If your codebase relies on programmatic access to the library's version string (e.g., `openlineage.airflow.__version__`), it would have failed in 1.40.0.","severity":"breaking","affected_versions":"1.40.0"},{"fix":"Ensure your Apache Airflow environment is running version 2.0.0 or later (e.g., `pip install \"apache-airflow>=2.0.0\"` — note the quotes, which prevent the shell from interpreting `>=` as a redirection). Python 3.9+ is also required.","message":"The OpenLineage Airflow integration requires Apache Airflow 2.0.0 or newer. Using it with older Airflow versions will lead to compatibility issues or outright failures.","severity":"gotcha","affected_versions":"all versions"},{"fix":"Set `OPENLINEAGE_URL` (e.g., `http://localhost:5000`) and `OPENLINEAGE_API_KEY` (if authentication is enabled on your backend) in your Airflow worker/scheduler environment, or configure them in the `[openlineage]` section of your `airflow.cfg`.","message":"For OpenLineage events to be successfully sent, you must configure the OpenLineage backend URL and, if applicable, an API key. This is typically done via environment variables (`OPENLINEAGE_URL`, `OPENLINEAGE_API_KEY`) or within `airflow.cfg`. Misconfiguration will result in events not reaching your OpenLineage collector.","severity":"gotcha","affected_versions":"all versions"},{"fix":"No fix needed, this is intended behavior. 
The plugin automatically instruments supported operators. For custom event emission or advanced use cases, the `OpenLineageClient` can be imported and used within DAGs or custom operators.","message":"The `openlineage-airflow` integration functions as an Airflow plugin and is automatically loaded by Airflow on startup. You generally do not need to add any specific imports or decorators to your DAG files for basic lineage collection to work. Users expecting explicit Python code to activate the integration might overlook this implicit behavior.","severity":"gotcha","affected_versions":"all versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}