{"id":2385,"library":"apache-airflow-providers-openlineage","title":"OpenLineage Airflow Provider","description":"The OpenLineage Airflow Provider integrates Apache Airflow with OpenLineage, an open framework for data lineage collection and analysis. It automatically extracts metadata from Airflow DAGs and tasks, implementing Airflow listener hooks to send lineage events to an OpenLineage backend. The provider is currently at version 2.13.0 and follows the Apache Airflow providers support policy, with release cycles tied to Airflow's own release schedule, typically bumping minimum Airflow version requirements approximately every 12 months.","status":"active","version":"2.13.0","language":"en","source_language":"en","source_url":"https://github.com/apache/airflow/tree/main/airflow/providers/openlineage","tags":["airflow","openlineage","data-lineage","provider","metadata"],"install":[{"cmd":"pip install apache-airflow-providers-openlineage","lang":"bash","label":"Install the provider"}],"dependencies":[{"reason":"Core Airflow dependency. Provider version 2.13.0 requires Airflow >=2.11.0.","package":"apache-airflow","optional":false},{"reason":"The underlying OpenLineage client responsible for building and sending lineage events. Provider version 2.13.0 requires openlineage-python >=1.41.0.","package":"openlineage-python","optional":false},{"reason":"Shared components for OpenLineage integrations. Provider version 2.13.0 requires openlineage-integration-common >=1.41.0.","package":"openlineage-integration-common","optional":false}],"imports":[{"note":"Required for implementing custom OpenLineage extractors for Airflow operators.","symbol":"BaseExtractor","correct":"from airflow.providers.openlineage.extractors.base import BaseExtractor"}],"quickstart":{"code":"# 1. Install the provider (see 'install' section).\n# 2. 
Configure the OpenLineage transport via environment variable or airflow.cfg.\n#    This example sends events to a local Marquez instance (http://localhost:5000).\nimport os\n\n# Recommended method: environment variable. In practice, set this in the\n# environment of the scheduler and workers before they start; it is shown\n# inline here for illustration only.\nos.environ['AIRFLOW__OPENLINEAGE__TRANSPORT'] = '{\"type\": \"http\", \"url\": \"http://localhost:5000\", \"endpoint\": \"api/v1/lineage\"}'\n\n# Alternatively, add this to your airflow.cfg under the [openlineage] section:\n# [openlineage]\n# transport = {\"type\": \"http\", \"url\": \"http://localhost:5000\", \"endpoint\": \"api/v1/lineage\"}\n\n# No changes to user DAG files are typically required for basic lineage collection.\n# The provider automatically hooks into Airflow to extract metadata.\n\nfrom datetime import datetime\n\nfrom airflow import DAG\nfrom airflow.operators.bash import BashOperator\n\nwith DAG(\n    dag_id='openlineage_example_dag',\n    start_date=datetime(2024, 1, 1),\n    schedule=None,\n    catchup=False,\n    tags=['openlineage', 'example'],\n) as dag:\n    start_task = BashOperator(\n        task_id='start_task',\n        bash_command='echo \"Starting lineage test...\"',\n    )\n\n    process_data = BashOperator(\n        task_id='process_data',\n        bash_command='echo \"Processing some data...\" && sleep 5',\n    )\n\n    end_task = BashOperator(\n        task_id='end_task',\n        bash_command='echo \"Lineage test complete!\"',\n    )\n\n    start_task >> process_data >> end_task\n\nprint(\"OpenLineage Airflow Provider configured. Run an Airflow DAG to see lineage events.\")","lang":"python","description":"After installing the provider, the core setup involves configuring the OpenLineage transport to specify where lineage events should be sent. This is typically done by setting the `AIRFLOW__OPENLINEAGE__TRANSPORT` environment variable or by adding a `transport` entry in the `[openlineage]` section of your `airflow.cfg`. 
No modifications to existing DAGs are generally necessary, as the provider operates via Airflow's listener mechanism."},"warnings":[{"fix":"Review the OpenLineage provider changelog when upgrading to version 2.0.0 or later. Ensure your Airflow environment meets the minimum version requirement (>=2.9.0) and update any custom code that relied on removed features or deprecated APIs.","message":"Provider version 2.0.0 introduced significant breaking changes. All previously deprecated classes, parameters, and features were removed. Notably, the `normalize_sql` function was removed from the `openlineage.utils` module. This version also increased the minimum supported Apache Airflow version to 2.9.0.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Ensure that all imports from Airflow within custom extractor code are local (i.e., placed inside the `extract` or `extract_on_complete` methods). For type-checking-only imports, guard them with `typing.TYPE_CHECKING`.","message":"When developing custom OpenLineage extractors, be aware of potential circular-import issues when importing from Airflow modules at module level. OpenLineage code is instantiated during Airflow worker startup, which differs from how DAG code is loaded, and this difference can surface subtle circular imports.","severity":"gotcha","affected_versions":"All versions with custom extractors"},{"fix":"Verify that the provided path to your custom extractor is an exact, importable Python path from the Airflow worker's perspective. Ensure the extractor code is accessible within Airflow's Python environment.","message":"Incorrectly specifying the path to custom extractors via the `extractors` option in `airflow.cfg` or the `AIRFLOW__OPENLINEAGE__EXTRACTORS` environment variable will prevent the extractor from loading. 
This results in OpenLineage events missing operator-specific lineage for affected tasks.","severity":"gotcha","affected_versions":"All versions with custom extractors"},{"fix":"For Airflow 2.7+, always use `apache-airflow-providers-openlineage`. If upgrading from an older Airflow version (<2.7) that used `openlineage-airflow`, uninstall the old package and install the new provider. Consult the native provider documentation for Airflow 2.7 and newer.","message":"The OpenLineage integration for Airflow underwent a significant migration with Airflow 2.7+. For Airflow versions <2.7, the integration was an external package (`openlineage-airflow`). For Airflow 2.7 and newer, it is the official `apache-airflow-providers-openlineage` provider. The legacy `openlineage-airflow` is no longer actively maintained.","severity":"breaking","affected_versions":"Migration from Airflow <2.7 to >=2.7"},{"fix":"For tasks using operators that lack built-in OpenLineage support or emit insufficient metadata, consider implementing a custom extractor to provide rich lineage information. Refer to the documentation on 'Implementing OpenLineage in Operators' and 'Pursuing Lineage from Airflow using Custom Extractors'.","message":"If an Airflow operator does not have a corresponding OpenLineage extractor, or if an extractor cannot determine input/output datasets (e.g., from `inlets` and `outlets` that are not Airflow Assets or lack specific lineage methods), the OpenLineage events for that task may lack inputs, outputs, and operator-specific facets. General Airflow facets will still be emitted.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Avoid using Airflow standalone mode with the OpenLineage provider if encountering this issue. If a standalone environment is critical, consider using provider versions prior to 1.8.0, though these are older and may lack newer features and fixes. 
It is recommended to use the provider in a supported distributed Airflow setup.","message":"There are known issues with the OpenLineage provider when running with Airflow in standalone mode, particularly concerning the scheduler shutting down due to `OpenLineageListener` pickling failures. This issue has been observed with provider versions 1.8.0 and above. This is typically not reproducible in distributed Airflow environments (e.g., Breeze, Google Composer, Astro Cloud).","severity":"gotcha","affected_versions":">=1.8.0 when used with Airflow Standalone"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}