{"id":2863,"library":"apache-airflow-providers-airbyte","title":"Airbyte Apache Airflow Provider","description":"The `apache-airflow-providers-airbyte` package provides Apache Airflow operators and sensors to interact with Airbyte, an open-source data integration platform. It enables users to trigger and monitor Airbyte synchronization jobs directly from Airflow DAGs. The current version is 5.4.0, supporting Airflow >=2.11.0 and Python >=3.10, and it maintains a regular release cadence with ongoing development.","status":"active","version":"5.4.0","language":"en","source_language":"en","source_url":"https://github.com/apache/airflow/tree/main/airflow/providers/airbyte","tags":["apache-airflow","airbyte","etl","data-integration","provider","orchestration"],"install":[{"cmd":"pip install apache-airflow-providers-airbyte","lang":"bash","label":"Base Installation"},{"cmd":"pip install apache-airflow-providers-airbyte[http]","lang":"bash","label":"With HTTP dependencies (often needed)"}],"dependencies":[{"reason":"Core Airflow functionality; provider version 5.4.0 requires Airflow >=2.11.0.","package":"apache-airflow","optional":false},{"reason":"Required for establishing HTTP connections, which Airbyte APIs utilize.","package":"apache-airflow-providers-http","optional":true}],"imports":[{"symbol":"AirbyteTriggerSyncOperator","correct":"from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator"},{"symbol":"AirbyteJobSensor","correct":"from airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor"},{"note":"Typically used internally by operators/sensors or for custom interactions, less common directly in DAGs.","symbol":"AirbyteHook","correct":"from airflow.providers.airbyte.hooks.airbyte import AirbyteHook"}],"quickstart":{"code":"import os\nfrom datetime import datetime, timedelta\n\nfrom airflow import DAG\nfrom airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator\nfrom 
airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor\n\nAIRBYTE_CONNECTION_ID = os.environ.get('AIRBYTE_CONN_ID', 'your_airflow_airbyte_connection_id')\nAIRBYTE_SYNC_CONNECTION_ID = os.environ.get('AIRBYTE_SYNC_CONN_ID', 'your_airbyte_workspace_connection_id')\n\nwith DAG(\n    dag_id='example_airbyte_sync_dag',\n    start_date=datetime(2024, 1, 1),\n    schedule=None,\n    catchup=False,\n    tags=['airbyte', 'example'],\n    dagrun_timeout=timedelta(minutes=60),\n    default_args={\n        'owner': 'airflow',\n    }\n) as dag:\n    trigger_airbyte_sync = AirbyteTriggerSyncOperator(\n        task_id='trigger_airbyte_connection_sync',\n        airbyte_conn_id=AIRBYTE_CONNECTION_ID,\n        connection_id=AIRBYTE_SYNC_CONNECTION_ID, # This is the UUID of the Airbyte connection to trigger\n        asynchronous=True, # Recommended to use with a sensor for long-running jobs\n    )\n\n    monitor_airbyte_sync = AirbyteJobSensor(\n        task_id='monitor_airbyte_connection_sync',\n        airbyte_conn_id=AIRBYTE_CONNECTION_ID,\n        airbyte_job_id=trigger_airbyte_sync.output,\n        poke_interval=5, # Check every 5 seconds\n        timeout=3600, # Timeout after 1 hour\n    )\n\n    trigger_airbyte_sync >> monitor_airbyte_sync","lang":"python","description":"This quickstart DAG demonstrates how to trigger and monitor an Airbyte synchronization job using the `AirbyteTriggerSyncOperator` and `AirbyteJobSensor`. Before running, you need to:\n1. Install the provider: `pip install apache-airflow-providers-airbyte[http]`.\n2. Configure an Airflow 'Airbyte' connection (e.g., `AIRBYTE_CONNECTION_ID`) pointing to your Airbyte instance's API (e.g., `http://localhost:8001`).\n3. Obtain the UUID of the specific Airbyte connection you wish to sync from the Airbyte UI (this is `AIRBYTE_SYNC_CONNECTION_ID`).\n4. 
Set `AIRBYTE_CONN_ID` and `AIRBYTE_SYNC_CONN_ID` as environment variables or replace them directly in the DAG code."},"warnings":[{"fix":"Update your Airflow Airbyte connection configuration to use `client_id` and `client_secret` for authentication and ensure the host is a complete FQDN. Remove the `api_type` parameter if present.","message":"The authentication mechanism for Airbyte connections changed in provider version 4.0.0. It now uses `client_id` and `client_secret` instead of `login` and `password`. The `host` parameter for the Airflow Airbyte connection must be a Fully Qualified Domain Name (FQDN) including the scheme (e.g., `https://my.company:8000/airbyte/v1/`). The `api_type` parameter was also removed.","severity":"breaking","affected_versions":">=4.0.0"},{"fix":"Ensure your Airflow environment is at least 2.11.0. Replace `polling_interval` with `poke_interval` in `AirbyteJobSensor`.","message":"The minimum required Apache Airflow version has increased over time. For provider version 5.4.0, Airflow 2.11.0+ is required. Additionally, provider version 5.4.0 removed the `polling_interval` parameter from `AirbyteJobSensor`, favoring `poke_interval`.","severity":"breaking","affected_versions":">=5.4.0"},{"fix":"Upgrade Airflow to at least 2.1.0 (preferably the latest compatible version for your provider) before upgrading the Airbyte provider. Run `airflow db upgrade` if an automatic upgrade occurs.","message":"Provider version 2.0.0 (and subsequently 2.1.0+) required Airflow 2.1.0+ due to the removal of the `apply_default` decorator. Upgrading the provider on an older Airflow could trigger an automatic upgrade of the Airflow package, after which you must run `airflow db upgrade` manually to complete the migration.","severity":"breaking","affected_versions":"2.0.0 - <2.2.0"},{"fix":"Understand Airbyte's sync configuration (e.g., incremental vs. 
full refresh) for the connection being triggered and design your DAGs accordingly to handle potential non-idempotent behavior.","message":"The `AirbyteTriggerSyncOperator` is not idempotent by design. Re-triggering the operator may initiate a new sync job in Airbyte, depending on Airbyte's configuration for the connection. Users should be aware of the Airbyte source/destination sync mode.","severity":"gotcha","affected_versions":"All"},{"fix":"If using Airbyte Cloud, consider implementing `airflow.providers.http.operators.http.SimpleHttpOperator` for more direct API calls to the Airbyte Cloud API instead of the dedicated Airbyte provider operators.","message":"The Airbyte operator in this provider is primarily designed to work with Airbyte self-managed instances (using its internal Config API). For orchestrating Airbyte Cloud, it's generally recommended to use Airflow's generic HTTP operators to interact with the newer Airbyte API directly.","severity":"gotcha","affected_versions":"All"},{"fix":"In the Airbyte UI, for any connection orchestrated by Airflow, set its replication frequency to 'Manual'.","message":"To prevent conflicts and ensure Airflow maintains control, it's highly recommended to set the replication frequency for Airbyte connections triggered by Airflow to 'Manual' within the Airbyte UI.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}