{"id":6984,"library":"apache-airflow-providers-apache-druid","title":"Apache Airflow Druid Provider","description":"The Apache Airflow Druid Provider enables Airflow to interact with Apache Druid, a high-performance, real-time analytics database. It includes hooks and operators for submitting ingestion tasks to Druid and for querying data over Druid SQL. The current version is 4.5.2; Airflow providers generally follow Airflow's release cadence, with updates for new features and bug fixes.","status":"active","version":"4.5.2","language":"en","source_language":"en","source_url":"https://github.com/apache/airflow/tree/main/airflow/providers/apache/druid","tags":["Airflow","ETL","Druid","Database","Provider"],"install":[{"cmd":"pip install apache-airflow-providers-apache-druid","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"Old import path from Airflow 1.x / contrib, not compatible with Airflow 2.x providers.","wrong":"from airflow.contrib.hooks.druid_hook import DruidHook","symbol":"DruidHook","correct":"from airflow.providers.apache.druid.hooks.druid import DruidHook"},{"note":"Old import path from Airflow 1.x / contrib, not compatible with Airflow 2.x providers.","wrong":"from airflow.contrib.operators.druid_operator import DruidOperator","symbol":"DruidOperator","correct":"from airflow.providers.apache.druid.operators.druid import DruidOperator"},{"note":"Old import path from Airflow 1.x / contrib, not compatible with Airflow 2.x providers.","wrong":"from airflow.contrib.hooks.druid_hook import DruidDbApiHook","symbol":"DruidDbApiHook","correct":"from airflow.providers.apache.druid.hooks.druid import DruidDbApiHook"}],"quickstart":{"code":"import os\nfrom datetime import datetime\n\nfrom airflow.models.dag import DAG\nfrom airflow.operators.python import PythonOperator\nfrom airflow.providers.apache.druid.hooks.druid import DruidDbApiHook\nfrom airflow.providers.apache.druid.operators.druid import DruidOperator\n\n# DruidOperator submits an ingestion (index) task spec to the Druid Overlord.\n# Ensure you have an Airflow connection named 'druid_ingest_default', e.g.:\n# airflow connections add druid_ingest_default --conn-type druid \\\n#     --conn-host localhost --conn-port 8081 \\\n#     --conn-extra '{\"endpoint\": \"druid/indexer/v1/task\"}'\nDRUID_INGEST_CONN_ID = os.environ.get('AIRFLOW_DRUID_INGEST_CONN_ID', 'druid_ingest_default')\n\n\ndef query_druid():\n    # DruidDbApiHook speaks Druid SQL to the broker\n    # (default connection id: 'druid_broker_default', typically localhost:8082).\n    hook = DruidDbApiHook()\n    records = hook.get_records(\n        \"SELECT COUNT(*) FROM wikipedia WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY\"\n    )\n    print(records)\n\n\nwith DAG(\n    dag_id='druid_example_dag',\n    start_date=datetime(2023, 1, 1),\n    schedule=None,\n    catchup=False,\n    tags=['druid', 'example'],\n) as dag:\n    # Submit a native batch ingestion spec stored alongside the DAG.\n    # 'wikipedia_index.json' is a Druid index task spec; this field is templated.\n    submit_druid_ingestion = DruidOperator(\n        task_id='submit_druid_ingestion',\n        druid_ingest_conn_id=DRUID_INGEST_CONN_ID,\n        json_index_file='wikipedia_index.json',\n        max_ingestion_time=3600,\n    )\n\n    # Query the ingested datasource over Druid SQL.\n    check_counts = PythonOperator(\n        task_id='check_counts',\n        python_callable=query_druid,\n    )\n\n    submit_druid_ingestion >> check_counts","lang":"python","description":"This quickstart demonstrates an Airflow DAG that uses `DruidOperator` to submit a native batch ingestion spec (via `json_index_file`) to the Druid Overlord, then checks the ingested 'wikipedia' datasource with a Druid SQL query through `DruidDbApiHook`. It assumes a 'druid_ingest_default' connection (Overlord) and a 'druid_broker_default' connection (broker) are configured in Airflow."},"warnings":[{"fix":"Install `apache-airflow-providers-apache-druid` explicitly (`pip install ...`) and update all imports from `airflow.contrib.*` to `airflow.providers.apache.druid.*`.","message":"Migration from Airflow 1.x to 2.x requires installing providers separately and updating import paths.","severity":"breaking","affected_versions":"<2.0.0 (Airflow core) / <1.0.0 (provider)"},{"fix":"Pass your ingestion spec via `json_index_file` (a path to a JSON index task file; the field is templated). To run SQL or native JSON queries, use `DruidDbApiHook` against the broker instead of `DruidOperator`.","message":"`DruidOperator` submits ingestion tasks to the Druid Overlord; it does not execute queries and accepts no `sql_query` or `json_query` parameter.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify your Airflow connection under `Admin -> Connections`. Ensure 'Conn Id' matches `druid_ingest_conn_id`, and that 'Host' and 'Port' point at your Druid Overlord (typically port 8081); for SQL queries via `DruidDbApiHook`, the broker connection should point at your router/broker (typically 8888/8082).","message":"`druid_ingest_conn_id` must reference a correctly configured Airflow connection. Misconfigured connections lead to HTTP errors when submitting tasks.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the provider (`pip install apache-airflow-providers-apache-druid`) and update your import statement to `from airflow.providers.apache.druid.operators.druid import DruidOperator`.","cause":"Using an old import path from Airflow 1.x or before the provider was separated.","error":"ModuleNotFoundError: No module named 'airflow.contrib.operators.druid_operator'"},{"fix":"Create an Airflow connection with the matching conn id (e.g., 'my_druid_conn') via the Airflow UI (Admin -> Connections) or the `airflow connections add` CLI command.","cause":"The conn id specified in `druid_ingest_conn_id` does not exist in your Airflow connections.","error":"airflow.exceptions.AirflowNotFoundException: The conn_id `my_druid_conn` isn't defined"},{"fix":"Ensure the file referenced by `json_index_file` contains a valid Druid index task spec in JSON; validate it before running the DAG, e.g. with `python -m json.tool wikipedia_index.json`.","cause":"The ingestion spec file passed to `DruidOperator` via `json_index_file` contains malformed JSON.","error":"json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)"}]}