Apache Airflow Provider for Apache Drill
3.3.2 verified Fri May 01 auth: no python
The Apache Airflow provider for Apache Drill integrates Drill's SQL query engine with Airflow. Queries run through the DrillHook and the generic SQLExecuteQueryOperator; the dedicated DrillOperator was removed in provider 3.0.0. Version 3.3.2 is the latest, supporting Airflow 2.x+ and Python >=3.10. Releases follow Airflow's provider release schedule.
pip install apache-airflow-providers-apache-drill
Common errors
error ModuleNotFoundError: No module named 'airflow.providers.apache.drill'
cause Provider package not installed or Airflow version too old (pre-2.0).
fix
Install the provider: pip install apache-airflow-providers-apache-drill
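Before reinstalling, it can help to confirm what the active Python environment actually contains. The following is a minimal sketch using only the standard library; the helper names installed_version and module_available are hypothetical and not part of Airflow or the provider.

```python
import importlib.util
from importlib import metadata
from typing import Optional

# Distribution and module names as they appear in this document.
DIST_NAME = "apache-airflow-providers-apache-drill"
MODULE_NAME = "airflow.providers.apache.drill"


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name: str = MODULE_NAME) -> bool:
    """Return True if the module can be located in this environment.

    find_spec imports the parent packages of a dotted name, but not the
    target module itself.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # A missing parent package (for example, no Airflow at all) lands here.
        return False
```

Calling installed_version() and module_available() from the same interpreter that runs the Airflow scheduler shows whether the provider is missing there, which is a common cause when pip installed it into a different environment.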
error sqlalchemy.exc.InvalidRequestError: Could not reflect: could not get column names for table
cause Drill connection string misconfiguration or missing SQLAlchemy dialect.
fix
Ensure the sqlalchemy-drill package (the SQLAlchemy dialect the provider depends on) is installed and the connection URI is correct (e.g., drill+sadrill://...).
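To make the URI shape concrete, here is a small sketch that assembles a drill+sadrill URI in the form documented by sqlalchemy-drill (host, port, storage plugin, and a use_ssl query flag). The function name build_drill_uri and its default values are assumptions for illustration; 8047 is Drill's default REST port, and "dfs" is only an example storage plugin.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_drill_uri(
    host: str,
    port: int = 8047,              # Drill's default REST port
    storage_plugin: str = "dfs",   # example plugin; adjust to your cluster
    username: Optional[str] = None,
    password: Optional[str] = None,
    use_ssl: bool = False,
) -> str:
    """Assemble a SQLAlchemy URI for the sadrill (REST) dialect."""
    auth = ""
    if username:
        # Percent-encode credentials so characters like '@' or '/' are safe.
        auth = quote(username, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    query = urlencode({"use_ssl": str(use_ssl)})
    return f"drill+sadrill://{auth}{host}:{port}/{storage_plugin}?{query}"


print(build_drill_uri("localhost"))
```

Percent-encoding the credentials matters because an unescaped '@' or '/' in a password changes how the URI is parsed, which can surface as a confusing connection or reflection error.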
Warnings
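breaking Since Airflow 2.0, Drill operators and hooks live under airflow.providers.apache.drill. Legacy-style paths such as airflow.operators.drill_operator raise ImportError.
fix Import from the provider package, e.g. from airflow.providers.apache.drill.hooks.drill import DrillHook. Note that DrillOperator (airflow.providers.apache.drill.operators.drill) was removed in provider 3.0.0; use SQLExecuteQueryOperator from airflow.providers.common.sql.operators.sql instead.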
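deprecated The DrillHook's get_conn method may be deprecated in a future release; check the provider changelog for your version.
fix Follow the provider docs for the recommended replacement. Note that get_connection is not a drop-in substitute: it returns the Airflow Connection record, whereas get_conn opens a connection to Drill.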
Imports
- DrillOperator (provider releases before 3.0.0 only)
wrong from airflow.operators.drill_operator import DrillOperator
correct from airflow.providers.apache.drill.operators.drill import DrillOperator
- DrillHook
wrong from airflow.hooks.drill_hook import DrillHook
correct from airflow.providers.apache.drill.hooks.drill import DrillHook
Quickstart
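from datetime import datetime

from airflow import DAG
# DrillOperator was removed in provider 3.0.0; the generic SQL operator
# from apache-airflow-providers-common-sql runs Drill queries instead.
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG(
    dag_id='drill_example',
    start_date=datetime(2023, 1, 1),
    schedule='@daily',  # use schedule_interval on Airflow older than 2.4
    catchup=False,
) as dag:
    # Runs a sample query against Drill's bundled classpath (cp) data.
    drill_query = SQLExecuteQueryOperator(
        task_id='query_drill',
        sql='SELECT * FROM cp.`employee.json` LIMIT 10',
        conn_id='drill_default',  # Airflow connection of type Drill
    )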