Apache Airflow Impala Provider
The Apache Airflow Impala Provider integrates Apache Airflow with Apache Impala, allowing users to programmatically author, schedule, and monitor workflows that interact with Impala databases. It provides hooks and operators to execute SQL queries and manage Impala connections within Airflow DAGs. The current version is 1.9.1, released on 2026-03-28, and follows the release cadence of other Airflow providers, typically updated every few months.
Warnings
- **breaking**: Provider versions have minimum Apache Airflow core version requirements. Provider 1.9.x requires Airflow 2.11.0 or newer; installing an incompatible provider version may lead to dependency conflicts or runtime errors.
- **deprecated**: The dedicated `ImpalaOperator` is deprecated. Migrate to the more generic and flexible `SQLExecuteQueryOperator` from `airflow.providers.common.sql.operators.sql` for executing SQL queries against Impala.
- **gotcha**: The `ImpalaHook.sqlalchemy_url` property requires the `sqlalchemy` library, which is an optional dependency of the provider and must be installed explicitly via the `sqlalchemy` extra.
- **breaking**: Support for older Python versions has been dropped. Provider 1.9.1 requires Python >=3.10; earlier provider versions had already dropped Python 3.9 (e.g., 1.7.1) and Python 3.7.
Install
- `pip install apache-airflow-providers-apache-impala`
- `pip install "apache-airflow-providers-apache-impala[sqlalchemy]"` (with the optional SQLAlchemy extra)
Imports
- ImpalaHook: `from airflow.providers.apache.impala.hooks.impala import ImpalaHook`
- SQLExecuteQueryOperator: `from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator`
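The Quickstart below assumes an Airflow connection named `my_impala_conn`. One common way to define it is an `AIRFLOW_CONN_MY_IMPALA_CONN` environment variable holding the connection in URI form; the sketch below builds such a URI, assuming the `impala` connection type registered by this provider (host, port, and login are placeholder values):

```python
import json
from urllib.parse import quote

# Placeholder connection details -- substitute your Impala coordinator's values.
host, port, login = "impala-host.example.com", 21050, "impala_user"

# Extras are passed through to the underlying Impala client;
# auth_mechanism is one such option.
extra = json.dumps({"auth_mechanism": "NOSASL"})

# Airflow accepts a JSON extras blob via the __extra__ query parameter.
uri = f"impala://{login}@{host}:{port}/default?__extra__={quote(extra)}"
print(uri)
# Export it before starting Airflow, e.g.:
#   export AIRFLOW_CONN_MY_IMPALA_CONN="<the printed URI>"
```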
Quickstart
```python
from __future__ import annotations

import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# Ensure you have an Airflow connection named "my_impala_conn"
# with the appropriate Impala host, port (default 21050), and credentials.
# Example Extra JSON: {"auth_mechanism": "NOSASL"}
with DAG(
    dag_id="example_impala_dag",
    start_date=datetime.datetime(2023, 1, 1),
    default_args={
        "conn_id": "my_impala_conn",  # Airflow connection ID for Impala
        "owner": "airflow",
    },
    schedule="@once",
    catchup=False,
    tags=["impala", "sql"],
) as dag:
    create_table_task = SQLExecuteQueryOperator(
        task_id="create_impala_table",
        sql="CREATE TABLE IF NOT EXISTS my_impala_table (id INT, name STRING)",
    )

    insert_data_task = SQLExecuteQueryOperator(
        task_id="insert_impala_data",
        sql="INSERT INTO my_impala_table VALUES (1, 'Alice'), (2, 'Bob')",
    )

    select_data_task = SQLExecuteQueryOperator(
        task_id="select_impala_data",
        sql="SELECT COUNT(*) FROM my_impala_table",
        # handler is called with the live DB-API cursor; the return value
        # (here, the row count) is pushed to XCom.
        handler=lambda cursor: cursor.fetchall()[0][0],
    )

    drop_table_task = SQLExecuteQueryOperator(
        task_id="drop_impala_table",
        sql="DROP TABLE IF EXISTS my_impala_table",
    )

    create_table_task >> insert_data_task >> select_data_task >> drop_table_task
```
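`SQLExecuteQueryOperator`'s `handler` argument is called with the live DB-API cursor, so results are obtained through `fetchone()`/`fetchall()` rather than indexing the handler's argument directly. A minimal stand-in sketch of that shape (the `FakeCursor` class below is purely illustrative and not part of the provider):

```python
class FakeCursor:
    """Minimal DB-API-like stand-in cursor returning canned rows."""

    def __init__(self, rows):
        self._rows = rows

    def fetchall(self):
        return list(self._rows)


def row_count_handler(cursor):
    # Mirrors a handler such as: handler=lambda cur: cur.fetchall()[0][0]
    rows = cursor.fetchall()
    return rows[0][0]


# A COUNT(*) query yields a single one-column row, e.g. [(2,)].
count = row_count_handler(FakeCursor([(2,)]))
print(count)  # 2
```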