Apache Airflow Apache Hive Provider

9.4.2 · active · verified Thu Apr 16

The `apache-airflow-providers-apache-hive` package provides Apache Airflow operators, hooks, and sensors for interacting with Apache Hive. It supports both HiveServer2 connections (via `HiveServer2Hook`) and direct Hive CLI execution (via `HiveCliHook`). Currently at version 9.4.2, it follows the Apache Airflow providers release cycle, with new versions published in the regular provider release waves independently of Airflow core major/minor releases.

Common errors

Warnings

Install
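The provider installs with pip into an existing Airflow environment (package name as stated above):

```shell
# Install the Apache Hive provider for Airflow
pip install apache-airflow-providers-apache-hive
```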

Imports

Quickstart

This quickstart demonstrates a basic DAG using `HiveOperator` to execute HQL (Hive Query Language). It uses `hive_cli_conn_id='hive_cli_default'`, which relies on the `hive` CLI being available in the Airflow worker's environment. For connecting to HiveServer2 instead, configure a HiveServer2 connection in the Airflow UI (e.g., `hiveserver2_default`) and use `HiveServer2Hook` from the same provider, ensuring `pyhive` is installed.

from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator


with DAG(
    dag_id='hive_example_dag',
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=['hive', 'example'],
) as dag:
    # Runs the HQL below via the Hive CLI (requires the `hive` binary on the worker)
    run_hive_query = HiveOperator(
        task_id='run_hive_query',
        hive_cli_conn_id='hive_cli_default',
        hql='''
            CREATE TABLE IF NOT EXISTS my_test_table (
                id INT,
                name STRING
            );
            INSERT INTO TABLE my_test_table VALUES (1, 'Alice');
            SELECT COUNT(*) FROM my_test_table;
        ''',
        # schema='default' # Optional: Specify the target schema
    )
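For HiveServer2 access from a hook rather than the CLI-based operator, a minimal sketch might look like the following. It assumes an Airflow connection named `hiveserver2_default` pointing at a running HiveServer2 instance, a `my_test_table` table like the one created above, and `pyhive` installed; it will not run without a live server.

```python
from airflow.providers.apache.hive.hooks.hive import HiveServer2Hook

# Assumes a 'hiveserver2_default' connection is configured in Airflow
# and that a HiveServer2 instance is reachable through it.
hook = HiveServer2Hook(hiveserver2_conn_id='hiveserver2_default')

# get_records executes the query over the HiveServer2 Thrift interface
# and returns the result set as a list of row tuples.
rows = hook.get_records('SELECT COUNT(*) FROM my_test_table')
print(rows)
```

This pattern is useful inside a `@task`-decorated function or a `PythonOperator` callable, where the query result is needed in Python rather than just executed for its side effects.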
