{"id":6985,"library":"apache-airflow-providers-apache-hive","title":"Apache Airflow Apache Hive Provider","description":"The `apache-airflow-providers-apache-hive` package provides Apache Airflow operators, hooks, and sensors for interacting with Apache Hive. It supports both HiveServer2 connections (via `HiveServer2Hook`) and direct Hive CLI/beeline execution (via `HiveCliHook`). Currently at version 9.4.2, it follows the Apache Airflow providers release cycle, with new versions published in the regular provider release waves.","status":"active","version":"9.4.2","language":"en","source_language":"en","source_url":"https://github.com/apache/airflow/tree/main/airflow/providers/apache/hive","tags":["apache-airflow","hive","data-pipeline","etl","database","big-data"],"install":[{"cmd":"pip install apache-airflow-providers-apache-hive","lang":"bash","label":"Install base provider"},{"cmd":"pip install apache-airflow-providers-apache-hive[amazon,samba]","lang":"bash","label":"Install with cross-provider extras (e.g. S3 and Samba transfers)"},{"cmd":"pip install apache-airflow[kerberos] apache-airflow-providers-apache-hive","lang":"bash","label":"Install with Kerberos support"}],"dependencies":[{"reason":"Core Apache Airflow framework is required for all providers.","package":"apache-airflow","optional":false},{"reason":"Required by `HiveServer2Hook` to connect to HiveServer2 over Thrift. Declared as a provider dependency, but may be absent in minimal or custom-built environments.","package":"pyhive","optional":false},{"reason":"Required for SASL authentication, commonly used with Kerberos when connecting through `HiveServer2Hook`.","package":"thrift-sasl","optional":true}],"imports":[{"note":"Old Airflow 1.x import path","wrong":"from airflow.operators.hive_operator import HiveOperator","symbol":"HiveOperator","correct":"from airflow.providers.apache.hive.operators.hive import HiveOperator"},{"note":"Old Airflow 1.x import path","wrong":"from airflow.hooks.hive_hooks import HiveServer2Hook","symbol":"HiveServer2Hook","correct":"from airflow.providers.apache.hive.hooks.hive import HiveServer2Hook"},{"note":"Old Airflow 1.x import path","wrong":"from airflow.hooks.hive_hooks import HiveCliHook","symbol":"HiveCliHook","correct":"from airflow.providers.apache.hive.hooks.hive import HiveCliHook"},{"note":"Old Airflow 1.x import path","wrong":"from airflow.hooks.hive_hooks import HiveMetastoreHook","symbol":"HiveMetastoreHook","correct":"from airflow.providers.apache.hive.hooks.hive import HiveMetastoreHook"},{"note":"Old Airflow 1.x import path","wrong":"from airflow.sensors.hive_partition_sensor import HivePartitionSensor","symbol":"HivePartitionSensor","correct":"from airflow.providers.apache.hive.sensors.hive_partition import HivePartitionSensor"}],"quickstart":{"code":"from __future__ import annotations\n\nimport pendulum\n\nfrom airflow.models.dag import DAG\nfrom airflow.providers.apache.hive.operators.hive import HiveOperator\n\n\nwith DAG(\n    dag_id='hive_example_dag',\n    start_date=pendulum.datetime(2023, 1, 1, tz=\"UTC\"),\n    catchup=False,\n    schedule=None,\n    tags=['hive', 'example'],\n) as dag:\n    # HiveOperator runs HQL through the Hive CLI (or beeline) on the worker via HiveCliHook\n    run_hive_query = HiveOperator(\n        task_id='run_hive_query',\n        hive_cli_conn_id='hive_cli_default',  # Default Hive CLI connection configured in the Airflow UI\n        hql='''\n            CREATE TABLE IF NOT EXISTS my_test_table (\n                id INT,\n                name STRING\n            );\n            INSERT INTO TABLE my_test_table VALUES (1, 'Alice');\n            SELECT COUNT(*) FROM my_test_table;\n        ''',\n        # schema='default',  # Optional: specify the target schema\n    )","lang":"python","description":"This quickstart demonstrates a basic DAG using `HiveOperator` to execute HQL (Hive Query Language). `HiveOperator` always runs HQL through the Hive CLI (or beeline) via `HiveCliHook`, so the `hive` executable must be available in the Airflow worker's environment; `hive_cli_conn_id` defaults to `hive_cli_default`. To query HiveServer2 over Thrift instead, use `HiveServer2Hook` (default connection `hiveserver2_default`), ensuring `pyhive` is installed."},"warnings":[{"fix":"Update imports from `airflow.operators.hive_operator`, `airflow.hooks.hive_hooks`, and `airflow.sensors.hive_partition_sensor` to the corresponding `airflow.providers.apache.hive.*` paths for operators, hooks, and sensors.","message":"Airflow 1.x Hive operators, hooks, and sensors were moved from core Airflow into the provider package in Airflow 2.0. The old core and `airflow.contrib` import paths will fail.","severity":"breaking","affected_versions":"Airflow 2.0.0 and newer, apache-airflow-providers-apache-hive versions 1.0.0 and newer."},{"fix":"`HiveOperator` (via `HiveCliHook`) expects the `hive` (or `beeline`) command-line tool to be available on the Airflow worker and configured correctly. `HiveServer2Hook` requires a DBAPI driver (`pyhive`) and a proper HiveServer2 connection set up in the Airflow UI.","message":"`HiveOperator`/`HiveCliHook` (CLI execution) and `HiveServer2Hook` (Thrift connection to HiveServer2) use different underlying mechanisms and require different connection configurations.","severity":"gotcha","affected_versions":"All versions of `apache-airflow-providers-apache-hive`."},{"fix":"Install the necessary packages, e.g. `pip install 'pyhive[hive]' thrift-sasl`. The specific requirements depend on the authentication mechanism (e.g., Kerberos, LDAP).","message":"Missing required underlying Python libraries for HiveServer2 connections (e.g., `pyhive`, `thrift-sasl`).","severity":"gotcha","affected_versions":"All versions where `HiveServer2Hook` is used."},{"fix":"Ensure the Kerberos client (`kinit`) is properly configured on the Airflow worker, keytabs are accessible, and the `KRB5_KTNAME` environment variable is set if using a non-default keytab path. Also, install Airflow with Kerberos support (`pip install apache-airflow[kerberos]`) and set the Hive connection's authentication mechanism to Kerberos in the Airflow UI.","message":"Complexities with Kerberos authentication for Hive connections.","severity":"gotcha","affected_versions":"All versions."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install `pyhive` alongside the provider: `pip install 'pyhive[hive]'`.","cause":"`HiveServer2Hook` (used for HiveServer2 connections) requires the `pyhive` library, which may be missing in minimal or custom-built environments.","error":"ModuleNotFoundError: No module named 'pyhive'"},{"fix":"Install the Apache Hive client utilities on the Airflow worker machine and ensure the `hive` executable is in the system's PATH environment variable. Alternatively, query HiveServer2 directly with `HiveServer2Hook` and `pyhive`.","cause":"`HiveOperator` (via `HiveCliHook`) cannot locate the `hive` command-line interface on the Airflow worker.","error":"airflow.exceptions.AirflowException: Could not find `hive` command in the PATH. Please ensure Hive CLI is installed and configured."},{"fix":"Verify the Hive connection details (host, port, schema) in the Airflow UI. Ensure HiveServer2 is running and accessible from the Airflow worker. Check firewall rules and network connectivity. Enable debug logging for `pyhive` for more detailed connection errors.","cause":"`HiveServer2Hook` failed to establish a connection to HiveServer2. This can be due to an incorrect host/port, network issues, or an unreachable HiveServer2.","error":"pyhive.exc.OperationalError: TTransportException: Could not connect to ..."},{"fix":"Ensure the Kerberos keytab is valid and accessible, the principal matches the service principal, and `kinit` can successfully obtain a ticket. Verify that the Hive connection in the Airflow UI uses the Kerberos authentication mechanism with the correct principal, and that Airflow is installed with Kerberos support (`pip install apache-airflow[kerberos]`).","cause":"Kerberos authentication failed when `HiveServer2Hook` tried to connect to HiveServer2. This is often due to a misconfigured keytab, principal, or client environment.","error":"sqlalchemy.exc.DBAPIError: (pyhive.exc.OperationalError) TTransportException: GSS-API (or Kerberos) authentication failed"}]}