Teradata Provider for Apache Airflow
The `apache-airflow-providers-teradata` package is the official Teradata provider for Apache Airflow. It lets Airflow DAGs interact with Teradata databases through hooks and operators that execute SQL queries and manage data. The current version is 3.5.2; updates are typically released in line with Airflow's core release cycle or when significant bug fixes or features are added.
Common errors
- ModuleNotFoundError: No module named 'teradatasql'
cause The `teradatasql` Python driver, which is a required dependency for the Teradata provider, is not installed in the Airflow environment.
fix Install the `teradatasql` package: `pip install teradatasql` (or reinstall the provider which pulls it in: `pip install apache-airflow-providers-teradata`).
- airflow.exceptions.AirflowException: The teradata_conn_id parameter is missing or invalid.
cause The `teradata_conn_id` specified in the operator or hook does not exist or is empty. This prevents Airflow from establishing a connection to the Teradata database.
fix Ensure that the `teradata_conn_id` argument in your operator/hook matches an existing connection ID configured in Airflow. Verify its presence and correctness in the Airflow UI (Admin -> Connections) or environment variables.
- teradatasql.OperationalError: [HY000] [2000] Error while connecting to database. (DBMSG: [2000] [HY000] Cannot resolve host name.)
cause A generic connection error indicating issues reaching the Teradata database. Common causes include incorrect host/port, network firewalls, or the database service being down.
fix Double-check the host and port specified in your Airflow Teradata connection. Verify network connectivity from your Airflow worker to the Teradata server. Consult your network or database administrator if necessary.
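Two of the errors above (a missing driver and an unresolvable host name) can be checked from the worker before a DAG runs. The following is a minimal sketch using only the standard library; the helper names `driver_installed` and `host_resolves` are hypothetical and not part of the provider.

```python
import importlib.util
import socket


def driver_installed(module_name: str = "teradatasql") -> bool:
    """Return True if the module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def host_resolves(host: str) -> bool:
    """Return True if the host name (or IP literal) resolves to an address."""
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Name resolution failed, the condition behind "Cannot resolve host name"
        return False
    return True
```

Calling `driver_installed()` and `host_resolves("<your Teradata host>")` on the same machine that runs the Airflow worker helps separate an environment problem from a connection-configuration problem. A host that resolves may still be unreachable because of a firewall or a stopped database service, so this check narrows the cause but does not rule out the others.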
Warnings
- gotcha The `teradata_conn_id` parameter in operators and hooks must correspond to an existing Teradata connection configured in Airflow. Misconfigurations (wrong host, port, credentials) are a common source of failures.
- gotcha Specific versions of the `teradatasql` Python driver might be required or recommended for certain Teradata database versions or features. Ensure compatibility, especially when encountering unexpected connection or query execution errors.
- gotcha Secure connections using SSL/TLS might require additional configuration in the Airflow connection, such as the `sslmode` and `sslca` extras (the parameter names used by the `teradatasql` driver). If these are not configured correctly, connections can fail.
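Since several of these gotchas come down to how the connection is defined, here is a hedged sketch of supplying `teradata_default` through an environment variable in Airflow's JSON connection format (Airflow 2.3 or later). The helper `teradata_conn_json`, the host, and the credentials are placeholders. The `sslmode` extra follows the `teradatasql` driver's parameter naming and is an assumption to verify against your provider version.

```python
import json
import os


def teradata_conn_json(host, login, password, schema=None, extra=None):
    """Serialize connection fields into Airflow's JSON connection format."""
    conn = {
        "conn_type": "teradata",
        "host": host,
        "login": login,
        "password": password,
    }
    if schema:
        conn["schema"] = schema
    if extra:
        # Extras are passed as a nested object; the key names are assumptions here
        conn["extra"] = extra
    return json.dumps(conn)


# Airflow reads AIRFLOW_CONN_<CONN_ID in upper case> when resolving a connection ID.
os.environ["AIRFLOW_CONN_TERADATA_DEFAULT"] = teradata_conn_json(
    host="td.example.com",      # placeholder host
    login="airflow_user",       # placeholder credentials
    password="change-me",
    extra={"sslmode": "REQUIRE"},
)
```

In practice the variable would be set in the deployment environment or a secrets backend rather than in Python code; the snippet only illustrates the shape of the value.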
Install
- pip install apache-airflow-providers-teradata
Imports
- TeradataHook
from airflow.providers.teradata.hooks.teradata import TeradataHook
- TeradataOperator
from airflow.providers.teradata.operators.teradata import TeradataOperator
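As a rough illustration of using the hook directly, the sketch below wraps `TeradataHook` in a hypothetical helper, `fetch_rows`. It assumes a connection named `teradata_default` exists and relies on `get_records`, which the hook inherits from Airflow's common SQL `DbApiHook`.

```python
from __future__ import annotations

from typing import Any


def fetch_rows(sql: str, conn_id: str = "teradata_default") -> list[Any]:
    """Run a query through TeradataHook and return all rows."""
    # Imported lazily so that defining this helper does not require Airflow
    # or the Teradata driver to be importable at module load time.
    from airflow.providers.teradata.hooks.teradata import TeradataHook

    hook = TeradataHook(teradata_conn_id=conn_id)
    return hook.get_records(sql)
```

A call such as `fetch_rows("SELECT 1;")` would typically live inside a task callable, so that the database is only contacted when the task runs and not every time the DAG file is parsed.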
Quickstart
from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.teradata.operators.teradata import TeradataOperator

with DAG(
    dag_id="teradata_example_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["teradata", "example"],
) as dag:
    # This task requires an Airflow connection named 'teradata_default'
    # configured with appropriate Teradata credentials (host, port, schema, user, password).
    run_teradata_query = TeradataOperator(
        task_id="run_simple_query",
        sql="SELECT 1;",
        teradata_conn_id="teradata_default",  # Ensure this connection exists in Airflow UI
    )