Apache Airflow Vertica Provider

raw JSON →
4.3.2 verified Thu Apr 16 auth: no python

The `apache-airflow-providers-vertica` package is an Apache Airflow provider that enables seamless integration with Vertica databases. It offers hooks and operators to connect to Vertica and execute SQL queries within Airflow DAGs. The current version is 4.3.2 and it follows an independent semantic versioning scheme, with frequent updates that may introduce new features, bug fixes, or compatibility changes with Airflow core.

pip install apache-airflow-providers-vertica
error ModuleNotFoundError: No module named 'airflow.providers.vertica'
cause The `apache-airflow-providers-vertica` package is not installed or not available in the Python environment where Airflow components (scheduler, worker, webserver) are running.
fix
Run pip install apache-airflow-providers-vertica in the correct virtual environment or ensure it's included in your Airflow Docker image build process. Verify the installation by running pip show apache-airflow-providers-vertica.
error AttributeError: module 'airflow.contrib.hooks.vertica_hook' has no attribute 'VerticaHook'
cause This import path is from an older version of Airflow (1.x or early 2.x) and is no longer valid in modern Airflow 2+ provider packages.
fix
Update your import statements to the correct path: from airflow.providers.vertica.hooks.vertica import VerticaHook. For operators, use SQLExecuteQueryOperator from airflow.providers.common.sql.operators.sql.
error This release of provider is only available for Airflow X.Y.Z+
cause You are trying to install or use a version of `apache-airflow-providers-vertica` that requires a newer minimum version of Apache Airflow core than currently installed.
fix
Upgrade your Apache Airflow core installation to at least the version specified in the error message (e.g., pip install 'apache-airflow>=2.11.0'). Alternatively, install an older version of the Vertica provider that is compatible with your current Airflow core version.
breaking The dedicated `VerticaOperator` (`airflow.providers.vertica.operators.vertica.VerticaOperator`) was removed in version 4.0.0 of this provider.
fix Migrate to `SQLExecuteQueryOperator` from `airflow.providers.common.sql.operators.sql`. This operator provides equivalent functionality and is the recommended approach for SQL execution across many database providers.
breaking Minimum Airflow core version requirements increase with provider updates. For provider version 4.3.x, Airflow 2.11.0+ is required.
fix Ensure your Airflow core installation meets or exceeds the minimum version required by the provider. If your Airflow version is older, upgrade Airflow first before installing the latest provider.
gotcha When running multi-statement SQL queries with `split_statements=False` (default for `SQLExecuteQueryOperator`), errors after the first statement might not be detected by older versions of the `VerticaHook`, leading to partial commits without raising an exception.
fix This issue was addressed in provider versions >= 4.2.1 by a customized `fetch_all_handler` within `VerticaHook`. If on an older provider version, consider upgrading or setting `split_statements=True` for SQL operators to ensure each statement is executed and validated independently. Alternatively, explicit transaction management may be needed.
gotcha In containerized Airflow deployments (e.g., Docker, Kubernetes), simply `pip install`ing the provider in one component (like the scheduler) does not make it available to all other components (webserver, workers, etc.).
fix The provider package must be installed into the Docker image used by *all* Airflow components. Modify your `Dockerfile` or `requirements.txt` included in your image build process to ensure the provider is universally available.

This quickstart demonstrates how to define a simple Airflow DAG to interact with Vertica. It creates a table, inserts data, and then selects it, printing the results to the task logs. An Airflow connection with `Conn Id` set to `vertica_default` must be configured in the Airflow UI, providing the necessary host, port, schema, login, and password for your Vertica instance.

import pendulum

from airflow.models.dag import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# Ensure a Vertica connection named 'vertica_default' is configured in Airflow UI
# with host, port, schema, login, and password.

with DAG(
    dag_id="vertica_quickstart_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["vertica", "example"],
) as dag:
    create_table_task = SQLExecuteQueryOperator(
        task_id="create_vertica_table",
        conn_id="vertica_default",
        sql="""CREATE TABLE IF NOT EXISTS public.test_airflow (id INT, name VARCHAR(50));""",
    )

    insert_data_task = SQLExecuteQueryOperator(
        task_id="insert_vertica_data",
        conn_id="vertica_default",
        sql="""INSERT INTO public.test_airflow (id, name) VALUES (1, 'Airflow Test');""",
    )

    select_data_task = SQLExecuteQueryOperator(
        task_id="select_vertica_data",
        conn_id="vertica_default",
        sql="""SELECT * FROM public.test_airflow;""",
        handler=lambda cursor: print(cursor.fetchall()), # Print results to task logs
    )

    create_table_task >> insert_data_task >> select_data_task