Apache Airflow Vertica Provider
The `apache-airflow-providers-vertica` package is an Apache Airflow provider for Vertica databases. It supplies a hook (`VerticaHook`) for connecting to Vertica and works with `SQLExecuteQueryOperator` to run SQL queries from Airflow DAGs. The current version is 4.3.2; the provider follows its own semantic versioning scheme, with frequent releases that may introduce new features, bug fixes, or changed compatibility requirements with Airflow core.
Common errors
- `ModuleNotFoundError: No module named 'airflow.providers.vertica'`
  Cause: The `apache-airflow-providers-vertica` package is not installed in the Python environment where the Airflow components (scheduler, worker, webserver) run.
  Fix: Run `pip install apache-airflow-providers-vertica` in the correct virtual environment, or ensure the package is included in your Airflow Docker image build. Verify the installation with `pip show apache-airflow-providers-vertica`.
- `AttributeError: module 'airflow.contrib.hooks.vertica_hook' has no attribute 'VerticaHook'`
  Cause: This import path comes from an older version of Airflow (1.x or early 2.x) and is no longer valid with modern Airflow 2+ provider packages.
  Fix: Update your import statements to the correct path: `from airflow.providers.vertica.hooks.vertica import VerticaHook`. For operators, use `SQLExecuteQueryOperator` from `airflow.providers.common.sql.operators.sql`.
- `This release of provider is only available for Airflow X.Y.Z+`
  Cause: You are trying to install or use a version of `apache-airflow-providers-vertica` that requires a newer minimum version of Apache Airflow core than the one currently installed.
  Fix: Upgrade Apache Airflow core to at least the version named in the error message (e.g., `pip install 'apache-airflow>=2.11.0'`), or install an older version of the Vertica provider that is compatible with your current Airflow core version.
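When diagnosing the first two errors, it helps to check from inside the running environment whether the provider distribution is actually installed. A minimal stdlib-only sketch (the distribution name is real; the helper function `provider_status` is ours, for illustration):

```python
from importlib import metadata


def provider_status(dist_name: str = "apache-airflow-providers-vertica") -> str:
    """Return the installed version of a distribution, or a note that it
    is missing from the current Python environment."""
    try:
        return f"{dist_name} {metadata.version(dist_name)}"
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed in this environment"


print(provider_status())
```

Run this with the same interpreter your Airflow components use; a "not installed" result in a worker's environment explains the `ModuleNotFoundError` even when the scheduler's environment has the package.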
Warnings
- breaking The dedicated `VerticaOperator` (`airflow.providers.vertica.operators.vertica.VerticaOperator`) was removed in version 4.0.0 of this provider; use `SQLExecuteQueryOperator` from `airflow.providers.common.sql.operators.sql` instead.
- breaking Minimum Airflow core version requirements increase with provider updates. For provider version 4.3.x, Airflow 2.11.0+ is required.
- gotcha When running multi-statement SQL queries with `split_statements=False` (default for `SQLExecuteQueryOperator`), errors after the first statement might not be detected by older versions of the `VerticaHook`, leading to partial commits without raising an exception.
- gotcha In containerized Airflow deployments (e.g., Docker, Kubernetes), simply `pip install`ing the provider in one component (like the scheduler) does not make it available to all other components (webserver, workers, etc.).
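The `split_statements` gotcha above comes down to whether a multi-statement string is sent to the database in one `execute()` call or divided into individual statements that each fail loudly on their own (pass `split_statements=True` to `SQLExecuteQueryOperator` to get the latter). A deliberately naive sketch of the splitting idea (real hooks split far more carefully; this semicolon split is illustration only):

```python
def naive_split_statements(sql: str) -> list[str]:
    # Illustration only: splits on ';' without handling semicolons inside
    # string literals or comments, which a real SQL splitter must respect.
    return [stmt.strip() for stmt in sql.split(";") if stmt.strip()]


sql = """
CREATE TABLE IF NOT EXISTS public.t (id INT);
INSERT INTO public.t VALUES (1);
SELECT * FROM public.t;
"""

for statement in naive_split_statements(sql):
    print(statement)  # each statement would be executed separately
```

When statements run one at a time, an error in the second statement raises before the third runs, instead of being silently swallowed after a single multi-statement execute.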
Install
```shell
pip install apache-airflow-providers-vertica
```
Imports
- VerticaHook
from airflow.providers.vertica.hooks.vertica import VerticaHook
- SQLExecuteQueryOperator
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
Quickstart
```python
import pendulum

from airflow.models.dag import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# Ensure a Vertica connection named 'vertica_default' is configured in the
# Airflow UI with host, port, schema, login, and password.
with DAG(
    dag_id="vertica_quickstart_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["vertica", "example"],
) as dag:
    create_table_task = SQLExecuteQueryOperator(
        task_id="create_vertica_table",
        conn_id="vertica_default",
        sql="""CREATE TABLE IF NOT EXISTS public.test_airflow (id INT, name VARCHAR(50));""",
    )

    insert_data_task = SQLExecuteQueryOperator(
        task_id="insert_vertica_data",
        conn_id="vertica_default",
        sql="""INSERT INTO public.test_airflow (id, name) VALUES (1, 'Airflow Test');""",
    )

    select_data_task = SQLExecuteQueryOperator(
        task_id="select_vertica_data",
        conn_id="vertica_default",
        sql="""SELECT * FROM public.test_airflow;""",
        handler=lambda cursor: print(cursor.fetchall()),  # Print results to task logs
    )

    create_table_task >> insert_data_task >> select_data_task
```
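The `vertica_default` connection assumed above can also be created from the Airflow CLI instead of the UI. A configuration sketch with placeholder host and credentials (adjust `vertica.example.com`, port, schema, and login to your environment):

```shell
# Register a Vertica connection for the provider to use.
airflow connections add vertica_default \
    --conn-type vertica \
    --conn-host vertica.example.com \
    --conn-port 5433 \
    --conn-schema public \
    --conn-login dbadmin \
    --conn-password 'changeme'
```

Whichever way the connection is created, `conn_id="vertica_default"` in the DAG must match the connection ID registered here.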