Apache Airflow Presto Provider

5.11.2 · active · verified Thu Apr 16

The `apache-airflow-providers-presto` package provides Apache Airflow hooks and transfer operators for Presto. It connects through the `presto-python-client` package (imported as `prestodb`), so DAGs can execute SQL queries and move data in a Presto cluster. Trino is served by the separate `apache-airflow-providers-trino` package. The current version is 5.11.2. Providers are versioned and released independently of Apache Airflow core, so bug fixes and features can ship between Airflow releases.
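
As a rough illustration of the hook side of the package, the sketch below counts the rows of a table through `PrestoHook`. It assumes an Airflow connection named `presto_default` exists; the helper name `count_rows` and the table name in the usage note are hypothetical and exist only for this example. The hook is imported lazily so the helper can also be exercised with a stand-in object.

```python
from typing import Any, Optional


def count_rows(table: str, hook: Optional[Any] = None) -> int:
    """Return the number of rows in `table` using a DB-API style Airflow hook.

    `count_rows` is a hypothetical helper written for this example; it is not
    part of the provider.
    """
    if hook is None:
        # Imported lazily so this module loads even where Airflow is absent.
        from airflow.providers.presto.hooks.presto import PrestoHook

        # Assumes a connection with the ID 'presto_default' is configured.
        hook = PrestoHook(presto_conn_id="presto_default")

    # Table names cannot be passed as bound parameters, so accept only
    # dot-separated identifiers before interpolating into the SQL text.
    if not all(part.isidentifier() for part in table.split(".")):
        raise ValueError(f"unexpected table name: {table!r}")

    # get_first() returns the first row of the result set as a sequence.
    row = hook.get_first(f"SELECT COUNT(*) FROM {table}")
    return int(row[0])
```

Inside a task, a call such as `count_rows("hive.default.events")` would open the connection and return a single integer. Passing an explicit `hook` keeps the helper easy to test without a running Presto cluster.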

Quickstart

This quickstart shows a basic Airflow DAG that runs SQL queries against Presto. The provider does not include a dedicated `PrestoOperator`, so the tasks use `SQLExecuteQueryOperator` from `apache-airflow-providers-common-sql`, which picks `PrestoHook` based on the connection type. Each task takes a `conn_id` (here `presto_default`); one runs static SQL and the other runs templated SQL. The DAG runs successfully only if a connection named `presto_default` is configured through the Airflow UI or environment variables.

from datetime import datetime

from airflow.models.dag import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# The 'presto_default' connection must exist before the DAG runs. Create it in the
# Airflow UI or export it as an environment variable (e.g., in a .env file or shell):
# export AIRFLOW_CONN_PRESTO_DEFAULT='presto://user:password@localhost:8080/default?catalog=hive'
# In the URI, the path sets the schema; the catalog is read from the connection
# extras (the query string).

with DAG(
    dag_id="presto_example_dag",
    start_date=datetime(2023, 1, 1),
    schedule=None,  # run only when triggered manually
    catchup=False,
    tags=["presto", "example"],
) as dag:
    run_simple_query = SQLExecuteQueryOperator(
        task_id="execute_select_one",
        conn_id="presto_default",  # must match an existing Airflow connection ID
        sql="SELECT 1",
    )

    run_templated_query = SQLExecuteQueryOperator(
        task_id="execute_templated_query",
        conn_id="presto_default",
        # '{{ ds }}' renders to the run's logical date (YYYY-MM-DD) before execution
        sql="SELECT '{{ ds }}' AS current_date_string, '{{ macros.uuid.uuid4() }}' AS random_uuid",
    )
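
Airflow connection URIs must have their components URL-encoded, which is easy to get wrong when a password contains characters such as `@` or `/`. The following is a minimal sketch, using only the standard library, of building a value for `AIRFLOW_CONN_PRESTO_DEFAULT`. The helper name `presto_conn_uri` and all sample values are hypothetical; the `catalog` extra is assumed to be the only extra needed.

```python
from typing import Mapping, Optional
from urllib.parse import quote, urlencode


def presto_conn_uri(
    host: str,
    port: int,
    user: str,
    password: Optional[str] = None,
    schema: str = "default",
    extras: Optional[Mapping[str, str]] = None,
) -> str:
    """Build an Airflow-style connection URI for a Presto connection.

    Hypothetical helper for illustration; not part of Airflow or the provider.
    """
    # Percent-encode credentials so '@', ':' and '/' cannot break URI parsing.
    credentials = quote(user, safe="")
    if password:
        credentials += ":" + quote(password, safe="")

    uri = f"presto://{credentials}@{host}:{port}/{quote(schema, safe='')}"

    # Connection extras (for example the catalog) travel in the query string.
    if extras:
        uri += "?" + urlencode(extras)
    return uri


if __name__ == "__main__":
    # Print a value suitable for: export AIRFLOW_CONN_PRESTO_DEFAULT='...'
    print(presto_conn_uri("localhost", 8080, "user", "p@ss/word",
                          extras={"catalog": "hive"}))
```

Generating the URI in code rather than by hand keeps the encoding consistent when credentials are rotated. For anything beyond local testing, a secrets backend is usually a better home for the password than a shell export.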
