Apache Airflow Yandex Provider

4.4.2 · active · verified Sun Apr 12

The `apache-airflow-providers-yandex` package adds operators and hooks to Apache Airflow for working with Yandex Cloud services, including Yandex Query and Yandex Data Proc. The provider is actively maintained and is released regularly to stay compatible with new Airflow versions and Yandex Cloud features. The current version is 4.4.2.

Warnings

Install

Imports

Quickstart

This quickstart uses `YQExecuteQueryOperator` to run a simple SQL query in Yandex Query. It assumes a Yandex Cloud connection is already configured in your Airflow environment; the operator uses the `yandexcloud_default` connection unless `yandex_conn_id` is set. The folder ID can be set in the connection's extra field or passed to the operator as `folder_id`, for example from an environment variable read in the DAG file.

from __future__ import annotations

import os

import pendulum

from airflow.models.dag import DAG
from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator

# Requires a Yandex Cloud connection in Airflow with conn_id='yandexcloud_default',
# or a different one passed via 'yandex_conn_id' on the operator.
# Set the folder ID as 'folder_id' in the connection's 'extra' field,
# or pass it to the operator through the 'folder_id' argument (see below).

with DAG(
    dag_id="yandex_query_example",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["yandex", "example"],
) as dag:
    execute_yq_query = YQExecuteQueryOperator(
        task_id="run_simple_yandex_query",
        sql="SELECT 'Hello, world!' AS message;",
        # Optional: Specify a connection ID if not using 'yandexcloud_default'
        # yandex_conn_id='my_yandex_cloud_connection',
        # Optional: pass a folder ID explicitly (here read from an environment variable);
        # if it is empty or omitted, the folder ID from the connection's extra field is used.
        # folder_id=os.environ.get('YANDEX_CLOUD_FOLDER_ID', ''),
    )
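
The quickstart assumes the `yandexcloud_default` connection already exists. As a minimal sketch, one way to define it without the UI is Airflow's JSON connection format in an `AIRFLOW_CONN_<CONN_ID>` environment variable. The folder ID and key path below are hypothetical placeholders, and the extra keys `folder_id` and `service_account_json_path` are assumed to match what your installed provider version expects, so check them against its connection documentation.

```python
import json
import os

# Hypothetical placeholder values; replace with your own folder ID and key path.
FOLDER_ID = "b1g0000000000000000"
SA_KEY_PATH = "/opt/airflow/secrets/yc-sa-key.json"


def build_yandex_connection(folder_id: str, sa_key_path: str) -> str:
    """Serialize a Yandex Cloud connection in Airflow's JSON connection format."""
    connection = {
        "conn_type": "yandexcloud",
        "extra": {
            # Assumed extra keys; verify against your provider version.
            "folder_id": folder_id,
            "service_account_json_path": sa_key_path,
        },
    }
    return json.dumps(connection)


conn_json = build_yandex_connection(FOLDER_ID, SA_KEY_PATH)

# Setting the variable here only affects the current process. In a real
# deployment, export it in the environment of the scheduler and workers.
os.environ["AIRFLOW_CONN_YANDEXCLOUD_DEFAULT"] = conn_json
print(conn_json)
```

With this variable present in the Airflow processes' environment, the operator in the quickstart should resolve `yandexcloud_default` without any `yandex_conn_id` or `folder_id` argument.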
