Apache Airflow Yandex Provider
The `apache-airflow-providers-yandex` package extends Apache Airflow with operators and hooks to interact with various Yandex Cloud services, including Yandex Query and Yandex Data Proc. It is an actively maintained provider, with a regular release cadence to ensure compatibility with new Airflow versions and Yandex Cloud features. The current version is 4.4.2.
Warnings
- breaking Provider version 4.4.1 and later removed `YandexCloudBaseHook.provider_user_agent` and `YandexCloudBaseHook.connection_id` parameter. The `yandex.hooks.yandexcloud_dataproc` module was also removed.
- breaking Provider versions have minimum Apache Airflow version requirements. For example, provider 2.0.0+ requires Airflow 2.1.0+, provider 3.0.0+ requires Airflow 2.2.0+, and provider 3.2.0+ requires Airflow 2.3.0+. The current version 4.4.2 requires Airflow 2.11.0+.
- gotcha When using `YandexCloudBaseHook`, non-prefixed extra fields (e.g., `folder_id`) are supported and preferred over prefixed ones (e.g., `extra__yandexcloud__folder_id`) since provider version 3.2.0.
Install
-
pip install apache-airflow-providers-yandex
Imports
- YQExecuteQueryOperator
from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator
- YandexCloudBaseHook
from airflow.providers.yandex.hooks.yandexcloud import YandexCloudBaseHook
Quickstart
from __future__ import annotations
import os
import pendulum
from airflow.models.dag import DAG
from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator
# Ensure you have a Yandex Cloud connection configured in Airflow UI
# with conn_id='yandexcloud_default' or specify 'yandex_conn_id' in the operator.
# Set YANDEX_CLOUD_FOLDER_ID in your Airflow connection's 'extra' field
# or as an environment variable for the operator to pick up.
with DAG(
dag_id="yandex_query_example",
start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
catchup=False,
schedule=None,
tags=["yandex", "example"],
) as dag:
execute_yq_query = YQExecuteQueryOperator(
task_id="run_simple_yandex_query",
sql="SELECT 'Hello, world!' AS message;",
# Optional: Specify a connection ID if not using 'yandexcloud_default'
# yandex_conn_id='my_yandex_cloud_connection',
# Optional: Specify a folder ID directly, or it will be picked from connection's extra or env var
# folder_id=os.environ.get('YANDEX_CLOUD_FOLDER_ID', ''),
)