Apache Airflow Yandex Provider

Version 4.4.2 · verified Sun Apr 12 · auth: no · language: Python

The `apache-airflow-providers-yandex` package extends Apache Airflow with operators and hooks to interact with various Yandex Cloud services, including Yandex Query and Yandex Data Proc. It is an actively maintained provider, with a regular release cadence to ensure compatibility with new Airflow versions and Yandex Cloud features. The current version is 4.4.2.

pip install apache-airflow-providers-yandex
error ImportError: cannot import name 'YandexCloudBaseHook' from 'airflow.providers.yandex.hooks.yandexcloud'
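cause Version 4.0.0 of the `apache-airflow-providers-yandex` package removed deprecated import paths. `YandexCloudBaseHook` itself was not dropped, but it is no longer importable from the legacy `yandexcloud` hook module; current versions provide it from `airflow.providers.yandex.hooks.yandex`.
fix
Change the import to `from airflow.providers.yandex.hooks.yandex import YandexCloudBaseHook`.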
error ModuleNotFoundError: No module named 'airflow.providers.yandex.operators.yandexcloud_dataproc'
cause The 'yandexcloud_dataproc' module was removed in version 4.0.0 of the 'apache-airflow-providers-yandex' package.
fix
Import Data Proc hooks and operators from `airflow.providers.yandex.hooks.dataproc` and `airflow.providers.yandex.operators.dataproc` instead.
error ValueError: user subnet overlaps with service network range 10.248.0.0/13, see documentation for details
cause The selected subnet's IP address range overlaps with the service subnet range '10.248.0.0/13' used by Yandex Managed Service for Apache Airflow.
fix
Choose a subnet whose IP address range does not overlap the service range `10.248.0.0/13` (10.248.0.0 through 10.255.255.255). Refer to the network requirements in the documentation.
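You can check a candidate subnet locally before creating the Airflow cluster. The sketch below uses only Python's standard `ipaddress` module; the helper name `overlaps_service_range` is hypothetical, and the only fact it takes from the error above is the reserved range `10.248.0.0/13`. It does not call any Yandex Cloud API.

```python
import ipaddress

# Service network range reserved by Yandex Managed Service for Apache Airflow
# (taken from the error message above).
SERVICE_RANGE = ipaddress.ip_network("10.248.0.0/13")


def overlaps_service_range(subnet_cidr: str) -> bool:
    """Return True if an IPv4 subnet in CIDR notation overlaps the service range.

    Hypothetical pre-flight helper. strict=True rejects CIDRs with host bits set,
    e.g. "10.250.1.5/24", so typos surface as ValueError instead of a silent pass.
    """
    subnet = ipaddress.ip_network(subnet_cidr, strict=True)
    return subnet.overlaps(SERVICE_RANGE)


if __name__ == "__main__":
    for cidr in ("10.128.0.0/24", "10.250.1.0/24", "10.0.0.0/8"):
        status = "overlaps" if overlaps_service_range(cidr) else "ok"
        print(f"{cidr}: {status}")
```

Note that a large subnet such as `10.0.0.0/8` also overlaps, because it contains the whole service range.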
error AttributeError: module 'airflow.providers.yandex.hooks' has no attribute 'YandexCloudBaseHook'
cause `YandexCloudBaseHook` is not an attribute of the `airflow.providers.yandex.hooks` package itself; it has to be imported from its submodule, and since version 4.0.0 the legacy `yandexcloud` submodule path no longer provides it.
fix
Import it explicitly: `from airflow.providers.yandex.hooks.yandex import YandexCloudBaseHook`.
error ImportError: cannot import name 'DataprocBaseOperator' from 'airflow.providers.yandex.operators.yandexcloud_dataproc'
cause The 'yandexcloud_dataproc' module and its classes were removed in version 4.0.0 of the 'apache-airflow-providers-yandex' package.
fix
Import `DataprocBaseOperator` and the other Data Proc operators from `airflow.providers.yandex.operators.dataproc` instead.
breaking Provider version 4.0.0 and later removed `YandexCloudBaseHook.provider_user_agent` and the `connection_id` parameter of `YandexCloudBaseHook`. The `yandex.hooks.yandexcloud_dataproc` and `yandex.operators.yandexcloud_dataproc` modules were also removed.
fix Use `airflow.providers.yandex.utils.user_agent.provider_user_agent` instead of `YandexCloudBaseHook.provider_user_agent`. Use `yandex_conn_id` instead of `connection_id` for hooks and operators. Import `YandexCloudBaseHook` from `airflow.providers.yandex.hooks.yandex`.
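For DAG folders with many files, the module renames can be applied mechanically. The sketch below is a hypothetical helper, not part of the provider: the legacy paths come from the entries above, while the target paths (`hooks.yandex`, `hooks.dataproc`, `operators.dataproc`) are assumptions you should verify against the provider documentation for your installed version. It only rewrites module paths; the `connection_id` to `yandex_conn_id` rename is left to manual review because other operators may use a parameter with the same name.

```python
import re

# Legacy -> current module paths. Assumption: verify each target for your provider version.
LEGACY_MODULES = {
    "airflow.providers.yandex.hooks.yandexcloud_dataproc": "airflow.providers.yandex.hooks.dataproc",
    "airflow.providers.yandex.operators.yandexcloud_dataproc": "airflow.providers.yandex.operators.dataproc",
    "airflow.providers.yandex.hooks.yandexcloud": "airflow.providers.yandex.hooks.yandex",
}


def rewrite_legacy_imports(source: str) -> str:
    """Return `source` with legacy Yandex provider module paths replaced.

    The trailing word boundary stops `...hooks.yandexcloud` from matching inside
    `...hooks.yandexcloud_dataproc`, because `_` counts as a word character.
    """
    for old, new in LEGACY_MODULES.items():
        source = re.sub(re.escape(old) + r"\b", new, source)
    return source


if __name__ == "__main__":
    legacy = (
        "from airflow.providers.yandex.hooks.yandexcloud import YandexCloudBaseHook\n"
        "from airflow.providers.yandex.operators.yandexcloud_dataproc import DataprocBaseOperator\n"
    )
    print(rewrite_legacy_imports(legacy))
```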
breaking Provider versions have minimum Apache Airflow version requirements. For example, provider 2.0.0+ requires Airflow 2.1.0+, provider 3.0.0+ requires Airflow 2.2.0+, and provider 3.2.0+ requires Airflow 2.3.0+. The current version 4.4.2 requires Airflow 2.11.0+.
fix Ensure your Apache Airflow installation meets or exceeds the minimum version required by the provider package. Upgrade Airflow if necessary, and run `airflow db migrate` after upgrading so the metadata database schema matches the new version.
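A quick pre-install check can compare the installed Airflow version with the 2.11.0 minimum stated above. This is a minimal sketch using the standard library's `importlib.metadata`; the helper names are hypothetical, and the simplified parser ignores pre-release suffixes (so `2.11.0rc1` counts as `2.11.0`). For strict comparisons, `packaging.version.Version` is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum Airflow version stated above for provider 4.4.2.
MIN_AIRFLOW = "2.11.0"


def release_tuple(ver: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '2.11.0rc1' -> (2, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad to three components so "2.11" compares equal to "2.11.0".
    return parts + (0,) * max(0, 3 - len(parts))


def meets_minimum(installed: str, minimum: str = MIN_AIRFLOW) -> bool:
    """Return True if `installed` is at least `minimum` (release segment only)."""
    return release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    try:
        airflow_version = version("apache-airflow")
    except PackageNotFoundError:
        print("apache-airflow is not installed")
    else:
        verdict = "ok" if meets_minimum(airflow_version) else "upgrade needed"
        print(f"apache-airflow {airflow_version}: {verdict} (minimum {MIN_AIRFLOW})")
```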
gotcha When using `YandexCloudBaseHook`, non-prefixed extra fields (e.g., `folder_id`) are supported and preferred over prefixed ones (e.g., `extra__yandexcloud__folder_id`) since provider version 3.2.0.
fix Prefer using non-prefixed extra fields in your Airflow connection configuration for Yandex Cloud connections. For example, use `folder_id` directly in the 'Extra' JSON instead of `extra__yandexcloud__folder_id`.
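When migrating existing connections, the prefixed keys can be converted in one pass. The sketch below is a hypothetical helper, not a provider API: it strips the `extra__yandexcloud__` prefix and, when both forms of a key are present, keeps the non-prefixed value, mirroring the preference described above. The folder ID in the usage block is a placeholder.

```python
import json

LEGACY_PREFIX = "extra__yandexcloud__"


def normalize_extras(extra: dict) -> dict:
    """Return connection extras with the legacy `extra__yandexcloud__` prefix removed.

    Non-prefixed keys win over prefixed duplicates (e.g. `folder_id` over
    `extra__yandexcloud__folder_id`), regardless of key order in the input.
    """
    normalized = {}
    # First pass: copy prefixed keys under their short names.
    for key, value in extra.items():
        if key.startswith(LEGACY_PREFIX):
            normalized[key[len(LEGACY_PREFIX):]] = value
    # Second pass: non-prefixed keys override anything copied above.
    for key, value in extra.items():
        if not key.startswith(LEGACY_PREFIX):
            normalized[key] = value
    return normalized


if __name__ == "__main__":
    legacy_extra = {"extra__yandexcloud__folder_id": "example-folder-id"}  # placeholder value
    # Prints JSON suitable for the connection's 'Extra' field.
    print(json.dumps(normalize_extras(legacy_extra)))
```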

This quickstart demonstrates how to use `YQExecuteQueryOperator` to run a simple SQL query in Yandex Query. It assumes a Yandex Cloud connection is already configured in your Airflow environment; by default the operator uses the `yandexcloud_default` connection. The folder ID is taken from the connection's `folder_id` extra field, or you can pass it explicitly through the operator's `folder_id` argument (for example, read from an environment variable in your DAG file).

from __future__ import annotations

import os

import pendulum

from airflow.models.dag import DAG
from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator

# Ensure a Yandex Cloud connection is configured in Airflow
# with conn_id='yandexcloud_default', or pass 'yandex_conn_id' to the operator.
# The folder ID is read from the connection's 'folder_id' extra field unless
# you pass 'folder_id' to the operator explicitly.

with DAG(
    dag_id="yandex_query_example",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["yandex", "example"],
) as dag:
    execute_yq_query = YQExecuteQueryOperator(
        task_id="run_simple_yandex_query",
        sql="SELECT 'Hello, world!' AS message;",
        # Optional: Specify a connection ID if not using 'yandexcloud_default'
        # yandex_conn_id='my_yandex_cloud_connection',
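        # Optional: pass a folder ID explicitly instead of relying on the connection's
        # 'folder_id' extra. os.environ.get() returns None when the variable is unset,
        # which is the same as not passing folder_id at all.
        # folder_id=os.environ.get("YANDEX_CLOUD_FOLDER_ID"),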
    )