Apache Airflow Yandex Provider
Version 4.4.2 | verified Sun Apr 12 | auth: no | python
The `apache-airflow-providers-yandex` package extends Apache Airflow with operators and hooks to interact with various Yandex Cloud services, including Yandex Query and Yandex Data Proc. It is an actively maintained provider, with a regular release cadence to ensure compatibility with new Airflow versions and Yandex Cloud features. The current version is 4.4.2.
pip install apache-airflow-providers-yandex
Common errors
error ImportError: cannot import name 'YandexCloudBaseHook' from 'airflow.providers.yandex.hooks.yandexcloud'
cause The import path is wrong. `YandexCloudBaseHook` is defined in the `airflow.providers.yandex.hooks.yandex` module, not in `airflow.providers.yandex.hooks.yandexcloud`.
fix
Import the hook with `from airflow.providers.yandex.hooks.yandex import YandexCloudBaseHook`.
error ModuleNotFoundError: No module named 'airflow.providers.yandex.operators.yandexcloud_dataproc'
cause The deprecated 'yandexcloud_dataproc' operators module was removed in version 4.0.0 of the 'apache-airflow-providers-yandex' package.
fix
Import the Data Proc operators from `airflow.providers.yandex.operators.dataproc` instead, for example `from airflow.providers.yandex.operators.dataproc import DataprocCreateClusterOperator`.
error ValueError: user subnet overlaps with service network range 10.248.0.0/13, see documentation for details
cause The selected subnet's IP address range overlaps with the service subnet range '10.248.0.0/13' used by Yandex Managed Service for Apache Airflow.
fix
Choose a subnet with an IP address range that does not overlap with the service subnet range. Refer to the network requirements in the documentation.
error AttributeError: module 'airflow.providers.yandex.hooks' has no attribute 'YandexCloudBaseHook'
cause `YandexCloudBaseHook` is not exposed on the `airflow.providers.yandex.hooks` package itself. It lives in the `airflow.providers.yandex.hooks.yandex` submodule, which has to be imported explicitly.
fix
Import the class from its submodule: `from airflow.providers.yandex.hooks.yandex import YandexCloudBaseHook`.
error ImportError: cannot import name 'DataprocBaseOperator' from 'airflow.providers.yandex.operators.yandexcloud_dataproc'
cause The deprecated 'yandexcloud_dataproc' operators module and its classes were removed in version 4.0.0 of the 'apache-airflow-providers-yandex' package.
fix
Import the class from the current module: `from airflow.providers.yandex.operators.dataproc import DataprocBaseOperator`.
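For the subnet overlap error listed above, a candidate subnet can be checked locally before it is used. The following is a minimal sketch that relies only on Python's standard `ipaddress` module; the helper name `overlaps_service_range` and the sample CIDR values are illustrative assumptions, and the service range is the one quoted in the error message.

```python
import ipaddress

# Service range quoted in the error message above.
SERVICE_NETWORK = ipaddress.ip_network("10.248.0.0/13")


def overlaps_service_range(subnet_cidr: str) -> bool:
    """Return True if the subnet shares any address with the service range."""
    return ipaddress.ip_network(subnet_cidr).overlaps(SERVICE_NETWORK)


if __name__ == "__main__":
    # Sample subnets, chosen only for illustration.
    for cidr in ("10.250.0.0/24", "10.128.0.0/24", "192.168.0.0/16"):
        status = "overlaps" if overlaps_service_range(cidr) else "ok"
        print(f"{cidr}: {status}")
```

Note that a larger network that contains the service range, such as `10.0.0.0/8`, also counts as overlapping.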
Warnings
breaking Provider version 4.0.0 removed `YandexCloudBaseHook.provider_user_agent` and the `connection_id` parameter of `YandexCloudBaseHook`. The deprecated `yandex.hooks.yandexcloud_dataproc` and `yandex.operators.yandexcloud_dataproc` modules were removed in the same release.
fix Use `airflow.providers.yandex.utils.user_agent.provider_user_agent` instead of `YandexCloudBaseHook.provider_user_agent`. Use `yandex_conn_id` instead of `connection_id` for hooks and operators. Import `YandexCloudBaseHook` from `airflow.providers.yandex.hooks.yandex`, and import Data Proc hooks and operators from `airflow.providers.yandex.hooks.dataproc` and `airflow.providers.yandex.operators.dataproc`.
breaking Provider versions have minimum Apache Airflow version requirements. For example, provider 2.0.0+ requires Airflow 2.1.0+, provider 3.0.0+ requires Airflow 2.2.0+, and provider 3.2.0+ requires Airflow 2.3.0+. The current version 4.4.2 requires Airflow 2.11.0+.
fix Ensure your Apache Airflow installation meets or exceeds the minimum version required by the provider package. Upgrade Airflow if necessary, and run `airflow db migrate` (`airflow db upgrade` on Airflow versions before 2.7) after upgrading Airflow so the metadata database schema matches the new version.
gotcha When using `YandexCloudBaseHook`, non-prefixed extra fields (e.g., `folder_id`) are supported and preferred over prefixed ones (e.g., `extra__yandexcloud__folder_id`) since provider version 3.2.0.
fix Prefer using non-prefixed extra fields in your Airflow connection configuration for Yandex Cloud connections. For example, use `folder_id` directly in the 'Extra' JSON instead of `extra__yandexcloud__folder_id`.
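To illustrate the extra-field note above, here is a minimal sketch of converting a legacy 'Extra' dictionary to the non-prefixed form. The helper `normalize_extra` is hypothetical (it is not part of the provider), and the folder ID is a placeholder value.

```python
import json

# Legacy prefix named in the note above.
PREFIX = "extra__yandexcloud__"


def normalize_extra(extra: dict) -> dict:
    """Return a copy of `extra` with legacy prefixed keys renamed.

    If a key is present in both forms, the non-prefixed value wins.
    """
    result = {}
    # First pass: copy prefixed keys under their short names.
    for key, value in extra.items():
        if key.startswith(PREFIX):
            result[key[len(PREFIX):]] = value
    # Second pass: non-prefixed keys override anything set above.
    for key, value in extra.items():
        if not key.startswith(PREFIX):
            result[key] = value
    return result


if __name__ == "__main__":
    # Placeholder folder ID, for illustration only.
    legacy = {"extra__yandexcloud__folder_id": "b1g-example-folder"}
    print(json.dumps(normalize_extra(legacy)))
```

The printed JSON can be pasted into the connection's 'Extra' field in place of the prefixed version.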
Imports
- YQExecuteQueryOperator
from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator
- YandexCloudBaseHook
wrong: from yandex.hooks.yandexcloud_dataproc import YandexCloudBaseHook
correct: from airflow.providers.yandex.hooks.yandex import YandexCloudBaseHook
Quickstart
from __future__ import annotations

import os  # only needed if the commented-out folder_id line below is enabled

import pendulum

from airflow.models.dag import DAG
from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator

# Requires a Yandex Cloud connection in Airflow, either with
# conn_id='yandexcloud_default' or passed to the operator via 'yandex_conn_id'.
# The folder ID is taken from the operator's 'folder_id' argument or, if that
# is not set, from the 'folder_id' key in the connection's 'Extra' JSON.
with DAG(
    dag_id="yandex_query_example",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["yandex", "example"],
) as dag:
    execute_yq_query = YQExecuteQueryOperator(
        task_id="run_simple_yandex_query",
        sql="SELECT 'Hello, world!' AS message;",
        # Optional: specify a connection ID if not using 'yandexcloud_default'.
        # yandex_conn_id='my_yandex_cloud_connection',
        # Optional: pass a folder ID explicitly (here read from an environment
        # variable) instead of relying on the connection's 'Extra' JSON.
        # folder_id=os.environ.get('YANDEX_CLOUD_FOLDER_ID', ''),
    )
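The quickstart assumes that a `yandexcloud_default` connection already exists. One way to provide it without the UI is an `AIRFLOW_CONN_<CONN_ID>` environment variable in Airflow's JSON connection format. The sketch below only builds that variable; the helper `build_conn_env` and the folder ID are hypothetical, and credentials (for example a service account key in the extras) are deliberately omitted and would need to be added for a real deployment.

```python
import json


def build_conn_env(conn_id: str, folder_id: str) -> tuple[str, str]:
    """Return (variable name, JSON value) for an Airflow connection env var."""
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps(
        {
            "conn_type": "yandexcloud",
            # Non-prefixed extra key, as recommended in the Warnings section.
            "extra": {"folder_id": folder_id},
        }
    )
    return name, value


if __name__ == "__main__":
    # Placeholder folder ID, for illustration only.
    name, value = build_conn_env("yandexcloud_default", "b1g-example-folder")
    # Print a line suitable for a shell profile or container environment.
    print(f"export {name}='{value}'")
```

The variable has to be set in the environment of the Airflow scheduler and workers, not only in the shell where the DAG file is edited.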