Apache Airflow Provider for YDB


An Apache Airflow provider that enables integration with Yandex Database (YDB). The current version is 2.5.2; it supports Airflow 2.8+ and Python 3.10+, and is released on a monthly cadence alongside Airflow.

pip install apache-airflow-providers-ydb
error airflow.exceptions.AirflowException: Failed to create YDB driver. Check your connection parameters.
cause Connection 'ydb_default' is missing required extra fields 'endpoint' or 'database'.
fix
In Airflow UI, edit YDB connection and set Extra to: {"endpoint": "grpcs://ydb.serverless.yandexcloud.net:2135", "database": "/ru-central1/b1g..."}
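As a sketch, the same check can be done programmatically before a DAG runs; the helper below is illustrative (not part of the provider) and only verifies that the Extra JSON carries both required keys. The endpoint and database values are placeholders from the example above.

```python
import json

REQUIRED_EXTRA_KEYS = {"endpoint", "database"}

def validate_ydb_extra(extra_json: str) -> dict:
    """Parse a connection's Extra field and fail early if keys are missing."""
    extra = json.loads(extra_json)
    missing = REQUIRED_EXTRA_KEYS - extra.keys()
    if missing:
        raise ValueError(f"YDB connection extra is missing: {sorted(missing)}")
    return extra

# Placeholder values; substitute your own endpoint and database path.
extra = validate_ydb_extra(
    '{"endpoint": "grpcs://ydb.serverless.yandexcloud.net:2135",'
    ' "database": "/ru-central1/b1g..."}'
)
```

Failing fast here surfaces the misconfiguration at parse time rather than as a driver error at task run time.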
error ModuleNotFoundError: No module named 'ydb'
cause YDB SDK is not installed. The provider does not include it automatically.
fix
Run: pip install ydb
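Because the provider does not bundle the SDK, a missing `ydb` package may only surface when a task actually executes. A minimal, hypothetical guard that fails fast with a clearer message:

```python
import importlib.util

def require_ydb_sdk() -> None:
    """Raise a descriptive error if the ydb package is not importable."""
    if importlib.util.find_spec("ydb") is None:
        # The provider does not install the SDK automatically.
        raise ModuleNotFoundError("YDB SDK not installed; run: pip install ydb")
```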
error TypeError: YdbOperator.__init__() got an unexpected keyword argument 'sql'
cause The installed provider package is older than v2.0.0, where the operator took a 'query' parameter instead of 'sql'.
fix
Upgrade to >=2.0.0: pip install -U apache-airflow-providers-ydb. Use 'sql' parameter.
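If a codebase must run against both pre- and post-2.0.0 providers during a migration window, the keyword can be chosen at runtime. This is an illustrative sketch, not a provider API; the version string is compared numerically, not lexically.

```python
def query_kwarg_name(provider_version: str) -> str:
    """Return the keyword the operator expects for the SQL text.

    Provider versions before 2.0.0 used 'query'; 2.0.0+ uses 'sql'.
    """
    parts = tuple(int(p) for p in provider_version.split(".")[:3])
    return "sql" if parts >= (2, 0, 0) else "query"

def build_operator_kwargs(provider_version: str, statement: str) -> dict:
    # Hypothetical helper: map the statement to whichever kwarg applies.
    return {query_kwarg_name(provider_version): statement}
```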
error ydb.issues.Unavailable: cannot connect to database, session is not ready
cause Connection timeout or network issue; often due to missing TLS certificate or wrong endpoint.
fix
Ensure endpoint is reachable and correct. If using self-signed certs, set extra: {"ssl_verify": false}.
breaking In version 2.0.0, connection parameters changed from token-based to connection string. Existing connections must be updated.
fix Update Airflow connection to use 'extra' field with 'endpoint' and 'database' keys instead of 'token'.
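The token-to-connection-string migration can be scripted when many connections need updating. The function below is a sketch under the assumptions above: it drops the old 'token' key and requires the caller to supply 'endpoint' and 'database', since the old format did not carry them.

```python
import json

def migrate_extra(old_extra_json: str, endpoint: str, database: str) -> str:
    """Rewrite a pre-2.0.0 token-style extra into the v2.0.0+ shape."""
    extra = json.loads(old_extra_json)
    extra.pop("token", None)  # token-based auth is no longer read here
    extra.update({"endpoint": endpoint, "database": database})
    return json.dumps(extra)
```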
deprecated YdbToDWHOperator is deprecated as of v2.5.0, use YdbToClickhouseOperator or custom transfer.
fix Migrate to YdbToClickhouseOperator for ClickHouse integration.
gotcha SSL verification is enabled by default; internal clusters may need SSL disabled in connection extra.
fix Set "ssl_verify": false (a JSON boolean, not a string) in the connection's extra JSON if using self-signed certificates.
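A sketch of assembling the Extra JSON with verification disabled; the helper is illustrative, and the point is that ssl_verify stays a JSON boolean rather than the string "false".

```python
import json

def build_ydb_extra(endpoint: str, database: str, ssl_verify: bool = True) -> str:
    """Serialize a YDB connection Extra; ssl_verify is emitted as a boolean."""
    extra = {"endpoint": endpoint, "database": database}
    if not ssl_verify:
        # Internal clusters with self-signed certificates.
        extra["ssl_verify"] = False
    return json.dumps(extra)
```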

Create a DAG that executes a YDB query using the YdbOperator. Requires an Airflow connection 'ydb_default' configured with YDB endpoint and database.

from datetime import datetime
from airflow import DAG
from airflow.providers.ydb.operators.ydb import YdbOperator

default_args = {
    'start_date': datetime(2024, 1, 1),
}

with DAG('ydb_dag', default_args=default_args, schedule=None) as dag:
    task = YdbOperator(
        task_id='execute_query',
        sql='SELECT 1;',
        ydb_conn_id='ydb_default',
    )