Apache Airflow JDBC Provider

raw JSON →
5.4.2 verified Thu May 14 auth: no python

The Apache Airflow JDBC Provider extends Airflow's functionality by enabling interaction with JDBC-compatible databases through specialized hooks and operators. It is part of the larger Apache Airflow ecosystem, with frequent releases that align with new Airflow versions and community contributions. The current version is 5.4.2.

pip install apache-airflow-providers-jdbc
cli airflow
error ModuleNotFoundError: No module named 'airflow.providers.jdbc'
cause The 'apache-airflow-providers-jdbc' package is not installed or not properly configured.
fix
Install the package using 'pip install apache-airflow-providers-jdbc'.
error RuntimeException: Class com.mysql.cj.jdbc.Driver not found
cause The JDBC driver class is not found because the JAR file is missing or the classpath is not set correctly.
fix
Ensure the JDBC driver JAR is placed in the correct directory and the classpath is set properly.
error The conn_id <random string> isn't defined
cause The connection ID specified in the Airflow UI does not exist or is misconfigured.
fix
Verify that the connection ID is correctly defined in Airflow's connections and that all required fields are properly set.
error java.lang.RuntimeException: driver class not found
cause The specified JDBC driver class or its corresponding JAR file is not found in the Java Classpath accessible by the Airflow environment. This often means the driver JAR is missing or the `driver_path` and `driver_class` are incorrectly configured.
fix
Ensure the JDBC driver JAR file is present in a location accessible to Airflow (e.g., in a DAGs folder or a mounted volume), and correctly specify the driver_class and driver_path in your Airflow JDBC connection's 'Extra' field or directly in your JdbcHook instantiation. Also, confirm a Java Virtual Machine (JVM) is installed and JAVA_HOME is set.
error Connection Refused / Password authentication failed for user
cause The Airflow JDBC connection parameters (host, port, username, password, or JDBC URL) are incorrect, preventing a successful connection to the database, or the database server is not running/accessible.
fix
Verify all connection details in the Airflow UI (or your connection string) including host, port, username, and password. Ensure the database server is running and network accessible from where Airflow is running.
breaking The `JdbcOperator` has been deprecated and subsequently removed from the `apache-airflow-providers-jdbc` package. Users should migrate to `airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator` for executing SQL commands via JDBC.
fix Replace `JdbcOperator` imports and usage with `SQLExecuteQueryOperator`. The parameters `sql`, `conn_id`, and `autocommit` are largely compatible.
breaking Minimum Apache Airflow version requirements have increased across provider versions. Ensure your Airflow installation meets the minimum version for the installed provider version to avoid issues related to API changes and decorator removals (e.g., `apply_default`).
fix Upgrade your Apache Airflow installation to at least the minimum required version for the `apache-airflow-providers-jdbc` package you are using. If upgrading from Airflow < 2.1.0, manually run `airflow upgrade db` after the Airflow core upgrade.
gotcha Connecting to JDBC databases requires a properly configured Java environment, including a Java Virtual Machine (JVM), the `JAVA_HOME` environment variable set, and the specific JDBC driver `.jar` file for your database.
fix Install a JVM, set the `JAVA_HOME` environment variable to its installation path, and download the appropriate JDBC driver `.jar` file. Ensure `jaydebeapi` is installed in your Airflow environment.
gotcha For security reasons, `allow_driver_class_in_extra` and `allow_driver_path_in_extra` configuration options in `airflow.cfg` (under `[providers.jdbc]`) are `False` by default. If you need to specify `driver_class` or `driver_path` in the Airflow Connection's 'Extra' field, these options must be explicitly set to `True`.
fix If safe to do so in your environment, set `allow_driver_class_in_extra = True` and/or `allow_driver_path_in_extra = True` in your `airflow.cfg` or via environment variables (e.g., `AIRFLOW__PROVIDERS_JDBC__ALLOW_DRIVER_CLASS_IN_EXTRA=True`). Exercise caution as enabling these allows users to specify custom drivers and paths via the UI, which could have security implications if not properly managed.
gotcha The `JdbcOperator` (when it was in use) and `SQLExecuteQueryOperator` primarily execute SQL statements and do not automatically return query results to logs or XComs. To fetch results from a `SELECT` query, you typically need to use the `JdbcHook` directly or provide a `handler` callable to the operator.
fix To fetch results, use `JdbcHook.get_pandas_df()` within a `PythonOperator` or define a `handler` function for the `SQLExecuteQueryOperator` to process the `cursor` object.
pip install apache-airflow[jdbc]
python os / libc variant status wheel install import disk mem side effects
3.10 alpine (musl) apache-airflow-providers-jdbc build_error - - - - - -
3.10 alpine (musl) apache-airflow-providers-jdbc - - - - - -
3.10 alpine (musl) jdbc build_error - - - - - -
3.10 alpine (musl) jdbc - - - - - -
3.10 slim (glibc) apache-airflow-providers-jdbc wheel 25.3s 3.34s 244M 66.1M clean
3.10 slim (glibc) apache-airflow-providers-jdbc - - 3.95s 243M 66.1M -
3.10 slim (glibc) jdbc wheel 24.8s 3.32s 244M 66.1M clean
3.10 slim (glibc) jdbc - - 3.43s 243M 66.1M -
3.11 alpine (musl) apache-airflow-providers-jdbc build_error - - - - - -
3.11 alpine (musl) apache-airflow-providers-jdbc - - - - - -
3.11 alpine (musl) jdbc build_error - - - - - -
3.11 alpine (musl) jdbc - - - - - -
3.11 slim (glibc) apache-airflow-providers-jdbc wheel 23.6s 5.00s 264M 71.6M clean
3.11 slim (glibc) apache-airflow-providers-jdbc - - 5.36s 263M 71.6M -
3.11 slim (glibc) jdbc wheel 24.0s 5.04s 264M 71.6M clean
3.11 slim (glibc) jdbc - - 5.86s 263M 71.6M -
3.12 alpine (musl) apache-airflow-providers-jdbc build_error - - - - - -
3.12 alpine (musl) apache-airflow-providers-jdbc - - - - - -
3.12 alpine (musl) jdbc build_error - - - - - -
3.12 alpine (musl) jdbc - - - - - -
3.12 slim (glibc) apache-airflow-providers-jdbc wheel 18.8s 5.28s 255M 70.3M clean
3.12 slim (glibc) apache-airflow-providers-jdbc - - 5.99s 254M 70.3M -
3.12 slim (glibc) jdbc wheel 18.8s 5.12s 255M 70.3M clean
3.12 slim (glibc) jdbc - - 5.75s 254M 70.3M -
3.13 alpine (musl) apache-airflow-providers-jdbc build_error - - - - - -
3.13 alpine (musl) apache-airflow-providers-jdbc - - - - - -
3.13 alpine (musl) jdbc build_error - - - - - -
3.13 alpine (musl) jdbc - - - - - -
3.13 slim (glibc) apache-airflow-providers-jdbc wheel 18.5s 4.83s 257M 71.0M clean
3.13 slim (glibc) apache-airflow-providers-jdbc - - 5.71s 257M 70.9M -
3.13 slim (glibc) jdbc wheel 18.8s 4.82s 257M 71.0M clean
3.13 slim (glibc) jdbc - - 6.65s 257M 70.9M -
3.9 alpine (musl) apache-airflow-providers-jdbc build_error - - - - - -
3.9 alpine (musl) apache-airflow-providers-jdbc - - - - - -
3.9 alpine (musl) jdbc build_error - - - - - -
3.9 alpine (musl) jdbc - - - - - -
3.9 slim (glibc) apache-airflow-providers-jdbc build_error - 24.2s - - - -
3.9 slim (glibc) apache-airflow-providers-jdbc - - - - - -
3.9 slim (glibc) jdbc build_error - 24.1s - - - -
3.9 slim (glibc) jdbc - - - - - -

This quickstart demonstrates how to use the `SQLExecuteQueryOperator` to interact with a JDBC-compatible database. It requires a pre-configured Airflow connection, a JVM, the `JAVA_HOME` environment variable, and the specific JDBC driver JAR file for your database. The `SQLExecuteQueryOperator` can execute single or multiple SQL queries. For fetching results, a `handler` callable can be provided. Make sure to replace placeholder values for the JDBC connection details and SQL queries with your actual data.

from __future__ import annotations

import os

from airflow.models.dag import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id='example_jdbc_sql_query',
    start_date=days_ago(1),
    schedule_interval=None,
    catchup=False,
    tags=['jdbc', 'example', 'sql'],
) as dag:
    # To run this DAG, ensure you have:
    # 1. A JVM installed and JAVA_HOME environment variable set.
    # 2. The specific JDBC driver JAR file for your database available (e.g., /path/to/your/driver.jar).
    # 3. An Airflow JDBC connection configured with:
    #    - Conn Id: 'my_jdbc_connection'
    #    - Conn Type: 'JDBC Connection'
    #    - Host: 'jdbc:<vendor>://<host>:<port>/<database>' (the full JDBC URL)
    #    - Login: '<username>'
    #    - Password: '<password>'
    #    - Extra: {'driver_path': '/path/to/your/driver.jar', 'driver_class': 'com.vendor.DriverClass'}
    #    Refer to official documentation for specific database driver_class and driver_path values.

    # Example 1: Execute a SELECT query
    execute_select_query = SQLExecuteQueryOperator(
        task_id='execute_select_query',
        conn_id='my_jdbc_connection',
        sql="SELECT * FROM example_table WHERE status = 'active';",
        handler=lambda cursor: [row for row in cursor], # Example handler to fetch results
    )

    # Example 2: Execute an INSERT statement
    execute_insert_statement = SQLExecuteQueryOperator(
        task_id='execute_insert_statement',
        conn_id='my_jdbc_connection',
        sql="INSERT INTO log_table (event_time, message) VALUES (NOW(), 'Data processed successfully');",
        autocommit=True,
    )

    execute_select_query >> execute_insert_statement