Apache Airflow ODBC Provider
This provider package enables Apache Airflow to connect to various ODBC data sources, including MS SQL Server, to execute queries and perform database operations. It is released independently from Airflow core and follows semantic versioning, with major version upgrades indicating breaking changes.
Warnings
- breaking The `driver` parameter for `OdbcHook` must now be passed directly via the hook constructor or `hook_params` in SQL Operators, not through the connection's `extra` field, due to a security vulnerability fix.
- breaking This provider version has increased minimum Airflow requirements. For example, provider `3.0.0` requires Airflow `2.2+` and provider `4.12.1` requires Airflow `2.11.0+`. Older provider versions had lower Airflow requirements.
- breaking When passing keyword arguments to the ODBC connection via the `connect_kwargs` key in the connection's `extra` field (e.g., `autocommit`, `ansi`), values must now be booleans (e.g., `true`, `false`) instead of strings (`"true"`, `"false"`).
- gotcha The `apache-airflow-providers-odbc` package and its `pyodbc` dependency require system-level ODBC drivers to be installed on your Airflow worker machines. These are not installed by `pip`.
- gotcha Enabling the `allow_driver_in_extra` configuration (via `AIRFLOW__PROVIDERS_ODBC__ALLOW_DRIVER_IN_EXTRA=true` environment variable or `airflow.cfg`) is a security risk. It allows users to specify arbitrary ODBC drivers via the Airflow Connection's `extra` field, potentially leading to privilege escalation or arbitrary code execution if untrusted users can modify connections.
- gotcha Some users have reported `UnicodeDecodeError` (e.g., `'utf-8' codec can't decode byte 0x91`) when using specific ODBC drivers (like Netezza) with certain Airflow and provider versions.
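The driver and `connect_kwargs` rules above can be sketched as follows; the connection name and driver string are illustrative, not requirements:

```python
import json

# Per the warnings above: the driver goes to the hook or operator directly,
# e.g. OdbcHook(odbc_conn_id="my_odbc_conn", driver="{ODBC Driver 18 for SQL Server}")
# or SQLExecuteQueryOperator(..., hook_params={"driver": "..."}), unless the
# discouraged AIRFLOW__PROVIDERS_ODBC__ALLOW_DRIVER_IN_EXTRA=true is set.
#
# connect_kwargs values in the connection's 'extra' must be JSON booleans,
# not the strings "true"/"false":
extra = json.dumps({"connect_kwargs": {"autocommit": True, "ansi": True}})
print(extra)  # {"connect_kwargs": {"autocommit": true, "ansi": true}}
```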
Install
- pip install apache-airflow-providers-odbc
- pip install "apache-airflow[odbc]"
Imports
- OdbcHook
from airflow.providers.odbc.hooks.odbc import OdbcHook
- OdbcOperator (deprecated; prefer SQLExecuteQueryOperator from the common SQL provider)
from airflow.providers.odbc.operators.odbc import OdbcOperator
- SQLExecuteQueryOperator
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
Quickstart
from __future__ import annotations
import pendulum
from airflow.models.dag import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
# Ensure you have an Airflow Connection named 'my_odbc_conn':
#   type: ODBC
#   host: your_odbc_server
#   login: your_username
#   password: your_password
#   extra: {"connect_kwargs": {"autocommit": true}}
# Per the breaking change above, the driver is NOT set in 'extra';
# it is passed via hook_params (see below) or the OdbcHook constructor.
with DAG(
    dag_id="odbc_example_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["odbc", "example"],
) as dag:
    create_table = SQLExecuteQueryOperator(
        task_id="create_table",
        # CREATE TABLE IF NOT EXISTS is not valid T-SQL (IDENTITY implies
        # MS SQL Server), so guard with OBJECT_ID instead.
        sql="""
            IF OBJECT_ID('my_odbc_table', 'U') IS NULL
                CREATE TABLE my_odbc_table (
                    id INT IDENTITY(1,1) PRIMARY KEY,
                    value VARCHAR(255)
                );
        """,
        conn_id="my_odbc_conn",
        hook_params={"driver": "{ODBC Driver 18 for SQL Server}"},
    )
    insert_data = SQLExecuteQueryOperator(
        task_id="insert_data",
        sql="INSERT INTO my_odbc_table (value) VALUES ('test_value_{{ ds }}');",
        conn_id="my_odbc_conn",
        hook_params={"driver": "{ODBC Driver 18 for SQL Server}"},
    )
    # Note: for fetching data, you would typically use OdbcHook in a
    # PythonOperator or a custom operator; SQLExecuteQueryOperator is
    # geared toward statement execution.
    create_table >> insert_data
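Following the note above, a minimal sketch of fetching rows with OdbcHook (the connection, table, and driver names carry over from the example; assumes Airflow with this provider and a working ODBC driver are installed):

```python
def fetch_rows(conn_id: str = "my_odbc_conn") -> list:
    """Sketch: read rows back via OdbcHook rather than SQLExecuteQueryOperator."""
    # Deferred import so the sketch can be read without an Airflow install.
    from airflow.providers.odbc.hooks.odbc import OdbcHook

    hook = OdbcHook(
        odbc_conn_id=conn_id,
        # Driver passed directly to the hook, per the breaking change above.
        driver="{ODBC Driver 18 for SQL Server}",
    )
    return hook.get_records("SELECT id, value FROM my_odbc_table")
```

Such a function can be wrapped in a `@task`-decorated callable or a PythonOperator to run inside the DAG above.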