Apache Airflow SQLite Provider
The `apache-airflow-providers-sqlite` package provides the components needed to interact with SQLite databases from Apache Airflow DAGs: `SqliteHook` for programmatic access and `SqliteOperator` for defining SQL tasks. At the time of writing the provider is at version 4.3.1 and is released on the regular Apache Airflow provider cadence.
Warnings
- gotcha SQLite database persistence and location: By default, if the `sqlite_default` connection's 'Host' or 'Extra' field is not configured with a database path, the tasks will use an in-memory SQLite database (`:memory:`). This means all data will be lost after the task completes. For persistent storage, you *must* specify a file path (e.g., `/opt/airflow/dags/data/my_db.db`) in the Airflow connection.
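The difference is easy to demonstrate with Python's standard `sqlite3` module, which is what `SqliteHook` drives under the hood: data in a `:memory:` database dies with its connection, while a file-backed database persists across connections (a stand-in here for separate task runs). A minimal sketch; the file name is a placeholder:

```python
import os
import sqlite3
import tempfile

# In-memory database: data exists only for the lifetime of this connection.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE t (name TEXT)")
mem.execute("INSERT INTO t VALUES ('Airflow')")
mem.close()  # everything in :memory: is gone now

# File-backed database: data survives after the connection closes.
path = os.path.join(tempfile.mkdtemp(), "my_db.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (name TEXT)")
conn.execute("INSERT INTO t VALUES ('Airflow')")
conn.commit()
conn.close()

# A second connection (think: a later task) still sees the row.
conn2 = sqlite3.connect(path)
rows = conn2.execute("SELECT name FROM t").fetchall()
print(rows)  # [('Airflow',)]
conn2.close()
```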
- gotcha Not suitable for distributed Airflow: SQLite is a file-based database. In distributed Airflow setups (e.g., using CeleryExecutor or KubernetesExecutor with multiple workers), the SQLite database file must reside on a shared, mounted filesystem accessible by all workers. Otherwise, each worker will operate on its own independent copy of the database, leading to data inconsistency. It's generally recommended only for single-node Airflow deployments or for ephemeral, task-local data.
- breaking Airflow 1.x incompatibility: This provider package (like all `apache-airflow-providers-*` packages) is designed exclusively for Apache Airflow 2.0 and later. Attempting to install or use it in an Airflow 1.x environment will result in import errors and general incompatibility.
- gotcha No SQLite Sensor: Unlike some other database providers (e.g., Postgres, MySQL), the `apache-airflow-providers-sqlite` package currently only offers `SqliteHook` and `SqliteOperator`. There is no dedicated `SqliteSensor` available for polling database states directly.
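Until a dedicated sensor exists, a common workaround is Airflow's generic `PythonSensor` with a poke callable; the callable itself needs only the standard `sqlite3` module. A minimal sketch, where the database path and table name are placeholders:

```python
import sqlite3


def rows_exist(db_path: str, table: str) -> bool:
    """Poke callable: True once the given table has at least one row."""
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        return count > 0
    except sqlite3.OperationalError:
        # Table (or schema) does not exist yet -- keep poking.
        return False
    finally:
        conn.close()


# Inside a DAG this would be wired up roughly as (not executed here):
# from airflow.sensors.python import PythonSensor
# wait_for_rows = PythonSensor(
#     task_id="wait_for_rows",
#     python_callable=rows_exist,
#     op_args=["/opt/airflow/dags/data/my_db.db", "test_table"],
#     poke_interval=30,
# )
```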
Install
pip install apache-airflow-providers-sqlite
Imports
- SqliteHook
from airflow.providers.sqlite.hooks.sqlite import SqliteHook
- SqliteOperator
from airflow.providers.sqlite.operators.sqlite import SqliteOperator
Quickstart
from __future__ import annotations
import pendulum
from airflow.models.dag import DAG
from airflow.providers.sqlite.operators.sqlite import SqliteOperator
with DAG(
    dag_id="sqlite_example_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["sqlite", "example"],
) as dag:
    create_table = SqliteOperator(
        task_id="create_table",
        sqlite_conn_id="sqlite_default",
        sql="""
            CREATE TABLE IF NOT EXISTS test_table (
                id INTEGER PRIMARY KEY,
                name TEXT
            );
        """,
    )

    insert_data = SqliteOperator(
        task_id="insert_data",
        sqlite_conn_id="sqlite_default",
        sql="INSERT INTO test_table (name) VALUES ('Airflow'), ('SQLite');",
    )

    query_data = SqliteOperator(
        task_id="query_data",
        sqlite_conn_id="sqlite_default",
        sql="SELECT * FROM test_table;",
        # The handler receives the DB-API cursor; print each row to the task log.
        handler=lambda cursor: [print(row) for row in cursor],
    )

    create_table >> insert_data >> query_data
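Outside Airflow, the quickstart's three SQL statements can be smoke-tested against a throwaway in-memory database with the standard `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the smoke test
conn.executescript("""
    CREATE TABLE IF NOT EXISTS test_table (
        id INTEGER PRIMARY KEY,
        name TEXT
    );
    INSERT INTO test_table (name) VALUES ('Airflow'), ('SQLite');
""")
rows = conn.execute("SELECT * FROM test_table;").fetchall()
print(rows)  # [(1, 'Airflow'), (2, 'SQLite')]
conn.close()
```

The `INTEGER PRIMARY KEY` column aliases SQLite's rowid, so omitted ids are assigned 1 and 2 automatically.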