Apache Airflow

Apache Airflow is an open-source platform used to programmatically author, schedule, and monitor workflows, particularly for data pipelines. It defines workflows as Directed Acyclic Graphs (DAGs) in Python, enabling dynamic, scalable, and extensible orchestration. The current stable version is 3.1.8, with releases occurring regularly to introduce new features, improvements, and bug fixes.

pip install "apache-airflow[celery,cncf.kubernetes,http,postgres,amazon]"==3.1.8 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.1.8/constraints-3.10.txt"
error ModuleNotFoundError: No module named 'airflow.providers.microsoft.mssql.operators'
cause This error occurs when the 'apache-airflow-providers-microsoft-mssql' package is not installed, leading to missing modules required for Microsoft SQL Server operations.
fix Install the missing provider package: `pip install apache-airflow-providers-microsoft-mssql`.
error pymssql._mssql.MSSQLDatabaseException: (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (servername)\n')
cause This error indicates that the specified SQL Server is unreachable, possibly due to incorrect server details or network issues.
fix Verify the server name, ensure the server is running, and check network connectivity. If using Airflow's Connections UI, include the full server name with domain in the host field.
error sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '1'. (102) (SQLExecDirectW)")
cause This error arises from SQL syntax issues, often due to compatibility problems between Airflow's ORM and the SQL Server version.
fix Ensure that the SQL statements are compatible with your SQL Server version and consider updating the ODBC driver to the latest version.
error pymssql.exceptions.OperationalError: (20002, b'DB-Lib error message 20002, severity 9:\nAdaptive Server connection failed')
cause This error suggests a failure in establishing a connection to the SQL Server, possibly due to incorrect connection parameters or server unavailability.
fix Double-check the connection parameters, including server address, port, username, and password. Ensure the SQL Server is accessible and running.
error ModuleNotFoundError: No module named 'airflow.operators.mssql_operator'
cause This error occurs when attempting to import 'MsSqlOperator' from an incorrect module path due to changes in Airflow's module structure.
fix Update the import statement to `from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator` (see the sketch below).
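A minimal sketch of the corrected usage, assuming the `apache-airflow-providers-microsoft-mssql` package is installed; the connection id `mssql_default` and the SQL statement are illustrative placeholders. Note that newer provider releases may deprecate `MsSqlOperator` in favor of `SQLExecuteQueryOperator`.

```python
# Sketch: import MsSqlOperator from the provider package, not the removed
# airflow.operators.mssql_operator module. Connection id and SQL are examples.
from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator

cleanup_staging = MsSqlOperator(
    task_id="cleanup_staging",
    mssql_conn_id="mssql_default",  # connection defined in the Airflow UI or a secrets backend
    sql="DELETE FROM staging_events WHERE load_date < DATEADD(day, -7, GETDATE());",
)
```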
breaking Direct metadata database access from task code is restricted in Airflow 3. Tasks can no longer directly import and use Airflow database sessions or models. All runtime interactions (state transitions, heartbeats, XComs, resource fetching) must now use the dedicated Task Execution API or the official Python API Client.
fix Rewrite task code to use the Task Execution API or the Airflow Python Client for database interactions. Avoid direct SQLAlchemy imports or session usage within task logic. Consider requesting new API endpoints or Task SDK features if required functionality is missing.
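A minimal sketch of the pattern, assuming the TaskFlow API: data is exchanged through XComs via the task context instead of querying the metadata database directly. Task names and the payload are illustrative.

```python
# Sketch: exchange data through XComs (handled by the Task Execution API at
# runtime) instead of opening a SQLAlchemy session against the metadata DB.
from airflow.sdk import task


@task
def produce():
    # The return value is pushed to XCom automatically.
    return {"rows_loaded": 42}


@task
def consume(ti=None):
    # Pull the upstream result via the task instance, not a DB session.
    payload = ti.xcom_pull(task_ids="produce")
    print(payload["rows_loaded"])
```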
breaking SubDAGs have been removed in Airflow 3. They are replaced by TaskGroups, Assets, and Data Aware Scheduling.
fix Migrate existing SubDAGs to use TaskGroups for grouping related tasks, or explore using Assets and Data Aware Scheduling for more advanced scenarios.
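A minimal sketch of migrating a grouping SubDAG to a TaskGroup, assuming Airflow 3 Task SDK import paths; the DAG and task ids are illustrative.

```python
# Sketch: group related tasks with a TaskGroup instead of a SubDAG.
from datetime import datetime

from airflow.providers.standard.operators.bash import BashOperator
from airflow.sdk import DAG, TaskGroup

with DAG(dag_id="taskgroup_instead_of_subdag", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    with TaskGroup(group_id="extract") as extract:
        # Tasks inside the group get prefixed ids, e.g. "extract.pull_api".
        BashOperator(task_id="pull_api", bash_command="echo pull api")
        BashOperator(task_id="pull_db", bash_command="echo pull db")

    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> load
```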
breaking The Sequential Executor has been removed in Airflow 3. It is replaced by the LocalExecutor, which can still be used with SQLite for local development.
fix Update Airflow configuration to use `LocalExecutor` instead of `SequentialExecutor`.
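A minimal sketch of the configuration change for a local setup; either edit `airflow.cfg` or set the equivalent environment variable.

```bash
# Sketch: switch from the removed SequentialExecutor to LocalExecutor.
# In airflow.cfg:
#   [core]
#   executor = LocalExecutor
# Or via environment variable (takes precedence over airflow.cfg):
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
```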
deprecated SLAs (Service Level Agreements) are deprecated and have been removed in Airflow 3. They will be replaced by forthcoming Deadline Alerts.
fix Remove SLA definitions from DAGs. Monitor for the introduction of 'Deadline Alerts' as a replacement.
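For reference, a minimal sketch of what to delete when migrating, assuming an Airflow 2-style task that used the `sla` parameter; the task itself is illustrative.

```python
# Sketch: the sla argument (and any sla_miss_callback on the DAG) must be
# removed for Airflow 3; there is no drop-in replacement yet.
from datetime import timedelta

from airflow.providers.standard.operators.bash import BashOperator

nightly_report = BashOperator(
    task_id="nightly_report",
    bash_command="echo report",
    # sla=timedelta(hours=2),  # Airflow 2.x only: delete this line
)
```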
gotcha Avoid using relative imports in DAG files (e.g., `from . import my_module`). The same DAG file might be parsed in different contexts (scheduler, workers, tests), leading to inconsistent behavior.
fix Always use full Python package paths for imports within DAGs. Ensure shared code is either installed as a Python package or added to `PYTHONPATH` with a unique top-level name to prevent clashes.
gotcha Do not use Airflow Variables or Connections at the top level of DAG files (i.e., outside of task `execute()` methods or Jinja templates). This can cause slow DAG parsing and unexpected behavior, as the values are fetched every time the DAG file is parsed.
fix Access Airflow Variables and Connections inside operator `execute()` methods, or pass them to operators using Jinja templating, which defers evaluation until task execution. For sensitive data, use Secrets Backend.
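A minimal sketch of the safe pattern, assuming a Variable named `target_env` exists; the lookup is deferred with Jinja so nothing is fetched at parse time.

```python
# Sketch: defer Variable access to task runtime via Jinja templating.
from datetime import datetime

from airflow.providers.standard.operators.bash import BashOperator
from airflow.sdk import DAG

with DAG(dag_id="deferred_variable_lookup", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    # Avoid at module level: Variable.get("target_env") would run on every parse.
    echo_env = BashOperator(
        task_id="echo_env",
        # Rendered only when the task executes, not when the DAG file is parsed.
        bash_command="echo {{ var.value.target_env }}",
    )
```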
breaking Apache Airflow 3.x requires Python 3.10 or newer. Attempting to install Airflow 3.x on Python 3.9 or older will result in a Python version incompatibility error during package resolution.
fix Upgrade your Python environment to version 3.10 or a newer compatible version (e.g., Python 3.10, 3.11, 3.12).
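A quick shell sketch to confirm the active interpreter version before attempting the install.

```bash
# Sketch: verify the active Python is 3.10 or newer before installing Airflow 3.x.
python --version
python -c "import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)" && echo "OK" || echo "Too old"
```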
breaking Installing Apache Airflow 3.x on Alpine-based Python images (e.g., `python:3.13-alpine`) fails due to missing C/C++ build tools required by dependencies like `grpcio`. Minimal Alpine images do not include these development packages by default.
fix Add C/C++ build tools to the Dockerfile before installing Python packages (e.g., `apk add --no-cache build-base g++` for Alpine), or use a non-Alpine Python base image on a Python version with pre-built wheels (e.g., `python:3.12-slim`; the benchmark below shows Python 3.13 builds failing even on slim). See the Dockerfile sketch below.
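A minimal Dockerfile sketch for the Alpine route, assuming Python 3.12 (covered by the benchmark below); the package list is a starting point and heavy extras may need additional `-dev` packages.

```dockerfile
# Sketch: install the C/C++ toolchain before pip so source-only dependencies
# (e.g. grpcio) can compile on musl. Versions and packages are illustrative.
FROM python:3.12-alpine
RUN apk add --no-cache build-base g++ linux-headers libffi-dev
RUN pip install "apache-airflow==3.1.8" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.1.8/constraints-3.12.txt"
```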
pip install apache-airflow==3.1.8 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.1.8/constraints-3.10.txt"
| python | os / libc | variant | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | build_error | - | 0.1s | - | - |
| 3.10 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | - | - |
| 3.10 | alpine (musl) | apache-airflow==3.1.8 | | wheel | - | 4.87s | 229.7M |
| 3.10 | alpine (musl) | apache-airflow==3.1.8 | | - | - | 4.67s | 229.7M |
| 3.10 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | wheel | 52.7s | 4.13s | 737M |
| 3.10 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | 3.56s | 737M |
| 3.10 | slim (glibc) | apache-airflow==3.1.8 | | wheel | 24.5s | 3.65s | 230M |
| 3.10 | slim (glibc) | apache-airflow==3.1.8 | | - | - | 3.49s | 230M |
| 3.11 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | build_error | - | 0.1s | - | - |
| 3.11 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | - | - |
| 3.11 | alpine (musl) | apache-airflow==3.1.8 | | wheel | - | 6.42s | 248.4M |
| 3.11 | alpine (musl) | apache-airflow==3.1.8 | | - | - | 6.75s | 248.4M |
| 3.11 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | wheel | 56.3s | 6.24s | 877M |
| 3.11 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | 5.46s | 877M |
| 3.11 | slim (glibc) | apache-airflow==3.1.8 | | wheel | 23.0s | 5.89s | 249M |
| 3.11 | slim (glibc) | apache-airflow==3.1.8 | | - | - | 5.45s | 249M |
| 3.12 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | build_error | - | - | - | - |
| 3.12 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | - | - |
| 3.12 | alpine (musl) | apache-airflow==3.1.8 | | wheel | - | 5.81s | 238.7M |
| 3.12 | alpine (musl) | apache-airflow==3.1.8 | | - | - | 5.79s | 238.7M |
| 3.12 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | wheel | 44.7s | 6.28s | 868M |
| 3.12 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | 6.22s | 868M |
| 3.12 | slim (glibc) | apache-airflow==3.1.8 | | wheel | 18.3s | 6.01s | 240M |
| 3.12 | slim (glibc) | apache-airflow==3.1.8 | | - | - | 6.05s | 240M |
| 3.13 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | build_error | - | - | - | - |
| 3.13 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | - | - |
| 3.13 | alpine (musl) | apache-airflow==3.1.8 | build_error | - | - | - | - |
| 3.13 | alpine (musl) | apache-airflow==3.1.8 | | - | - | - | - |
| 3.13 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | build_error | - | 13.8s | - | - |
| 3.13 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | - | - |
| 3.13 | slim (glibc) | apache-airflow==3.1.8 | build_error | - | 10.7s | - | - |
| 3.13 | slim (glibc) | apache-airflow==3.1.8 | | - | - | - | - |
| 3.9 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | build_error | - | - | - | - |
| 3.9 | alpine (musl) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | - | - |
| 3.9 | alpine (musl) | apache-airflow==3.1.8 | build_error | - | - | - | - |
| 3.9 | alpine (musl) | apache-airflow==3.1.8 | | - | - | - | - |
| 3.9 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | build_error | - | 2.1s | - | - |
| 3.9 | slim (glibc) | celery,cncf.kubernetes,http,postgres,amazon | | - | - | - | - |
| 3.9 | slim (glibc) | apache-airflow==3.1.8 | build_error | - | 2.3s | - | - |
| 3.9 | slim (glibc) | apache-airflow==3.1.8 | | - | - | - | - |

This quickstart defines a simple DAG with Bash and Python operators. To run it locally after installing Airflow, save the code as a `.py` file (e.g., `dags/quickstart_dag.py`) in your `AIRFLOW_HOME/dags` directory, then initialize the metadata database and start the Airflow standalone environment. SQLite is supported out of the box as the metadata database for local development, so a plain `pip install apache-airflow` is sufficient. Remember to set `AIRFLOW_HOME` before running `airflow standalone` or `airflow db migrate`.

```bash
# (Optional) Set AIRFLOW_HOME, e.g., to a directory inside the current project
export AIRFLOW_HOME=$(pwd)/airflow_home

# Start everything in one process (API server, scheduler, DAG processor, triggerer).
# On first run this also initializes the database and generates admin credentials,
# which are printed in the console output.
airflow standalone

# You can also start components separately:
# airflow db migrate
# airflow api-server --port 8080
# airflow scheduler
# airflow dag-processor
# airflow triggerer

# After starting, visit http://localhost:8080 to unpause and trigger the DAG.
```

from datetime import datetime

from airflow.providers.standard.operators.bash import BashOperator
from airflow.providers.standard.operators.python import PythonOperator
from airflow.sdk import DAG


# AIRFLOW_HOME is read when Airflow starts, so export it in your shell
# (see the setup commands above) rather than setting it from a DAG file.


def _greet(name):
    print(f"Hello, {name} from a Python task!")


with DAG(
    dag_id='simple_airflow_quickstart',
    start_date=datetime(2023, 1, 1),
    schedule='@daily',
    catchup=False,
    tags=['quickstart'],
) as dag:
    start_task = BashOperator(
        task_id='start_workflow',
        bash_command='echo "Starting the workflow!"',
    )

    greet_task = PythonOperator(
        task_id='greet_with_python',
        python_callable=_greet,
        op_kwargs={'name': 'Airflow User'},
    )

    end_task = BashOperator(
        task_id='end_workflow',
        bash_command='echo "Workflow finished!"',
    )

    start_task >> greet_task >> end_task