Apache Airflow

3.1.8 · active · verified Sun Mar 29

Apache Airflow is an open-source platform used to programmatically author, schedule, and monitor workflows, particularly for data pipelines. It defines workflows as Directed Acyclic Graphs (DAGs) in Python, enabling dynamic, scalable, and extensible orchestration. The current stable version is 3.1.8, with releases occurring regularly to introduce new features, improvements, and bug fixes.

Warnings

Install
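
The officially documented install pattern pins transitive dependencies with a version-specific constraints file. A sketch, assuming Python 3.9 (adjust the Python version in the constraints URL to match your interpreter):

```shell
# Pin the Airflow and Python versions (3.9 here is an assumption)
AIRFLOW_VERSION=3.1.8
PYTHON_VERSION=3.9

# Install with the matching constraints file so transitive
# dependencies resolve to tested versions
pip install "apache-airflow[sqlite]==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
```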

Imports
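
The imports used by the quickstart below. Under Airflow 3, `BashOperator` and `PythonOperator` live in the standard provider package; `airflow.models.dag` is one valid import path for `DAG`:

```python
from datetime import datetime

from airflow.models.dag import DAG
from airflow.providers.standard.operators.bash import BashOperator
from airflow.providers.standard.operators.python import PythonOperator
```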

Quickstart

This quickstart defines a simple DAG with Bash and Python operators. To run it locally after installing Airflow (e.g., via `pip install "apache-airflow[sqlite]"`), save the code as a `.py` file (e.g., `dags/quickstart_dag.py`) in your `AIRFLOW_HOME/dags` directory, then start the Airflow standalone environment:

```bash
# (Optional) Set AIRFLOW_HOME, e.g., to a local directory
export AIRFLOW_HOME=$(pwd)/airflow_home

# Initialize the database and start all components (API server, scheduler,
# triggerer, DAG processor) in one process. A password for the generated
# admin user is printed on first run.
airflow standalone

# You can also run the steps separately:
# airflow db migrate
# airflow api-server --port 8080
# airflow scheduler
# airflow triggerer
# airflow dag-processor

# After starting, visit http://localhost:8080 to enable the DAG.
```

Remember to export `AIRFLOW_HOME` before running `airflow standalone` (or `airflow db migrate`); otherwise Airflow defaults to `~/airflow`.

import os
from datetime import datetime

from airflow.models.dag import DAG
# In Airflow 3.x, core operators live in the standard provider package
from airflow.providers.standard.operators.bash import BashOperator
from airflow.providers.standard.operators.python import PythonOperator


# Set AIRFLOW_HOME if not already set (e.g., in a local dev setup)
# os.environ['AIRFLOW_HOME'] = os.environ.get('AIRFLOW_HOME', '~/airflow')


def _greet(name):
    print(f"Hello, {name} from a Python task!")


with DAG(
    dag_id='simple_airflow_quickstart',
    start_date=datetime(2023, 1, 1),
    schedule='@daily',  # 'schedule_interval' was removed in Airflow 3.0
    catchup=False,
    tags=['quickstart'],
) as dag:
    start_task = BashOperator(
        task_id='start_workflow',
        bash_command='echo "Starting the workflow!"',
    )

    greet_task = PythonOperator(
        task_id='greet_with_python',
        python_callable=_greet,
        op_kwargs={'name': 'Airflow User'},
    )

    end_task = BashOperator(
        task_id='end_workflow',
        bash_command='echo "Workflow finished!"',
    )

    start_task >> greet_task >> end_task
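
The `start_task >> greet_task >> end_task` chaining works because Airflow tasks overload Python's `>>` (bitshift) operator. A minimal sketch of the idea, not Airflow's actual implementation:

```python
# Minimal sketch of `>>` dependency chaining via __rshift__
# (illustrative only; Airflow's real operators are far richer).
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` records b as downstream of a, then returns b,
        # so chains like a >> b >> c compose left to right.
        self.downstream.append(other)
        return other


start, greet, end = Task("start"), Task("greet"), Task("end")
start >> greet >> end

print([t.task_id for t in start.downstream])  # → ['greet']
print([t.task_id for t in greet.downstream])  # → ['end']
```

Returning `other` from `__rshift__` is what makes the chain compose: each `>>` hands the right-hand task to the next link.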
