Apache Airflow Amazon Provider

9.24.0 · active · verified Mon Apr 06

The `apache-airflow-providers-amazon` package extends Apache Airflow with operators, hooks, and sensors for integrating with Amazon Web Services (AWS). It lets DAGs orchestrate workflows across services such as Amazon S3, Redshift, EMR, SQS, and more. The provider is actively maintained by the Apache Airflow community, with releases typically every few weeks.

Install
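The provider is distributed on PyPI and installs alongside an existing Airflow installation. Pinning to the version shown on this page keeps upgrades deliberate; drop the pin to get the latest release.

```shell
# Install the Amazon provider into the same environment as Airflow.
# The version pin matches the release documented on this page.
pip install "apache-airflow-providers-amazon==9.24.0"
```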

Imports

Quickstart

This example demonstrates a basic Airflow DAG that uses the `S3CopyObjectOperator` from the Amazon provider to copy an object from one S3 bucket to another. Ensure an Airflow connection named `aws_default` is configured with AWS credentials and the necessary S3 permissions, or export the standard AWS environment variables.

from __future__ import annotations

import os

import pendulum

from airflow.models.dag import DAG
from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator

# Ensure AWS credentials are configured in Airflow Connection 'aws_default'
# or via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
# For local testing, you can mock this or use dummy values if S3 interaction isn't critical.

with DAG(
    dag_id="example_s3_copy_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["s3", "aws", "example"],
) as dag:
    copy_s3_object = S3CopyObjectOperator(
        task_id="copy_s3_object_task",
        source_bucket_key=f"s3://{os.environ.get('SOURCE_BUCKET', 'my-source-bucket')}/source_file.txt",
        dest_bucket_key=f"s3://{os.environ.get('DEST_BUCKET', 'my-dest-bucket')}/dest_file.txt",
        aws_conn_id="aws_default", # Assumes 'aws_default' connection is configured in Airflow
    )
