Apache Airflow Amazon Provider
The `apache-airflow-providers-amazon` package extends Apache Airflow with operators, hooks, and sensors for integrating workflows with Amazon Web Services (AWS), including Amazon S3, Redshift, EMR, SQS, and more. The provider is actively maintained by the Apache Airflow community, with releases typically every few weeks.
Warnings
- breaking Provider versions 9.x and above (including 9.24.0) require Apache Airflow version 2.11.0 or greater. Attempting to install on older Airflow versions will lead to compatibility issues or automatic Airflow upgrades.
- breaking For provider versions 3.0.0 and above, the `CloudFormationCreateStackOperator` and `CloudFormationDeleteStackOperator` had their `params` argument renamed to `cloudformation_parameters`. This was to avoid a clash with Airflow 2.2+'s DAG `params` field.
- breaking In provider version 9.24.0, the deprecated `delegate_to` parameter has been removed from `GCSToS3Operator`, `GlacierToGCSOperator`, and `GoogleApiToS3Operator`.
- deprecated Storing `aws_access_key_id` and `aws_secret_access_key` directly in the AWS connection's `extra` field is deprecated in favor of the connection's login and password fields; similarly, custom endpoints should be configured via `endpoint_url` in `extra`.
- gotcha When running on Amazon MWAA with Airflow v2.7.2+, your `requirements.txt` file should include a `--constraint` statement. If omitted, MWAA will apply a default constraint file, which might not align with desired provider versions.
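On MWAA, such a `requirements.txt` might look like the following sketch (the Airflow and Python versions in the constraint URL are illustrative; substitute the ones matching your environment):

```
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"
apache-airflow-providers-amazon
```

Pinning against the constraint file for your exact Airflow version keeps pip from resolving provider versions that conflict with the MWAA runtime.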
Install
- pip install apache-airflow-providers-amazon
- pip install 'apache-airflow-providers-amazon[s3fs,cncf.kubernetes]'  # example with optional extras
Imports
- S3Hook
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
- S3CopyObjectOperator
from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator
- S3KeySensor
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
- EcsRunTaskOperator (the older `EcsOperator` was deprecated and later removed in recent provider versions)
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator
Quickstart
from __future__ import annotations
import os
import pendulum
from airflow.models.dag import DAG
from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator
# Ensure AWS credentials are configured in Airflow Connection 'aws_default'
# or via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
# For local testing, you can mock this or use dummy values if S3 interaction isn't critical.
with DAG(
    dag_id="example_s3_copy_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["s3", "aws", "example"],
) as dag:
    copy_s3_object = S3CopyObjectOperator(
        task_id="copy_s3_object_task",
        source_bucket_key=f"s3://{os.environ.get('SOURCE_BUCKET', 'my-source-bucket')}/source_file.txt",
        dest_bucket_key=f"s3://{os.environ.get('DEST_BUCKET', 'my-dest-bucket')}/dest_file.txt",
        aws_conn_id="aws_default",  # assumes the 'aws_default' connection is configured in Airflow
    )
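The quickstart passes full `s3://` URLs as `source_bucket_key` and `dest_bucket_key`; the provider's S3 hook splits such a URL into a bucket name and an object key (`S3Hook.parse_s3_url`). A minimal stand-alone sketch of that split, with the helper name chosen here purely for illustration:

```python
from urllib.parse import urlsplit


def parse_s3_url(s3url: str) -> tuple[str, str]:
    # Illustrative stand-in for the hook's URL parsing:
    # the netloc is the bucket, the path (minus the leading
    # slash) is the object key.
    parsed = urlsplit(s3url)
    return parsed.netloc, parsed.path.lstrip("/")


print(parse_s3_url("s3://my-source-bucket/source_file.txt"))
# -> ('my-source-bucket', 'source_file.txt')
```

Because the bucket is derived from the URL, `source_bucket_name` and `dest_bucket_name` must be left unset when you use this full-URL form.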