MWAA Disaster Recovery Solution (mwaa-dr)
mwaa-dr is a Python library that provides a reusable framework for implementing disaster recovery for Amazon Managed Workflows for Apache Airflow (MWAA). It simplifies creating Airflow DAGs that export and import MWAA metadata, enabling backup and restore of critical Airflow components such as variables, connections, and DAG run history. The library ships a separate factory module for each supported Airflow version. The latest PyPI release is 2.2.0, and development is ongoing to support newer Airflow versions.
Common errors
- ModuleNotFoundError: No module named 'mwaa_dr.v_X_Y'
  Cause: The Python environment that parses or runs the DAG does not have the `mwaa-dr` library installed, or the version-specific import path is wrong.
  Fix: Add `mwaa-dr` to your MWAA environment's `requirements.txt` file (e.g., `mwaa-dr==2.2.0`). Make sure `X_Y` in the import statement (`from mwaa_dr.v_X_Y.dr_factory import DRFactory_X_Y`) matches the major and minor version of your environment's Apache Airflow, for example `v_2_8` for Airflow 2.8.1.
- IndexError: tuple index out of range (or similar errors during backup/restore)
  Cause: This can occur when the default list of tables to back up and restore (which includes `xcom` and `task_instance`) is modified or left incomplete, making the exported data inconsistent. A specific bug was reported for `create_backup_dag` when `xcom` and `task_instance` are excluded.
  Fix: If you customize the tables to back up, keep essential tables such as `xcom` and `task_instance` in the list. If the error occurs with the default settings, upgrade to the latest `mwaa-dr` version and report the issue if it persists. The default table list is defined by the `setup_tables()` method of `BaseDRFactory` for your MWAA version.
- Foreign key violation (e.g., ForeignKeyViolation: insert or update on table 'dag_run' violates foreign key constraint 'task_instance_log_template_id_fkey')
  Cause: Metadata is being restored into an MWAA database that is not empty, so the restored rows conflict with existing entries or their dependencies. This was specifically reported for Airflow 2.8.1.
  Fix: Before running a restore, make sure the target MWAA metadata database is clean. Use the `cleanup_metadata` DAG (created via `factory.create_cleanup_dag()`) to empty the necessary tables, and always exercise caution when running cleanup DAGs.
- An error occurred (AccessDenied) when calling the GetObject operation (or similar S3 permission errors)
  Cause: The MWAA execution role of your environment lacks the S3 permissions needed to read from or write to the configured backup bucket.
  Fix: Verify that the execution role has `s3:ListBucket` on the backup bucket named in the `DR_BACKUP_BUCKET` Airflow variable, and `s3:GetObject`, `s3:PutObject`, and `s3:DeleteObject` on the objects in that bucket.
- MWAA environment stuck in 'Creating' or 'Updating' state due to networking issues
  Cause: Although not specific to `mwaa-dr`, a misconfigured MWAA environment (VPC, subnets, security groups, NAT gateway, VPC endpoints) can keep the environment from starting or updating, which in turn blocks deployment and execution of the `mwaa-dr` DAGs.
  Fix: Review the MWAA networking prerequisites. For private routing, make sure the required VPC endpoints (for example S3, CloudWatch Monitoring, and ECR) are configured. For public routing, make sure the private subnets reach the internet through NAT gateways placed in public subnets that have an internet gateway. If an environment is stuck during creation, run the `AWSSupport-TroubleshootMWAAEnvironmentCreation` runbook.
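For the IndexError entry above, a quick pre-flight check can catch a customized table list that drops `xcom` or `task_instance` before the backup DAG is built. This is a minimal sketch that works on plain table names; `REQUIRED_TABLES` and `missing_required_tables` are hypothetical names, not part of `mwaa-dr`, and extracting the names from your own `setup_tables()` override is left out.

```python
# Hypothetical pre-flight check (not an mwaa-dr API): confirm that a
# customized table list still contains the tables whose omission has been
# reported to break backups.
REQUIRED_TABLES = frozenset({"xcom", "task_instance"})


def missing_required_tables(table_names):
    """Return the required table names absent from table_names, sorted."""
    present = {name.lower() for name in table_names}
    return sorted(REQUIRED_TABLES - present)


custom_tables = ["variable", "connection", "dag_run", "task_instance"]
missing = missing_required_tables(custom_tables)
if missing:
    print(f"Add these tables before building the backup DAG: {missing}")
```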
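For the foreign key violation entry, a restore can be guarded by first checking that the target tables are empty. The sketch below uses an in-memory SQLite database as a stand-in for the metadata database; `non_empty_tables` is a hypothetical helper that only uses the standard DB-API cursor interface, and obtaining a connection to the real MWAA metadata database is outside this sketch.

```python
import sqlite3


def non_empty_tables(conn, tables):
    """Return the subset of tables that still contain at least one row."""
    leftovers = []
    cur = conn.cursor()
    for table in tables:
        # Table names cannot be bound as SQL parameters, so they are quoted
        # into the statement; only pass trusted, known table names.
        cur.execute(f'SELECT 1 FROM "{table}" LIMIT 1')
        if cur.fetchone() is not None:
            leftovers.append(table)
    cur.close()
    return leftovers


# Demo: an in-memory SQLite database stands in for the metadata database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE task_instance (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO dag_run (id) VALUES (1)")

leftovers = non_empty_tables(conn, ["dag_run", "task_instance"])
if leftovers:
    print(f"Not safe to restore yet, these tables have rows: {leftovers}")
```

If the list is non-empty, the cleanup step should run (with care) before the restore.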
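For the AccessDenied entry, the S3 permissions can be expressed as an IAM policy document attached to the MWAA execution role. This sketch builds one in Python; `dr_backup_bucket_policy` is a hypothetical helper and the bucket name is a placeholder. The bucket-level `s3:ListBucket` action applies to the bucket ARN, while the object actions apply to `arn:aws:s3:::<bucket>/*`.

```python
import json


def dr_backup_bucket_policy(bucket_name):
    """Build an IAM policy document for reading and writing DR backups."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


print(json.dumps(dr_backup_bucket_policy("your-mwaa-backup-bucket"), indent=2))
```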
Warnings
- breaking Apache Airflow 3.0 removes direct metadata database access from task code running on Airflow workers, which the export and import tasks of mwaa-dr rely on. mwaa-dr needs updates before it can support Airflow 3.x.
- gotcha The import path for `DRFactory` is version-specific to your MWAA/Airflow environment. Using the wrong version (e.g., `DRFactory_2_5` for an Airflow 2.10.3 environment) will lead to import errors or unexpected behavior.
- gotcha For metadata restore to work correctly, the target database usually needs to be empty to avoid foreign key constraint violations. The solution provides a `cleanup_metadata` DAG for this purpose, which should be used with extreme caution.
- gotcha Airflow variables `DR_VARIABLE_RESTORE_STRATEGY` and `DR_CONNECTION_RESTORE_STRATEGY` control how variables and connections are restored. Incorrect settings can lead to unintended overwrites or data loss, especially if using AWS Secrets Manager.
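To see why the restore strategy matters, the sketch below contrasts an add-only merge with an overwrite merge of key/value pairs, as happens when restoring variables into an environment that already has some. The mode names `keep_existing` and `overwrite` and the function `merge_restored` are hypothetical; they are not the values accepted by `DR_VARIABLE_RESTORE_STRATEGY` or `DR_CONNECTION_RESTORE_STRATEGY`, so check the `mwaa-dr` documentation for those.

```python
# Illustration only: "keep_existing" and "overwrite" are hypothetical mode
# names, not the values accepted by mwaa-dr.
def merge_restored(existing, backup, mode):
    """Merge backed-up key/value pairs into the ones already present."""
    if mode == "keep_existing":
        # Only keys missing from the target are added; existing values survive.
        return {**backup, **existing}
    if mode == "overwrite":
        # Backed-up values replace values already present in the target.
        return {**existing, **backup}
    raise ValueError(f"unknown mode: {mode!r}")


existing = {"env": "dr-region", "api_url": "https://dr.example.com"}
backup = {"env": "primary-region", "retries": "3"}

print(merge_restored(existing, backup, "keep_existing"))
# {'env': 'dr-region', 'retries': '3', 'api_url': 'https://dr.example.com'}
print(merge_restored(existing, backup, "overwrite"))
# {'env': 'primary-region', 'api_url': 'https://dr.example.com', 'retries': '3'}
```

With the overwrite behavior, the DR environment's own `env` value is silently replaced by the primary's, which is the kind of unintended overwrite the warning describes.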
Install
pip install mwaa-dr
# On MWAA, add the package to your environment's requirements.txt instead:
mwaa-dr==2.2.0
Imports
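- DRFactory_X_Y, where X_Y is your Airflow major and minor version (for example, v_2_8 and DRFactory_2_8 for Airflow 2.8.1):
from mwaa_dr.v_X_Y.dr_factory import DRFactory_X_Y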
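If the same DAG file is deployed to environments running different Airflow versions, the import can be resolved when the file is parsed. This is a sketch that assumes the `mwaa_dr.v_X_Y.dr_factory` / `DRFactory_X_Y` naming shown above holds for your version; `dr_factory_location` and `load_dr_factory` are hypothetical helpers. It refuses Airflow 3.x because mwaa-dr still needs updates for it, and `importlib.import_module` raises `ModuleNotFoundError` if the installed mwaa-dr has no module for your version.

```python
import importlib
from importlib.metadata import version


def dr_factory_location(airflow_version):
    """Map an Airflow version such as '2.8.1' to (module name, class name)."""
    major, minor = airflow_version.split(".")[:2]
    if int(major) >= 3:
        # mwaa-dr relies on direct metadata DB access, which Airflow 3 removes.
        raise RuntimeError(f"Airflow {airflow_version}: mwaa-dr needs updates for Airflow 3.x")
    suffix = f"{major}_{minor}"
    return f"mwaa_dr.v_{suffix}.dr_factory", f"DRFactory_{suffix}"


def load_dr_factory():
    """Import the DRFactory class matching the installed Airflow version."""
    module_name, class_name = dr_factory_location(version("apache-airflow"))
    return getattr(importlib.import_module(module_name), class_name)
```

Usage: `DRFactory = load_dr_factory()`, then `DRFactory(dag_id='backup', path_prefix='data', storage_type='S3')`. When you target a single Airflow version, the explicit import is simpler and easier to read.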
Quickstart
from airflow import DAG
from mwaa_dr.v_2_10.dr_factory import DRFactory_2_10

# Prerequisites: set the Airflow Variable DR_BACKUP_BUCKET to the name of your
# backup bucket (e.g. 'your-mwaa-backup-bucket'), and give the MWAA execution
# role read/write permissions on it.
# Use the DRFactory class that matches your MWAA/Airflow version (here 2.10).
# For local testing with aws-mwaa-local-runner, use storage_type='LOCAL_FS'
# and create a 'data' folder in your dags directory.
# Each factory builds a single DAG whose dag_id is set on the factory.

# Backup DAG: exports metadata to the configured storage.
backup_factory = DRFactory_2_10(
    dag_id='backup_metadata_example',
    path_prefix='data',  # Relative path within the S3 bucket or local folder
    storage_type='S3',   # Or 'LOCAL_FS' for local development
)
backup_dag: DAG = backup_factory.create_backup_dag()

# Restore DAG: meant to be triggered manually; keep it paused until needed.
restore_factory = DRFactory_2_10(
    dag_id='restore_metadata_example',
    path_prefix='data',
    storage_type='S3',
)
restore_dag: DAG = restore_factory.create_restore_dag()

# Cleanup DAG: empties metadata tables before a restore. Use with caution
# and keep it paused until needed.
cleanup_factory = DRFactory_2_10(
    dag_id='cleanup_metadata_example',
    path_prefix='data',
    storage_type='S3',
)
cleanup_dag: DAG = cleanup_factory.create_cleanup_dag()
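Because the restore and cleanup DAGs are meant to be run by hand, it can help to trigger them from a script during a recovery. This sketch uses the MWAA `create_cli_token` API in boto3 and posts an Airflow CLI command to the web server's `/aws_mwaa/cli` endpoint, which returns the command's stdout and stderr base64-encoded. `build_cli_request` and `run_airflow_cli` are hypothetical helpers, the environment name is a placeholder, and the caller is assumed to have the `airflow:CreateCliToken` permission.

```python
import base64
import json
import urllib.request


def build_cli_request(hostname, cli_token, command):
    """Build the POST request that runs one Airflow CLI command on MWAA."""
    return urllib.request.Request(
        url=f"https://{hostname}/aws_mwaa/cli",
        data=command.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {cli_token}",
            "Content-Type": "text/plain",
        },
        method="POST",
    )


def run_airflow_cli(environment_name, command):
    """Run an Airflow CLI command on an MWAA environment; return (stdout, stderr)."""
    import boto3  # imported here so build_cli_request works without boto3

    token = boto3.client("mwaa").create_cli_token(Name=environment_name)
    request = build_cli_request(token["WebServerHostname"], token["CliToken"], command)
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read())
    # MWAA returns the command's stdout and stderr base64-encoded.
    stdout = base64.b64decode(payload["stdout"]).decode("utf-8")
    stderr = base64.b64decode(payload["stderr"]).decode("utf-8")
    return stdout, stderr


# Example (placeholder environment name):
# run_airflow_cli("my-mwaa-environment", "dags unpause restore_metadata_example")
# run_airflow_cli("my-mwaa-environment", "dags trigger restore_metadata_example")
```

A DAG that is still paused will not start its triggered run, which is why the example unpauses the restore DAG before triggering it.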