Apache Airflow IMAP Provider
The Apache Airflow IMAP Provider enables Airflow to interact with IMAP email servers. It provides a hook (`ImapHook`) and a sensor (`ImapAttachmentSensor`) for tasks such as detecting and downloading email attachments. The current version is 3.11.1. It follows the regular release cadence of Apache Airflow providers, which ship support for new Airflow features and bug fixes.
Warnings
- gotcha Incorrect Airflow IMAP Connection setup: The `imap_conn_id` must refer to a correctly configured Airflow connection (Admin -> Connections). Common errors include a wrong host, port, login, or password, or misconfigured SSL/TLS settings (set in the connection's 'Extra' field as JSON, e.g. `{"use_ssl": true}`).
- gotcha File system permissions for the download directory: The directory that attachments are saved to must exist and be writable by the Airflow worker process executing the task. Otherwise the task fails with a file-not-found or permission error.
- gotcha Ineffective email filtering or attachment matching: An overly broad or incorrect mail filter or attachment name regex can miss the desired emails or attachments, or retrieve unintended ones.
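The last two warnings can be checked before a task runs. The following is a minimal sketch of such a pre-flight check using only the Python standard library. The helper names `matching_attachments` and `is_writable_dir` are hypothetical and not part of the provider, and the sketch assumes attachment names are matched with `re.match` (anchored at the start of the name only), which should be verified against your provider version.

```python
import os
import re
import tempfile


def matching_attachments(pattern: str, file_names: list[str]) -> list[str]:
    """Return the file names that the regex matches from the start of the name."""
    compiled = re.compile(pattern)
    return [name for name in file_names if compiled.match(name)]


def is_writable_dir(path: str) -> bool:
    """Return True if path is an existing directory this process can write to."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a temporary file is a more direct test than os.access,
        # which can be misleading on some file systems.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    names = ["report_2023_01.pdf", "report_2023_01.pdf.bak", "summary.pdf"]
    # Without a trailing "$" the pattern also matches the ".bak" file.
    print(matching_attachments(r"report_.*\.pdf", names))
    print(matching_attachments(r"report_.*\.pdf$", names))
    print(is_writable_dir(tempfile.gettempdir()))
```

Anchoring the pattern with `$` is one way to avoid the unintended match; running the check as the same user as the Airflow worker gives the most meaningful result for the directory test.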
Install
pip install apache-airflow-providers-imap
Imports
- ImapHook
from airflow.providers.imap.hooks.imap import ImapHook
- ImapAttachmentSensor
from airflow.providers.imap.sensors.imap_attachment import ImapAttachmentSensor
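As a rough illustration of how the hook is typically used, the sketch below checks for a matching attachment and downloads it only if one exists. To keep the example runnable without a mail server, the function accepts any object exposing `has_mail_attachment` and `download_mail_attachments`, the two `ImapHook` methods it relies on. The function name `download_if_present` and the stand-in class `FakeHook` are hypothetical.

```python
import os
import re
import tempfile


def download_if_present(hook, name_pattern: str, output_dir: str, mail_filter: str = "All") -> bool:
    """Download matching attachments if any exist; return whether a download ran."""
    if not hook.has_mail_attachment(name_pattern, check_regex=True, mail_filter=mail_filter):
        return False
    hook.download_mail_attachments(
        name_pattern,
        output_dir,
        check_regex=True,
        mail_filter=mail_filter,
    )
    return True


class FakeHook:
    """Hypothetical stand-in for ImapHook so the sketch runs without a mail server."""

    def __init__(self, attachments: dict[str, bytes]):
        self.attachments = attachments

    @staticmethod
    def _matches(name: str, file_name: str, check_regex: bool) -> bool:
        return bool(re.match(name, file_name)) if check_regex else name == file_name

    def has_mail_attachment(self, name, *, check_regex=False, mail_folder="INBOX", mail_filter="All"):
        return any(self._matches(name, f, check_regex) for f in self.attachments)

    def download_mail_attachments(self, name, local_output_directory, *, check_regex=False,
                                  mail_folder="INBOX", mail_filter="All"):
        for file_name, payload in self.attachments.items():
            if self._matches(name, file_name, check_regex):
                with open(os.path.join(local_output_directory, file_name), "wb") as fh:
                    fh.write(payload)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    fake = FakeHook({"report_2023_01.pdf": b"%PDF-1.4", "notes.txt": b"hello"})
    print(download_if_present(fake, r"report_.*\.pdf$", out_dir))
    print(os.listdir(out_dir))
```

Inside an Airflow task, the same function would be called with a real hook, for example within `with ImapHook(imap_conn_id="imap_default") as hook:`, assuming that connection is configured.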
Quickstart
import os

import pendulum
from airflow.decorators import task
from airflow.models.dag import DAG
from airflow.providers.imap.hooks.imap import ImapHook

# IMPORTANT: In a real Airflow deployment, you must configure an IMAP connection
# via the Airflow UI (Admin -> Connections) with a 'Conn Id', e.g., 'imap_default'.
# This connection should include Host, Port, Login (Username), Password, and,
# if needed, Extra parameters for SSL/TLS (e.g., {"use_ssl": true}).
# This example assumes 'imap_default' is already configured.
# The connection ID and target directory are read from environment variables so they
# can be overridden per deployment; credentials belong in the Airflow connection or a
# secrets backend, not in the DAG file.
IMAP_CONN_ID = os.environ.get("AIRFLOW_IMAP_CONN_ID", "imap_default")
TARGET_ATTACHMENT_DIRECTORY = os.environ.get("AIRFLOW_IMAP_TARGET_DIR", "/tmp/airflow_imap_attachments")

with DAG(
    dag_id="example_imap_retrieve_attachment",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["imap", "email", "provider"],
) as dag:

    @task(task_id="retrieve_monthly_report_attachment")
    def retrieve_monthly_report():
        """Retrieve a specific PDF attachment from an IMAP server."""
        # The target directory must exist and be writable by the worker.
        os.makedirs(TARGET_ATTACHMENT_DIRECTORY, exist_ok=True)
        with ImapHook(imap_conn_id=IMAP_CONN_ID) as hook:
            hook.download_mail_attachments(
                name=r"report_.*\.pdf",  # Regex matching files like 'report_2023_01.pdf'
                local_output_directory=TARGET_ATTACHMENT_DIRECTORY,
                check_regex=True,  # Treat `name` as a regex instead of an exact file name
                mail_folder="INBOX",
                # IMAP search criteria: only unseen emails from this sender and subject
                mail_filter='(UNSEEN FROM "reports@company.com" SUBJECT "Monthly Report")',
                not_found_mode="raise",  # Fail the task if no matching attachment is found
            )

    retrieve_monthly_report()
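As an alternative to the UI, Airflow can read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` (upper-cased), with the value given as a URI or, in recent Airflow versions, as JSON. The sketch below builds such a JSON value for `imap_default` using only the standard library. The helper name `imap_connection_env` is hypothetical, the host, login, and password are placeholders, and the `use_ssl` extra is an assumption about the IMAP hook's SSL setting that should be checked against your provider version.

```python
import json


def imap_connection_env(conn_id: str, host: str, login: str, password: str,
                        port: int = 993, use_ssl: bool = True) -> tuple[str, str]:
    """Return (environment variable name, JSON value) describing an IMAP connection."""
    env_name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps(
        {
            "conn_type": "imap",
            "host": host,
            "port": port,
            "login": login,
            "password": password,
            "extra": {"use_ssl": use_ssl},  # assumed extra key; verify for your version
        }
    )
    return env_name, value


if __name__ == "__main__":
    # Placeholder values for illustration only.
    name, value = imap_connection_env(
        "imap_default", "imap.example.com", "reports-reader@example.com", "change-me"
    )
    print(f"export {name}='{value}'")
```

In practice the password would come from a secrets backend or the deployment's secret store rather than being written into a script.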