Apache Airflow SSH Provider

4.3.3 verified Wed May 13 auth: no python install: reviewed

This package provides hooks and operators for running commands on remote servers over SSH from Apache Airflow DAGs, with password and key-based authentication. File transfers over SFTP are handled by the separate `apache-airflow-providers-sftp` package, which builds on this one. The current version is 4.3.3; releases follow the broader Apache Airflow provider release schedule.

pip install apache-airflow-providers-ssh
error ModuleNotFoundError: No module named 'airflow.providers.ssh'
cause The 'apache-airflow-providers-ssh' package is not installed.
fix
Install the package using 'pip install apache-airflow-providers-ssh'.
error SSH command timed out
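cause The remote command ran longer than `cmd_timeout` (10 seconds by default).
fix
Increase the `cmd_timeout` parameter on the `SSHOperator` or `SSHHook`. Note that `conn_timeout` only limits how long establishing the connection may take, not how long the command may run.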
error No module named 'airflow.providers.ssh' on AWS Airflow (Amazon MWAA)
cause The 'apache-airflow-providers-ssh' package is not included in the Amazon MWAA environment.
fix
Add 'apache-airflow-providers-ssh' to the 'requirements.txt' file and update the MWAA environment.
error Unable to create new SSH connection in Apache Airflow (AWS managed MWAA)
cause The 'SSH' connection type is not available in the Airflow UI due to missing dependencies.
fix
Ensure 'apache-airflow-providers-ssh' is listed in 'requirements.txt' and the MWAA environment is updated.
error airflow.exceptions.AirflowException: SSH command timed out
cause The command run by the `SSHOperator` or `SSHHook` took longer than the configured `cmd_timeout` (10 seconds by default). `conn_timeout` is a separate setting that only limits how long establishing the connection may take.
fix
Increase `cmd_timeout` in your `SSHOperator` or `SSHHook` definition to match the expected duration of the remote command, or set it to `None` to disable the timeout. If the connection itself is slow to establish, raise `conn_timeout` instead. Example: `SSHOperator(task_id='my_ssh_task', ssh_conn_id='ssh_default', command='long_running_script.sh', cmd_timeout=300)` or `SSHHook(ssh_conn_id='ssh_default', cmd_timeout=None)`.
breaking Provider version 4.0.0 removed many previously deprecated features. `SSHHook.timeout` was removed (use `conn_timeout`), `SSHHook.create_tunnel()` was removed in favor of `get_tunnel()`, which takes different parameters, `SSHOperator.get_hook()` was removed (use the `hook` attribute), and `SSHOperator.exec_ssh_client_command()` was removed (call `ssh_hook.exec_ssh_client_command()` directly). The minimum supported Airflow version was also raised to 2.9.0.
fix Review changelog for 4.0.0 and update code to use recommended methods and parameters (e.g., `conn_timeout` instead of `timeout`, `hook` attribute instead of `get_hook()`). Ensure your Airflow environment is on version 2.9.0 or higher.
breaking The minimum Apache Airflow version has risen steadily with provider updates. Version 4.1.0 requires Airflow 2.10.0+, and versions 4.2.0 and 4.3.0 require Airflow 2.11.0+. Additionally, Python 3.9 support was dropped in provider version 4.1.1.
fix Upgrade your Apache Airflow installation to at least 2.11.0 and ensure your Python environment is 3.10 or newer for provider versions 4.2.0+.
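The version floors listed above can be checked before an upgrade. The following is a minimal sketch that encodes only the minimums stated in the two notes above; the helper names `parse_version`, `min_airflow_for`, and `is_compatible` are illustrative and not part of Airflow.

```python
# Minimum Airflow version per provider release, as stated in the notes above.
MIN_AIRFLOW = {
    (4, 0, 0): (2, 9, 0),
    (4, 1, 0): (2, 10, 0),
    (4, 2, 0): (2, 11, 0),
    (4, 3, 0): (2, 11, 0),
}


def parse_version(text):
    """Turn '2.11.0' into (2, 11, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def min_airflow_for(provider_version):
    """Return the floor of the newest listed release <= provider_version, or None."""
    wanted = parse_version(provider_version)
    eligible = [v for v in MIN_AIRFLOW if v <= wanted]
    return MIN_AIRFLOW[max(eligible)] if eligible else None


def is_compatible(provider_version, airflow_version):
    """True if airflow_version meets the floor recorded for provider_version."""
    floor = min_airflow_for(provider_version)
    return floor is not None and parse_version(airflow_version) >= floor


print(is_compatible("4.3.3", "2.11.0"))  # True
print(is_compatible("4.3.3", "2.10.5"))  # False
```

Releases older than 4.0.0 are not covered by the table, so `min_airflow_for` returns `None` for them rather than guessing.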
gotcha Pay close attention to host key verification when configuring SSH connections. By default `no_host_key_check` is `true`, so the key of an unknown host is accepted without verification, which leaves the first connection open to man-in-the-middle attacks. `allow_host_key_change` is `false` by default, so a connection is refused if the key of an already known host changes. For production environments, consider strict host key checking.
fix Explicitly configure `no_host_key_check` and `allow_host_key_change` parameters in your Airflow SSH connection (often via 'Extra' JSON) to match your security policy. Consider using `host_key` in the connection extra to pin a specific host key.
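As an illustration of the fix above, the sketch below builds a JSON connection definition with strict host key settings. The host, login, and host key are placeholders, and the extras are written as the strings "true"/"false", which is an assumption about how these values are usually stored; check the SSH connection documentation for your provider version.

```python
import json

# Hypothetical values: replace host, login, and the host key with your own.
connection = {
    "conn_type": "ssh",
    "host": "remote.example.com",
    "login": "deploy",
    "port": 22,
    "extra": {
        # Verify the server's key instead of accepting any key.
        "no_host_key_check": "false",
        # Refuse to connect if the server's key differs from the recorded one.
        "allow_host_key_change": "false",
        # Placeholder only; put the server's real public key here.
        "host_key": "<server-public-key>",
    },
}

# Recent Airflow versions accept a JSON document in AIRFLOW_CONN_<CONN_ID>;
# the "extra" object can also be pasted into the connection's Extra field in the UI.
env_value = json.dumps(connection)
print(env_value)
```

Setting `host_key` while leaving `no_host_key_check` at `true` is contradictory, so the two are set together here.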
gotcha If the SSH connection type does not appear in the Airflow UI's 'Admin -> Connections -> Add new record' dropdown after installation, the provider is most likely installed in a different Python environment than the one running the webserver, or the webserver has not been restarted since the installation.
fix Ensure the provider package is installed in the same Python environment as your Airflow installation. Restart Airflow scheduler and webserver components. Verify installation using `airflow providers list` or `pip list` in your Airflow environment.
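To confirm which interpreter actually has the provider, a check like the following can be run with the same Python executable that runs Airflow. This is a minimal sketch using only the standard library; `installed_version` is an illustrative helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("apache-airflow", "apache-airflow-providers-ssh"):
        version = installed_version(name)
        print(f"{name}: {version or 'not installed in this interpreter'}")
```

If the provider shows up here but not in the UI, the webserver is probably running from a different environment or still needs a restart.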
breaking The `schedule_interval` parameter of `DAG` was deprecated in Apache Airflow 2.4 in favor of `schedule` and removed in Airflow 3.0. Passing `schedule_interval` on Airflow 3.0 or newer raises a `TypeError`. In Airflow 3, importing `DAG` directly from `airflow` is also deprecated; import it from `airflow.sdk` instead.
fix Update your DAG definitions to use the `schedule` parameter instead of `schedule_interval`. For example, change `schedule_interval=None` to `schedule=None`, or `schedule_interval='@daily'` to `schedule='@daily'`. On Airflow 3, also change `from airflow import DAG` to `from airflow.sdk import DAG`.
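Where DAG arguments are assembled in shared dictionaries, the rename can be applied in one place. The helper below is a hypothetical sketch, not an Airflow API: it only renames the key and leaves every other argument untouched.

```python
def modern_dag_kwargs(kwargs):
    """Return a copy of DAG keyword arguments with schedule_interval renamed to schedule."""
    updated = dict(kwargs)
    if "schedule_interval" in updated:
        if "schedule" in updated:
            # Refuse to guess which of the two values the author meant.
            raise ValueError("both 'schedule' and 'schedule_interval' were given")
        updated["schedule"] = updated.pop("schedule_interval")
    return updated


legacy = {"dag_id": "ssh_operator_quickstart", "schedule_interval": "@daily", "catchup": False}
print(modern_dag_kwargs(legacy))
# {'dag_id': 'ssh_operator_quickstart', 'catchup': False, 'schedule': '@daily'}
```

The result can then be passed as `DAG(**modern_dag_kwargs(legacy))`; editing the DAG files directly remains the cleaner long-term fix.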
python  os / libc      wheel  install  import  disk    mem    side effects
3.10    alpine (musl)  wheel  -        4.38s   254.1M  67.1M  clean
3.10    alpine (musl)  -      -        4.71s   247.1M  67.0M  -
3.10    slim (glibc)   wheel  25.5s    3.20s   252M    67.1M  clean
3.10    slim (glibc)   -      -        3.36s   245M    67.0M  -
3.11    alpine (musl)  wheel  -        5.68s   275.3M  72.8M  clean
3.11    alpine (musl)  -      -        6.47s   267.5M  72.7M  -
3.11    slim (glibc)   wheel  23.9s    5.04s   273M    72.8M  clean
3.11    slim (glibc)   -      -        5.22s   266M    72.7M  -
3.12    alpine (musl)  wheel  -        5.24s   265.1M  71.5M  clean
3.12    alpine (musl)  -      -        5.77s   257.6M  71.4M  -
3.12    slim (glibc)   wheel  18.5s    5.22s   264M    71.5M  clean
3.12    slim (glibc)   -      -        5.70s   257M    71.4M  -
3.13    alpine (musl)  wheel  -        4.77s   266.9M  72.2M  clean
3.13    alpine (musl)  -      -        5.29s   259.3M  72.0M  -
3.13    slim (glibc)   wheel  18.3s    4.79s   266M    72.2M  clean
3.13    slim (glibc)   -      -        5.51s   259M    72.0M  -
3.9     alpine (musl)  sdist  -        -       218.7M  -      broken
3.9     alpine (musl)  -      -        -       -       -      -
3.9     slim (glibc)   wheel  29.4s    -       214M    -      broken
3.9     slim (glibc)   -      -        -       -       -      -

This example demonstrates a basic DAG using the `SSHOperator` to connect to a remote server and execute a shell command. Before running, configure an 'SSH' connection in your Airflow UI (Admin -> Connections) with `Conn Id` as `ssh_default`, providing `Host`, `Login (Username)`, and `Port`. For authentication, you can specify `Password`, `Key File` path, or `Private Key` content in the 'Extra' field as a JSON object (e.g., `{"key_file": "/path/to/your/key.pem"}`).

import os  # only needed if you uncomment the os.environ lines below
from datetime import datetime

try:
    from airflow.sdk import DAG  # Airflow 3.x
except ImportError:
    from airflow.models.dag import DAG  # Airflow 2.x
from airflow.providers.ssh.operators.ssh import SSHOperator

# This DAG expects an 'ssh_default' connection. Create it in the Airflow UI or
# supply it through an environment variable, e.g.
# AIRFLOW_CONN_SSH_DEFAULT=ssh://user@hostname:22/?key_file=/path/to/key

# Example of setting connection details via environment variable for local execution
# os.environ['AIRFLOW_CONN_SSH_DEFAULT'] = 'ssh://your_user@your_host:22'
# If using a private key file:
# os.environ['AIRFLOW_CONN_SSH_DEFAULT'] = 'ssh://your_user@your_host:22/?key_file=/path/to/your/private_key.pem'
with DAG(
    dag_id='ssh_operator_quickstart',
    start_date=datetime(2023, 1, 1),
    schedule=None,  # `schedule` replaces the older `schedule_interval` parameter
    catchup=False,
    tags=['ssh', 'example'],
) as dag:
    ssh_task = SSHOperator(
        task_id='run_remote_command',
        ssh_conn_id='ssh_default',  # Ensure this connection exists in Airflow
        command='echo "Hello from remote host $(hostname)" && ls -l',
        cmd_timeout=10,  # Timeout for the command execution
    )
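The comments in the example mention supplying the connection through `AIRFLOW_CONN_SSH_DEFAULT`. When the user name or key path contains special characters, the URI needs URL-encoding. The helper below is a minimal sketch using only the standard library; `ssh_conn_uri` is an illustrative name, and the user, host, and key path are the same placeholders used in the example.

```python
from urllib.parse import quote, urlencode


def ssh_conn_uri(user, host, port=22, **extras):
    """Build an ssh:// connection URI; extras become URL-encoded query parameters."""
    uri = f"ssh://{quote(user, safe='')}@{host}:{port}"
    if extras:
        uri += "/?" + urlencode(extras)
    return uri


uri = ssh_conn_uri("your_user", "your_host", key_file="/path/to/your/private_key.pem")
print(uri)
# ssh://your_user@your_host:22/?key_file=%2Fpath%2Fto%2Fyour%2Fprivate_key.pem
```

The printed value can be exported as `AIRFLOW_CONN_SSH_DEFAULT` before starting Airflow; passwords are left out of this sketch on purpose, since a secrets backend is usually a better place for them.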