HTTP Provider for Apache Airflow
The Apache Airflow HTTP provider (`apache-airflow-providers-http`) lets Airflow DAGs call HTTP APIs. It provides an operator and a hook to send HTTP requests and handle responses, and a sensor to poll an endpoint until a condition is met. It is released independently of core Airflow, follows Semantic Versioning, and is currently at version 6.0.1.
Warnings
- breaking Provider versions 2.0.0 and above require Airflow 2.1.0 or later, because the `apply_defaults` decorator was removed from core Airflow. Later provider releases raise this minimum further. Installing a newer provider version on an older Airflow may upgrade Airflow automatically, after which you must run `airflow db upgrade` manually to complete the migration.
- breaking Provider versions 3.0.0 and above require Airflow 2.2+. This aligns with the Apache Airflow providers support policy.
- breaking From provider version 4.0.0, `TCP_KEEPALIVE` is enabled by default for `HttpOperator`, `HttpSensor`, and `HttpHook`. This prevents firewalls from closing long-running, inactive connections.
- gotcha Configuring HTTPS for `HttpOperator` is counter-intuitive for historical reasons. The operator defaults to the `http` scheme; to use HTTPS, set the connection's schema to `https` or include `https://` in the connection host.
- gotcha When using `HttpOperator` with pagination, all API responses are stored in memory and returned as a single result. This can lead to high memory and CPU consumption for large datasets.
- gotcha The minimum required Apache Airflow version for `apache-airflow-providers-http` 6.0.1 is 2.11.0.
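One way to sidestep the pagination memory issue is to process pages one at a time instead of collecting them all. The following is a minimal sketch of that idea in plain Python, not the provider's own API: `fetch_page` is a hypothetical callable (in a real task it might wrap a call such as `HttpHook.run`), and the `items` / `next` keys are an assumed response shape.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[list]:
    """Yield one page of items at a time so only a single page is held in memory.

    fetch_page is a hypothetical callable: it takes a cursor (None for the
    first page) and returns a dict with "items" and an optional "next" cursor.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield page["items"]
        cursor = page.get("next")
        if cursor is None:
            break


# Stand-in for a paginated API, used only to demonstrate the generator.
FAKE_API = {
    None: {"items": [1, 2], "next": "p2"},
    "p2": {"items": [3, 4], "next": "p3"},
    "p3": {"items": [5]},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return FAKE_API[cursor]


# Aggregate incrementally instead of building one large list of responses.
total = 0
for items in iter_pages(fake_fetch):
    total += sum(items)
```

Because the generator yields each page as it arrives, the caller decides whether to aggregate, write to storage, or discard it, and peak memory stays close to the size of one page.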
Install
pip install apache-airflow-providers-http
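Since installing the provider can change the installed Airflow version, it may be worth checking both versions afterwards. This is a small sketch using only the standard library's `importlib.metadata`; the helper name `installed_version` is illustrative and not part of Airflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
provider_version = installed_version("apache-airflow-providers-http")
airflow_version = installed_version("apache-airflow")

print(f"apache-airflow-providers-http: {provider_version}")
print(f"apache-airflow: {airflow_version}")
```

Comparing the printed Airflow version before and after `pip install` shows whether the installation pulled in a newer Airflow.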
Imports
- HttpOperator
from airflow.providers.http.operators.http import HttpOperator
- HttpSensor
from airflow.providers.http.sensors.http import HttpSensor
- HttpHook
from airflow.providers.http.hooks.http import HttpHook
Quickstart
from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.http.operators.http import HttpOperator
from airflow.providers.http.sensors.http import HttpSensor

# NOTE: You need to create an HTTP connection in Airflow UI
# Admin -> Connections -> + New Record
#   Conn Id: http_default
#   Conn Type: HTTP
#   Host: httpbin.org
# For HTTPS, see the 'Configuring HTTPS is counter-intuitive' warning.
with DAG(
    dag_id="http_example_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["http", "example"],
) as dag:
    # Use HttpOperator to make a GET request
    http_get_task = HttpOperator(
        task_id="http_get_request",
        http_conn_id="http_default",
        method="GET",
        endpoint="get",  # This will resolve to httpbin.org/get
        data={"param1": "value1", "param2": "value2"},
        response_check=lambda response: "param1" in response.text,
        log_response=True,
    )

    # Use HttpSensor to wait for a specific response
    http_sensor_task = HttpSensor(
        task_id="http_sensor_check",
        http_conn_id="http_default",
        endpoint="status/200",  # This will resolve to httpbin.org/status/200
        response_check=lambda response: response.status_code == 200,
        poke_interval=5,
        timeout=60,
    )

    http_get_task >> http_sensor_task
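The HTTPS note in the quickstart comments comes down to how the base URL is assembled from the connection fields. The function below is a simplified approximation of that behavior, written from the rule that the scheme defaults to `http`; `resolve_base_url` is a hypothetical helper, not the hook's actual code, so the details should be verified against the installed provider source.

```python
from typing import Optional


def resolve_base_url(
    host: str,
    schema: Optional[str] = None,
    port: Optional[int] = None,
) -> str:
    """Approximate how a base URL could be built from HTTP connection fields.

    Assumption: a host that already contains a scheme is used as-is;
    otherwise the schema field is used, falling back to "http".
    """
    if "://" in host:
        base_url = host
    else:
        base_url = f"{schema or 'http'}://{host}"
    if port:
        base_url += f":{port}"
    return base_url


# The quickstart connection (Host only) falls back to plain HTTP.
plain = resolve_base_url("httpbin.org")

# Two ways to end up with HTTPS under the assumption above.
via_schema = resolve_base_url("httpbin.org", schema="https")
via_host = resolve_base_url("https://httpbin.org")
```

Under this assumption, either setting the connection's Schema field to `https` or entering the host as `https://httpbin.org` would make the quickstart tasks use HTTPS.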