{"id":7934,"library":"apache-airflow-providers-opensearch","title":"OpenSearch Airflow Provider","description":"The `apache-airflow-providers-opensearch` library is an official Apache Airflow provider package for interacting with OpenSearch. It includes operators and a hook that integrate OpenSearch into Airflow DAGs for tasks such as data ingestion, search queries, and cluster monitoring. Like other Airflow providers, it is released on its own cadence, independently of Airflow core; this entry covers version 1.9.0.","status":"active","version":"1.9.0","language":"en","source_language":"en","source_url":"https://github.com/apache/airflow/tree/main/airflow/providers/opensearch","tags":["airflow","opensearch","data-pipeline","etl","provider","search-engine"],"install":[{"cmd":"pip install apache-airflow-providers-opensearch","lang":"bash","label":"Install provider"}],"dependencies":[{"reason":"This is an Airflow provider and requires Airflow to run.","package":"apache-airflow","optional":false},{"reason":"The provider uses the official Python client for OpenSearch internally.","package":"opensearch-py","optional":false}],"imports":[{"symbol":"OpenSearchHook","correct":"from airflow.providers.opensearch.hooks.opensearch import OpenSearchHook"},{"symbol":"OpenSearchQueryOperator","correct":"from airflow.providers.opensearch.operators.opensearch import OpenSearchQueryOperator"},{"note":"Document ingestion is handled by `OpenSearchAddDocumentOperator`, which lives in the same module as the other operators; there is no separate ingest module.","wrong":"from airflow.providers.opensearch.operators.opensearch_ingest import OpenSearchIngestOperator","symbol":"OpenSearchAddDocumentOperator","correct":"from airflow.providers.opensearch.operators.opensearch import OpenSearchAddDocumentOperator"},{"symbol":"OpenSearchCreateIndexOperator","correct":"from airflow.providers.opensearch.operators.opensearch import OpenSearchCreateIndexOperator"}],"quickstart":{"code":"from __future__ import annotations\n\nimport pendulum\n\nfrom airflow.models.dag import DAG\nfrom 
airflow.providers.opensearch.operators.opensearch import OpenSearchQueryOperator\n\nwith DAG(\n    dag_id=\"opensearch_example_dag\",\n    schedule=None,\n    start_date=pendulum.datetime(2023, 1, 1, tz=\"UTC\"),\n    catchup=False,\n    tags=[\"opensearch\", \"example\"],\n) as dag:\n    # Configure an OpenSearch connection in the Airflow UI (Admin -> Connections)\n    # with the connection ID 'opensearch_default'. Example:\n    # Conn Id: opensearch_default\n    # Conn Type: OpenSearch\n    # Host: localhost (host name only, without the scheme)\n    # Port: 9200\n    # Extra: {\"use_ssl\": false, \"verify_certs\": false} (local dev only; enable both in prod)\n\n    # Run a match_all search; \"my_index\" is a placeholder for an index that already exists.\n    search_documents = OpenSearchQueryOperator(\n        task_id=\"search_documents\",\n        opensearch_conn_id=\"opensearch_default\",\n        index_name=\"my_index\",\n        query={\"query\": {\"match_all\": {}}},\n    )\n\n    # To ingest data, import OpenSearchAddDocumentOperator from the same module:\n    # add_document = OpenSearchAddDocumentOperator(\n    #     task_id=\"add_example_document\",\n    #     opensearch_conn_id=\"opensearch_default\",\n    #     index_name=\"my_index\",\n    #     doc_id=1,\n    #     document={\"field\": \"value\"},\n    # )\n    # add_document >> search_documents\n","lang":"python","description":"This quickstart DAG demonstrates how to use the `OpenSearchQueryOperator` to run a simple `match_all` search against an index. Before running, ensure an OpenSearch connection with the ID `opensearch_default` is configured in your Airflow environment and that the placeholder index `my_index` exists."},"warnings":[{"fix":"Ensure the connection with the specified ID exists, has 'OpenSearch' as its connection type, and has the correct host, port, and credentials. Test the connection in the Airflow UI.","message":"The connection passed as `opensearch_conn_id` (by default `opensearch_default`) must be defined in Airflow, for example through the Airflow UI, the `airflow connections` CLI, or an `AIRFLOW_CONN_OPENSEARCH_DEFAULT` environment variable. 
Missing or incorrect connection details are a frequent cause of 'Connection not found' or network errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Refer to the official Airflow provider documentation for compatibility matrices. Upgrade both `apache-airflow` and `apache-airflow-providers-opensearch` to compatible versions. Pin `opensearch-py` if your OpenSearch cluster requires a specific version.","message":"Changes in Airflow core or the `opensearch-py` client library can lead to incompatibilities. Always check the provider's minimum supported Airflow version and test against your specific OpenSearch cluster version.","severity":"breaking","affected_versions":"Across major Airflow and `opensearch-py` releases."},{"fix":"For development, you can set `\"verify_certs\": false` in the connection's Extra field (not recommended for production). For production, ensure the necessary CA certificates are installed on the Airflow worker machines or provide the certificate path in the connection configuration.","message":"SSL/TLS certificate verification errors are common when connecting to OpenSearch clusters that use self-signed certificates, or when the certificate chain is not trusted by the Airflow worker environment.","severity":"gotcha","affected_versions":"All versions, depending on environment configuration."},{"fix":"Thoroughly review your DAGs and connection configurations when migrating. Consult the OpenSearch documentation for any API changes compared to Elasticsearch.","message":"OpenSearch is a fork of Elasticsearch. While the two are largely compatible, be aware of OpenSearch-specific features and API differences. 
Direct migration from `apache-airflow-providers-elasticsearch` might require minor adjustments.","severity":"gotcha","affected_versions":"Users migrating from older Elasticsearch providers."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install apache-airflow-providers-opensearch` in the same Python environment as Airflow to install the provider.","cause":"The `apache-airflow-providers-opensearch` package has not been installed in your Airflow environment.","error":"ModuleNotFoundError: No module named 'airflow.providers.opensearch'"},{"fix":"Go to Airflow UI -> Admin -> Connections and create a new connection. Set 'Conn Id' to `opensearch_default` and 'Conn Type' to 'OpenSearch', then fill in the host, port, and other details for your OpenSearch instance.","cause":"Airflow cannot find a connection named `opensearch_default` (or whatever connection ID you specified) in its environment variables, secrets backend, or metadata database.","error":"airflow.exceptions.AirflowNotFoundException: The conn_id `opensearch_default` isn't defined"},{"fix":"Verify the OpenSearch service is running and accessible from your Airflow worker. Check the `Host` and `Port` in your OpenSearch connection configuration. Ensure no firewalls are blocking the connection.","cause":"The Airflow worker cannot establish a network connection to the specified OpenSearch host and port. 
This could be due to an incorrect host or port, OpenSearch not running, or a firewall blocking the connection.","error":"opensearchpy.exceptions.ConnectionError: Connection refused"},{"fix":"Ensure the user or role associated with your OpenSearch connection has the necessary permissions (e.g., `cluster_monitor` for `_cluster/health`, `indices:data/write/index` for ingestion) in your OpenSearch Security configuration.","cause":"The OpenSearch user configured in your connection does not have sufficient permissions to perform the requested operation.","error":"opensearchpy.exceptions.AuthorizationException: AuthorizationException(403, 'security_exception', 'no permissions for [cluster:monitor/health] and User [name=airflow, backend_roles=[], requestedTenant=null]')"}]}