Apache Airflow Provider for ArangoDB

Version: 2.9.4 | Verified: Mon Apr 27 | Auth: no | Language: python

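Apache Airflow provider package for integrating with ArangoDB. Version 2.9.4 supports Airflow 2.x and requires Python >=3.10. It provides ArangoDBHook, AQLOperator, and AQLSensor for running AQL queries against ArangoDB from tasks. ArangoDB is a task target here, not a backend for Airflow's own metadata database, which must be a relational database.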

pip install apache-airflow-providers-arangodb
error ModuleNotFoundError: No module named 'airflow.providers.arangodb'
cause The provider is not installed in the Python environment that runs Airflow, or Airflow was not restarted after the install.
fix
Run 'pip install apache-airflow-providers-arangodb' and restart Airflow webserver/scheduler.
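
Before reinstalling, it can help to confirm which interpreter Airflow uses and whether that interpreter can locate the provider. The following is a minimal sketch using only the standard library; the helper name provider_installed is hypothetical and not part of Airflow or the provider.

```python
import importlib.util
import sys


def provider_installed(module_name: str = "airflow.providers.arangodb") -> bool:
    """Return True if module_name can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; this is raised when a
        # parent (for example `airflow` itself) is not installed.
        return False


if __name__ == "__main__":
    # Run this with the same interpreter that starts the scheduler/webserver.
    print("interpreter:", sys.executable)
    print("provider importable:", provider_installed())
```

If this reports False for the interpreter that runs the scheduler, the package was probably installed into a different virtual environment.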
error airflow.exceptions.AirflowException: The conn_id `arangodb_default` isn't defined
cause Connection not configured in Airflow.
fix
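Create an ArangoDB connection in the Airflow UI under Admin > Connections > Add. Set Conn Id: arangodb_default, Conn Type: ArangoDB, Host: the full ArangoDB URL including the port (e.g. http://localhost:8529, or a comma-separated list of coordinator URLs for a cluster), Schema: the database name (e.g. your_database), plus Login and Password.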
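
The same connection can also be supplied without the UI, through an AIRFLOW_CONN_<CONN_ID> environment variable in Airflow's JSON connection format. The sketch below builds that value. It assumes the hook reads the database name from the connection's schema field and the server URL from host; the helper name arangodb_conn_json and all credential values are placeholders.

```python
import json


def arangodb_conn_json(host_url: str, database: str, login: str, password: str) -> str:
    """Serialize connection fields into Airflow's JSON connection format."""
    return json.dumps(
        {
            "conn_type": "arangodb",
            "host": host_url,    # full URL, port included
            "schema": database,  # assumption: database name lives in `schema`
            "login": login,
            "password": password,
        }
    )


# Placeholder values for a local ArangoDB instance.
conn_json = arangodb_conn_json("http://localhost:8529", "your_database", "root", "change-me")

# Print a shell line that defines the `arangodb_default` connection.
print(f"export AIRFLOW_CONN_ARANGODB_DEFAULT='{conn_json}'")
```

Connections defined through environment variables are resolved at runtime and do not appear in the Admin > Connections list.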
error TypeError: Object of type Cursor is not JSON serializable
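cause The query result is a python-arango Cursor. If a Cursor is returned from a task or otherwise pushed to XCom, Airflow fails when it tries to serialize it.
fix
Convert the result to plain Python data before it reaches XCom, for example in a result_processor: result_processor=lambda res: list(res).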
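
The conversion can be sketched as a small helper that drains the cursor and fails early if any row is still not JSON-serializable. This is an illustration only: cursor_to_rows is a hypothetical name, and the stand-in cursor below is a plain iterator rather than a real python-arango Cursor.

```python
import json
from typing import Any, Iterable


def cursor_to_rows(cursor: Iterable[Any]) -> list[Any]:
    """Drain an iterable cursor into a list and check it is JSON-serializable."""
    rows = list(cursor)
    # Raises TypeError here, not later inside XCom, if a row holds a
    # value that json cannot encode.
    json.dumps(rows)
    return rows


# Stand-in for a query result: any iterable of documents works.
fake_cursor = iter([{"_key": "1", "name": "a"}, {"_key": "2", "name": "b"}])
rows = cursor_to_rows(fake_cursor)
print(rows)
```

Draining a cursor loads every document into memory, so this suits small result sets; large results are better written to external storage than passed through XCom.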
breaking Provider dropped support for Python 3.9 in version 2.8.0. Upgrade to Python >=3.10.
fix Use Python 3.10+ (recommended 3.11).
gotcha ArangoDBHook wraps the python-arango client. It reads the server URL(s) from the connection's Host field, the database name from Schema, and the credentials from Login and Password. Host must be a full URL such as http://localhost:8529, so the port belongs in the URL.
fix Set the connection in the Airflow UI as type 'ArangoDB' with Host (URL including port), Schema (database name, e.g. your_db), Login, and Password.
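gotcha AQLOperator hands the raw python-arango query result (a Cursor, not a list or dict) to result_processor if one is provided. Without one, nothing converts the result for you, so do not rely on it arriving in XCom as a dict.
fix Pass a result_processor that consumes the cursor (for example list(cursor)) and stores or logs what you need.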

Create a DAG that executes an AQL query using AQLOperator. The arangodb_default connection must already be configured.

from datetime import datetime

from airflow import DAG
from airflow.providers.arangodb.operators.arangodb import AQLOperator

with DAG(
    dag_id='arangodb_example',
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Run an AQL query through the 'arangodb_default' connection
    query = AQLOperator(
        task_id='run_query',
        arangodb_conn_id='arangodb_default',
        query='FOR doc IN collection RETURN doc',
        # The processor receives a python-arango Cursor; list() drains it
        result_processor=lambda cursor: print(list(cursor)),
    )