Apache Airflow Provider for ArangoDB
Apache Airflow provider package to integrate with ArangoDB. Version 2.9.4 supports Airflow 2.x and requires Python >=3.10. Provides ArangoDBHook for connecting to an ArangoDB server and operators for running AQL queries from DAGs.
pip install apache-airflow-providers-arangodb

Common errors
error ModuleNotFoundError: No module named 'airflow.providers.arangodb' ↓
cause Provider not installed or Airflow not restarted after install.
fix
Run 'pip install apache-airflow-providers-arangodb' and restart Airflow webserver/scheduler.
error airflow.exceptions.AirflowException: The conn_id `arangodb_default` isn't defined ↓
cause Connection not configured in Airflow.
fix
Create an ArangoDB connection via Admin UI: Admin > Connections > Add, set Conn Id: arangodb_default, Conn Type: ArangoDB, Host, Login, Password, Port (8529), Extra: {"database": "your_database"}.
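The same connection can also be supplied as an environment variable instead of through the Admin UI. Airflow reads any `AIRFLOW_CONN_<CONN_ID>` variable as a connection URI, and query-string parameters become the connection's extra fields. The host and credentials below are placeholders; a sketch, assuming the standard Airflow URI-to-connection mapping:

```shell
# Define the arangodb_default connection via an environment variable.
# Scheme = conn type, then login:password@host:port; the "database"
# query parameter lands in the connection's Extra field.
export AIRFLOW_CONN_ARANGODB_DEFAULT='arangodb://user:pass@arangodb-host:8529/?database=your_database'
```

Restart the scheduler and webserver after exporting so the new connection is picked up.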
error TypeError: Object of type Cursor is not JSON serializable ↓
cause AQLOperator returns a python-arango Cursor, which Airflow tries to serialize for XCom.
fix
Use result_processor to convert result to a list/dict: result_processor=lambda res: list(res).
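Why list(res) helps can be shown without Airflow at all: XCom's default serialization is JSON, and an iterable cursor-like object is rejected while a plain list of dicts serializes fine. FakeCursor below is a hypothetical stand-in for python-arango's Cursor, used only for illustration:

```python
import json


class FakeCursor:
    """Hypothetical stand-in for python-arango's Cursor:
    iterable, but not JSON-serializable by itself."""

    def __init__(self, docs):
        self._docs = iter(docs)

    def __iter__(self):
        return self._docs


cursor = FakeCursor([{"_key": "1"}, {"_key": "2"}])

try:
    json.dumps(cursor)  # what XCom serialization effectively attempts
except TypeError as exc:
    print(f"serialization fails: {exc}")  # raises TypeError for the cursor object

docs = list(cursor)  # what result_processor=lambda res: list(res) does
print(json.dumps(docs))  # a plain list of dicts serializes cleanly
```

The same pattern applies to any non-serializable return value: materialize it into plain Python containers inside result_processor before it reaches XCom.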
Warnings
breaking Provider dropped support for Python 3.9 in version 2.8.0. Upgrade to Python >=3.10. ↓
fix Use Python 3.10+ (recommended 3.11).
gotcha ArangoDBHook uses the python-arango library. The connection can be defined as a URI ('arangodb://user:pass@host:port/db') or via the standard Airflow connection fields (host, port, login, password), with the database name supplied in the extra field. ↓
fix Set connection in Airflow UI as type 'ArangoDB', host/port/login/password, and extra '{"database": "your_db"}'.
gotcha AQLOperator hands the raw python-arango query result (a Cursor) to result_processor if one is provided; otherwise the raw result is pushed to XCom. ↓
fix Use result_processor to handle or transform result.
Imports
- ArangoDBHook
wrong: from airflow.providers.arangodb.hooks.arango_hook import ArangoDBHook
correct: from airflow.providers.arangodb.hooks.arangodb import ArangoDBHook
Quickstart
from datetime import datetime

from airflow import DAG
from airflow.providers.arangodb.operators.arangodb import AQLOperator

with DAG(dag_id='arangodb_example', start_date=datetime(2024, 1, 1), schedule=None, catchup=False) as dag:
    query = AQLOperator(
        task_id='run_query',
        arangodb_conn_id='arangodb_default',
        query='FOR doc IN collection RETURN doc',
        # Materialize the Cursor so the result is JSON-serializable for XCom
        result_processor=lambda cursor: list(cursor),
        do_xcom_push=True,
    )