Databand (dbnd)
Databand (dbnd) is a Python library for MLOps, providing orchestration, monitoring, and debugging capabilities for data pipelines. It allows users to define ML tasks and pipelines, track metadata, and integrate with orchestrators like Apache Airflow. The current version is 1.0.34.1, with releases occurring periodically, though core independent development appears to have slowed following an acquisition by IBM.
Common errors
- ModuleNotFoundError: No module named 'dbnd_airflow'
  cause: Attempting to use Databand features designed for Apache Airflow integration without installing the `dbnd-airflow` package (part of the `dbnd[airflow]` extra).
  fix: Install the Airflow integration package: `pip install dbnd[airflow]`
- No tracking server configured and console tracking is disabled. To enable console tracking, add "dbnd__core__tracker=console" to your configuration.
  cause: Databand requires a tracking method (e.g., a remote server API or local console logging) to be explicitly configured. This error indicates neither was found.
  fix: Enable console tracking via an environment variable: `export DBND__CORE__TRACKER=console` or by creating a `dbnd.cfg` file with `[core] tracker=console`.
- Cannot connect to Databand tracking server at <URL>. Please check your network connection and server availability. (Error: HTTPConnectionPool...)
  cause: `DBND__CORE__DATABAND_URL` is set incorrectly, the tracking server is not running, or a network connectivity issue prevents `dbnd` from reaching it.
  fix: Verify the `DBND__CORE__DATABAND_URL` and `DBND__CORE__DATABAND_ACCESS_TOKEN` environment variables or the equivalent `dbnd.cfg` settings. Ensure the Databand tracking server is operational and accessible from your environment.
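The last two errors above usually trace back to a handful of environment variables, so a quick pre-flight check can save a failed run. The following is a minimal sketch using only the standard library; it does not call any `dbnd` API, and the helper name `diagnose_tracking_config` and its messages are hypothetical, not part of Databand.

```python
import os
from urllib.parse import urlparse


def diagnose_tracking_config(env=None):
    """Return a list of problems found in the tracking-related settings.

    Hypothetical helper: it only inspects the variables named in this
    document and never contacts the tracking server.
    """
    env = os.environ if env is None else env
    problems = []

    tracker = env.get("DBND__CORE__TRACKER", "")
    url = env.get("DBND__CORE__DATABAND_URL", "")
    token = env.get("DBND__CORE__DATABAND_ACCESS_TOKEN", "")

    # Neither console tracking nor a server URL: matches the
    # "No tracking server configured" error.
    if not tracker and not url:
        problems.append("no tracker and no tracking server URL configured")

    if url:
        parsed = urlparse(url)
        # A URL without an http(s) scheme or host is a common typo.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"tracking server URL looks malformed: {url!r}")
        if not token:
            problems.append("tracking server URL is set but the access token is missing")

    return problems


if __name__ == "__main__":
    for problem in diagnose_tracking_config():
        print("WARNING:", problem)
```

An empty list means the settings look plausible; it does not prove the server is reachable.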
Warnings
- gotcha The `dbnd` library appears to be in maintenance mode, with primary development effort likely redirected toward IBM's enterprise MLOps offerings (Watsonx.data). The package is still maintained, but independent feature development may be limited.
- gotcha Full integration with orchestrators like Apache Airflow requires installing the optional `dbnd[airflow]` extra package, which adds necessary dependencies and plugins.
- gotcha Databand's configuration system allows settings via environment variables (e.g., `DBND__CORE__TRACKER`), `dbnd.cfg` files, and programmatic `dbnd_context`. Inconsistent or missing configuration can lead to jobs not being tracked or failing.
- gotcha `dbnd` pipelines can be run either directly as Python scripts (wrapped in `dbnd_context`) or via the `dbnd run <pipeline_name>` CLI command. The CLI provides more robust control over execution and integration with tracking.
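The examples in this document (`DBND__CORE__TRACKER` and `[core] tracker=console`) suggest that an environment variable named `DBND__<SECTION>__<KEY>` corresponds to a `key` under `[section]` in `dbnd.cfg`. The sketch below illustrates that naming pattern with the standard library only. The lowercasing rule and both helper names (`env_to_sections`, `render_cfg`) are assumptions made for illustration; this is not how `dbnd` itself loads configuration.

```python
import configparser
import io

PREFIX = "DBND__"


def env_to_sections(env):
    """Group DBND__<SECTION>__<KEY> variables into {section: {key: value}}."""
    sections = {}
    for name, value in env.items():
        if not name.startswith(PREFIX):
            continue
        parts = name[len(PREFIX):].split("__", 1)
        if len(parts) != 2 or not all(parts):
            continue  # not in SECTION__KEY form, skip it
        # Assumption: section and key names are case-insensitive.
        section, key = (part.lower() for part in parts)
        sections.setdefault(section, {})[key] = value
    return sections


def render_cfg(sections):
    """Render the grouped settings as INI text, as it might appear in dbnd.cfg."""
    # interpolation=None keeps literal '%' characters in values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict(sections)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    example_env = {"DBND__CORE__TRACKER": "console", "PATH": "/usr/bin"}
    print(render_cfg(env_to_sections(example_env)))
```

Seeing the two forms side by side makes it easier to spot a setting that is defined in one place and contradicted in another.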
Install
- pip install dbnd
- pip install dbnd[airflow]
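To confirm which of the two installs is present before importing anything, the module names can be probed without importing them. This is a small standard-library sketch; the helper names are hypothetical, and the module names `dbnd` and `dbnd_airflow` are taken from the import and error text in this document.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def integration_status():
    """Report whether the core package and the Airflow integration are installed."""
    return {name: has_module(name) for name in ("dbnd", "dbnd_airflow")}


if __name__ == "__main__":
    for module, present in integration_status().items():
        print(f"{module}: {'installed' if present else 'missing'}")
```

If `dbnd_airflow` is reported as missing, the `ModuleNotFoundError` described under Common errors is the likely result of using the Airflow integration.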
Imports
- task
from dbnd import task
- pipeline
from dbnd import pipeline
- dbnd_context
from dbnd import dbnd_context
- band
from dbnd import band
Quickstart
import os

from dbnd import task, pipeline, dbnd_context

# Configure DBND to log to console for demonstration
# In a real scenario, you'd typically integrate with a Databand tracking server
os.environ['DBND__CORE__TRACKER'] = 'console'  # Ensure console logging is enabled
os.environ['DBND__CORE__DATABAND_URL'] = os.environ.get('DATABAND_URL', 'http://localhost:8080')  # Placeholder
os.environ['DBND__CORE__DATABAND_ACCESS_TOKEN'] = os.environ.get('DATABAND_ACCESS_TOKEN', 'YOUR_API_KEY')  # Placeholder


@task
def calculate_alpha(alpha: float) -> float:
    print(f"Alpha is: {alpha}")
    return alpha * 2


@pipeline
def alpha_pipeline() -> float:
    val1 = calculate_alpha(alpha=1.0)
    val2 = calculate_alpha(alpha=val1)
    return val2


if __name__ == "__main__":
    # Using dbnd_context to ensure configuration is applied for programmatic runs
    with dbnd_context(conf={"core": {"tracker": ["console"]}}):
        result = alpha_pipeline()
        print(f"\nPipeline finished with result: {result}")
    print("\nTo run via CLI with full tracking (if configured): dbnd run alpha_pipeline")
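As a sanity check on the data flow, the same two-step computation can be written as plain Python functions with the `dbnd` decorators removed. This sketch involves no tracking or orchestration and makes no claim about what a decorated `alpha_pipeline()` call returns under `dbnd`; it only shows the arithmetic the pipeline chains together.

```python
def calculate_alpha(alpha: float) -> float:
    """Same body as the quickstart task, without the @task decorator."""
    return alpha * 2


def alpha_pipeline() -> float:
    """Feed the first result into the second call, as the quickstart does."""
    val1 = calculate_alpha(alpha=1.0)   # 1.0 * 2 = 2.0
    val2 = calculate_alpha(alpha=val1)  # 2.0 * 2 = 4.0
    return val2


if __name__ == "__main__":
    print(alpha_pipeline())  # prints 4.0
```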