Databand (dbnd)

1.0.34.1 · maintenance · verified Fri Apr 17

Databand (dbnd) is a Python library for MLOps that provides orchestration, monitoring, and debugging for data pipelines. It lets users define ML tasks and pipelines, track metadata, and integrate with orchestrators such as Apache Airflow. The current version is 1.0.34.1. Releases still appear periodically, though independent core development appears to have slowed since IBM acquired Databand.

Quickstart

This quickstart defines a simple pipeline that calls one task twice, using Databand's `@task` and `@pipeline` decorators. It sets the tracker to `console` so tracking information is printed locally, and runs the pipeline programmatically inside `dbnd_context`. For full tracking, a Databand tracking server URL and access token would typically be configured via environment variables or a `dbnd.cfg` file.

import os
from dbnd import task, pipeline, dbnd_context

# Send tracking output to the console for demonstration.
# In a real deployment you would point dbnd at a Databand tracking server.
os.environ['DBND__CORE__TRACKER'] = 'console'
# Placeholders: taken from DATABAND_URL / DATABAND_ACCESS_TOKEN when those are set.
os.environ['DBND__CORE__DATABAND_URL'] = os.environ.get('DATABAND_URL', 'http://localhost:8080')
os.environ['DBND__CORE__DATABAND_ACCESS_TOKEN'] = os.environ.get('DATABAND_ACCESS_TOKEN', 'YOUR_API_KEY')

@task
def calculate_alpha(alpha: float) -> float:
    print(f"Alpha is: {alpha}")
    return alpha * 2

@pipeline
def alpha_pipeline() -> float:
    val1 = calculate_alpha(alpha=1.0)
    val2 = calculate_alpha(alpha=val1)
    return val2

if __name__ == "__main__":
    # Using dbnd_context to ensure configuration is applied for programmatic runs
    with dbnd_context(conf={"core": {"tracker": ["console"]}}):
        result = alpha_pipeline()
        print(f"\nPipeline finished with result: {result}")

    print("\nTo run via CLI with full tracking (if configured): dbnd run alpha_pipeline")
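
The environment variables in the quickstart follow a `DBND__<SECTION>__<KEY>` naming pattern, and the same settings can live in a `dbnd.cfg` file. The sketch below uses only the standard library to show how those names map onto sections and keys, and what an equivalent file might look like. It assumes `dbnd.cfg` uses the usual INI layout with a `[core]` section. The helpers `env_to_sections` and `render_cfg` are hypothetical illustrations, not part of dbnd.

```python
import configparser
import io

ENV_PREFIX = "DBND__"  # prefix used by the variables in the quickstart


def env_to_sections(environ):
    """Group DBND__<SECTION>__<KEY> variables into {section: {key: value}}.

    Hypothetical helper for illustration only; it is not part of dbnd.
    """
    sections = {}
    for name, value in environ.items():
        if not name.startswith(ENV_PREFIX):
            continue  # unrelated variable
        parts = name[len(ENV_PREFIX):].split("__", 1)
        if len(parts) != 2 or not all(parts):
            continue  # skip names that do not follow the SECTION__KEY shape
        section, key = (part.lower() for part in parts)
        sections.setdefault(section, {})[key] = value
    return sections


def render_cfg(sections):
    """Render the grouped settings as INI text (assumed dbnd.cfg layout)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict(sections)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Same placeholder values as the quickstart, plus one unrelated variable.
example_env = {
    "DBND__CORE__TRACKER": "console",
    "DBND__CORE__DATABAND_URL": "http://localhost:8080",
    "DBND__CORE__DATABAND_ACCESS_TOKEN": "YOUR_API_KEY",
    "PATH": "/usr/bin",
}

sections = env_to_sections(example_env)
cfg_text = render_cfg(sections)
print(cfg_text)
```

Keeping the URL and token in a file or in the environment, rather than in the script, avoids committing credentials alongside pipeline code.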
