Brickflows
Brickflows is a Python library and CLI tool designed to simplify the development and deployment of scalable workflows on Databricks. It enables users to define Databricks workflows declaratively using Python, leveraging decorators for tasks and integrating seamlessly with Databricks Asset Bundles (DABs) for deployment. The current version is 1.7.0, and it maintains an active release cadence with frequent updates and bug fixes.
Warnings
- breaking: Deployment fails if 'health' rules are not set (a regression in some v1.4.x releases).
- gotcha: `WorkflowDependencySensor` may not fail when an invalid `dependency_job_id` is provided, leading to silent failures or unexpected behavior.
- gotcha: Default task settings such as `timeout_seconds` may not apply correctly to 'If/Else' or 'For Each' task types in older versions.
- gotcha: Running a single task directly via programmatic methods is not straightforward in Databricks workflows due to architectural differences from tools like Airflow; Brickflows provides a specific UI-based mechanism instead.
- gotcha: Project setup is crucial for correct module resolution and deployment; an incorrectly configured project root or workflows directory can cause import errors or deployment failures.
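To guard against the default-settings gotcha above, settings can be pinned on the task itself instead of relying on workflow-level defaults. A minimal sketch, assuming the `task_settings` parameter on the `@wf.task()` decorator accepted by recent brickflow versions (verify against your installed release):

```python
from datetime import timedelta

from brickflow import Cluster, TaskSettings, Workflow

wf = Workflow(
    "explicit_settings_workflow",
    default_cluster=Cluster(
        name="example-cluster",
        spark_version="12.2.x-scala2.12",
        node_type_id="Standard_DS3_v2",
        num_workers=1,
    ),
)


# Pin the timeout on the task itself rather than relying on workflow-level
# defaults, which may not propagate to every task type in older versions.
@wf.task(
    task_settings=TaskSettings(
        timeout_seconds=int(timedelta(minutes=30).total_seconds())
    )
)
def guarded_task():
    return "done"
```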
Install
-
pip install brickflows
-
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sudo sh
databricks configure --token
Imports
- Workflow
from brickflow import Workflow
- TaskSettings
from brickflow import TaskSettings
- Cluster
from brickflow import Cluster
- Project
from brickflow import Project
- ctx
from brickflow.context import ctx
Quickstart
from datetime import timedelta
from brickflow import Workflow, Cluster, TaskSettings
import os
# Configure Databricks host and token via environment variables or databricks configure --token
# For this example, ensure your ~/.databrickscfg is set up or environment variables are available.
# Example: export DATABRICKS_HOST="https://<your-workspace-url>.cloud.databricks.com"
# export DATABRICKS_TOKEN="dapi..."
wf = Workflow(
    "hello_world_workflow",
    default_cluster=Cluster(
        name="brickflow-example-cluster",
        spark_version="12.2.x-scala2.12",
        node_type_id="Standard_DS3_v2",
        num_workers=1,
    ),
    default_task_settings=TaskSettings(
        timeout_seconds=int(timedelta(hours=2).total_seconds())
    ),
)


@wf.task()
def hello_task():
    print(f"Hello from Databricks! Host: {os.environ.get('DATABRICKS_HOST', 'N/A')}")
    return "Task completed successfully"
# To run this, you would typically use the brickflows CLI:
# 1. Create a project: `mkdir my_brickflow_project && cd my_brickflow_project && brickflow projects add`
# 2. Place this code in `workflows/hello_world_wf.py` (assuming workflows is your workflows directory).
# 3. Deploy: `brickflow deploy --deploy-mode=bundle -p <your_databricks_profile>`
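One stdlib detail worth noting when deriving `timeout_seconds` from a `timedelta` as in the Quickstart: `timedelta.seconds` is only the seconds-within-a-day component and silently drops whole days, while `total_seconds()` returns the full duration:

```python
from datetime import timedelta

# .seconds is the seconds-within-a-day component, not the whole duration.
print(timedelta(hours=2).seconds)                       # 7200
print(timedelta(days=1, hours=2).seconds)               # 7200, the day is dropped
print(int(timedelta(days=1, hours=2).total_seconds()))  # 93600
```

Using `int(td.total_seconds())` is the safe choice for any timeout that could exceed 24 hours.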