Databricks Bundles (Declarative Automation Bundles)
Databricks Bundles, recently renamed to Declarative Automation Bundles, provide Python support for defining, dynamically creating, and modifying Databricks jobs and pipelines. They extend the core bundle functionality, allowing users to apply software engineering best practices such as source control, code review, testing, and CI/CD to their data and AI projects. The library is currently at version 0.296.0 and is actively maintained, with a focus on streamlined deployments and programmatic configuration through Python and YAML files, orchestrated via the Databricks CLI.
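To illustrate the "dynamically creating" part, the sketch below generates one job definition per input table as plain dicts that mirror the shape of the `resources.jobs` mapping in `databricks.yml`. This is an illustration only: the actual `databricks-bundles` package exposes typed classes for this purpose, and the table names, file path, and helper name here are all made up.

```python
# Sketch: dynamically generate job definitions as plain dicts shaped like
# the resources.jobs mapping in databricks.yml. The real databricks-bundles
# package provides typed classes; this only demonstrates the idea.

def make_ingest_job(table: str) -> dict:
    """Build one job definition for a hypothetical ingest script."""
    return {
        "name": f"ingest_{table}",
        "tasks": [
            {
                "task_key": "ingest",
                "spark_python_task": {
                    "python_file": "src/ingest.py",   # hypothetical script
                    "parameters": ["--table", table],
                },
            }
        ],
    }

# One job per table -- the kind of loop that is awkward in static YAML.
jobs = {f"ingest_{t}": make_ingest_job(t) for t in ["orders", "customers"]}
print(sorted(jobs))  # ['ingest_customers', 'ingest_orders']
```

The payoff of defining resources in Python is exactly this kind of loop: adding a table means extending a list, not copy-pasting a YAML block.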
Warnings
- breaking The product name changed from 'Databricks Asset Bundles' to 'Declarative Automation Bundles'. While the `bundle` CLI command remains the same, this indicates a conceptual shift and continuous evolution of the platform.
- gotcha Directly editing deployed notebooks or jobs in the Databricks workspace UI can lead to configuration drift and unexpected behavior during subsequent bundle deployments. The local bundle repository is considered the source of truth.
- gotcha Permission denied errors (e.g., `CAN MANAGE`, `USE CATALOG`) are common if the service principal or user deploying the bundle lacks the necessary permissions on jobs, Unity Catalog, or other resources.
- breaking Workspace paths in bundle configurations are automatically prefixed with `/Workspace` as of Databricks CLI 0.230.0. Writing an explicit prefix, as in `/Workspace/${workspace.root_path}/...`, is now redundant: it generates a warning and the path is rewritten.
- breaking The fallback path resolution behavior for resources defined in one file and overridden in another was removed in Databricks CLI 0.266.0. This could lead to confusing and error-prone path resolution in older configurations.
- gotcha The default 'dev' target created by `databricks bundle init` might not automatically align with your development Git branch or expected development environment. This requires careful manual configuration.
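The last gotcha above can be mitigated by pinning the dev target explicitly instead of relying on template defaults. A hypothetical `databricks.yml` excerpt (the host and `root_path` values are placeholders; `mode: development` is the standard bundle setting that marks a target as a development environment):

```yaml
# Hypothetical databricks.yml excerpt: make the dev target explicit.
targets:
  dev:
    mode: development        # prefixes resource names, pauses schedules
    default: true
    workspace:
      host: https://<your-workspace-url>
      # No leading /Workspace -- the CLI (0.230.0+) adds it automatically.
      root_path: ~/.bundle/${bundle.name}/${bundle.target}
```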
Install
- pip install databricks-bundles
- curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  databricks -v   # Verify installation
  databricks auth login --host https://<your-workspace-url>
Imports
- Not applicable for direct application-level imports
The `databricks-bundles` Python package is consumed by the Databricks CLI internally when it processes Python-defined bundle resources, rather than through direct `from pkg import ClassName` statements in end-user application code.
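Accordingly, the package is wired in through bundle configuration rather than imported by application code. A sketch of the `experimental` stanza that points the CLI at Python-defined resources (the module path, function name, and venv location below are placeholders for your own project's values):

```yaml
# Hypothetical databricks.yml stanza enabling Python-defined resources.
experimental:
  python:
    venv_path: .venv                    # virtualenv with databricks-bundles installed
    resources:
      - "resources:load_resources"      # "<module>:<function>" returning resources
```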
Quickstart
# 1. Initialize a new bundle project from the jobs-as-code template
databricks bundle init --template experimental-jobs-as-code
# 2. Navigate into the new project directory
cd <your-bundle-project-name>
# 3. Create a Python file for a job task (e.g., src/my_job.py)
# Content for src/my_job.py:
# print("Hello from my Databricks Bundle!")
# 4. Define a simple job in databricks.yml (or a Python resource definition if using Python bundles)
# Example databricks.yml defining a job that runs my_job.py (the 'dev' target below is used by the deploy/run steps):
# bundle:
#   name: my-first-bundle
#
# resources:
#   jobs:
#     my_example_job:
#       name: MyExampleJob
#       tasks:
#         - task_key: run_script
#           spark_python_task:
#             python_file: src/my_job.py
#           new_cluster:
#             spark_version: 13.3.x-scala2.12
#             node_type_id: Standard_DS3_v2
#             num_workers: 1
#
# targets:
#   dev:
#     default: true
#     workspace:
#       host: https://<your-workspace-url>
# 5. Validate the bundle configuration
databricks bundle validate
# 6. Deploy the bundle to your Databricks workspace
databricks bundle deploy --target dev
# 7. Run the deployed job
databricks bundle run my_example_job --target dev