DBND Spark
DBND Spark provides integration between Databand's data orchestration framework and Apache Spark. It enables tracking, monitoring, and logging of Spark jobs, including data metrics, lineage, and execution context. The library wraps SparkSession to automatically capture logs and telemetry. Version 1.0.34.1 is the latest stable release, with monthly updates. 'dbnd-spark' is part of the 'dbnd' ecosystem but is installed separately. Maintained by Databand (now part of IBM).
Install: pip install dbnd-spark

Common errors
error: ModuleNotFoundError: No module named 'dbnd_spark'
cause: dbnd-spark is a separate package from dbnd; installing dbnd alone does not provide it.
fix: Run 'pip install dbnd-spark' in addition to dbnd.
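A quick way to check which of the two packages is actually importable before a job fails at runtime (a generic stdlib sketch, not part of the dbnd API):

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None

# dbnd-spark installs the `dbnd_spark` module; installing dbnd alone does not.
for mod in ("dbnd", "dbnd_spark"):
    if not has_module(mod):
        print(f"missing {mod}: pip install {mod.replace('_', '-')}")
```

Running this in the environment that launches the Spark job catches the missing-package case early, before the driver imports fail.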
error: AttributeError: 'SparkSession' object has no attribute 'dbnd_tracking'
cause: The Spark session was not wrapped by DBND: the dbnd_spark import is missing, or the SparkContext was not initialized properly.
fix: Import dbnd_spark before creating the SparkSession, or use DbndSparkSessionBuilder.
Warnings
deprecated: The 'dbnd-spark' package is being deprecated in favor of the unified 'dbnd' package; newer versions of dbnd include Spark support internally.
fix: Migrate to the 'dbnd' package; the 'from dbnd_spark import ...' import path continues to work from within dbnd.
gotcha: SparkSession must be created inside a DBND task; creating it at module level breaks tracking.
fix: Always create the SparkSession inside a @task-decorated function.
breaking: Configuration attributes changed from camelCase to snake_case in v1.0.20.
fix: Use underscore style: app_name instead of appName.
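If you still carry pre-v1.0.20 configuration with camelCase keys, a small helper can convert the old names mechanically (a hypothetical migration aid, not shipped with dbnd):

```python
import re

def to_snake_case(name: str) -> str:
    """Convert a camelCase attribute name to snake_case, e.g. appName -> app_name."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

# Example: migrate a dict of pre-v1.0.20 config keys (values are illustrative).
old_config = {"appName": "my_job", "webappUrl": "http://localhost:8080"}
new_config = {to_snake_case(k): v for k, v in old_config.items()}
# new_config now uses "app_name" and "webapp_url" as keys
```

Keys already in snake_case pass through unchanged, so the helper is safe to run on mixed configuration.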
Imports
- DbndSparkConfig
  wrong: import dbnd_spark.DbndSparkConfig
  correct: from dbnd_spark import DbndSparkConfig
Quickstart
import os

from dbnd import dbnd_config, task
from dbnd_spark import DbndSparkConfig

@task
def my_spark_job():
    # Create the SparkSession inside the task so DBND can track it
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("test").getOrCreate()
    df = spark.range(10)
    df.show()
    spark.stop()

if __name__ == "__main__":
    dbnd_config.set(DbndSparkConfig.webapp_url, os.environ.get('DBND_WEBAPP_URL', ''))
    my_spark_job.dbnd_run()