Databricks Delta Live Tables (DLT) Python Stubs
The `databricks-dlt` library provides Python stubs to support local development of Databricks Delta Live Tables (DLT) pipelines: API signatures, docstrings for IDE autocompletion, and type hints for static type checking. The library is development-time tooling only and *does not contain functional implementations*; DLT pipelines must be executed on a Databricks workspace. Databricks has since rebranded DLT as Lakeflow Spark Declarative Pipelines (SDP), and the product continues to evolve under that name.
Warnings
- gotcha The `databricks-dlt` PyPI library provides Python *stubs* for local development tools (IDE autocompletion, type checking) and *does not contain functional implementations*. DLT pipelines defined using this stub must be deployed and executed on a Databricks workspace; they cannot be run locally.
- deprecated The underlying product "Delta Live Tables (DLT)" has been rebranded to "Lakeflow Spark Declarative Pipelines (SDP)". While existing Python code using `import dlt` will continue to function, Databricks officially recommends migrating new development to use `from pyspark import pipelines as dp` and the corresponding `@dp` decorators and functions for future compatibility and to leverage new features.
- gotcha The `dlt` (or `pyspark.pipelines`) module and its decorators are only available when your Python code is executed within the context of a Databricks DLT/SDP pipeline. Attempting to import or use these modules in a standalone Python script or a regular Databricks notebook (not configured as a DLT pipeline) will result in an `ImportError` or `NameError`.
- gotcha Managing external Python dependencies for DLT/SDP pipelines through ad-hoc `%pip install` commands or cluster init scripts can be problematic: version conflicts and unmanaged upgrades between environments can cause unexpected pipeline failures or inconsistent results.
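The import caveats above can be handled defensively in code that is shared between local tooling and the pipeline runtime. A minimal sketch (assuming only the two module paths documented here, `pyspark.pipelines` and `dlt`) that detects which API, if any, is importable:

```python
# Detect which pipeline API is available. In a plain local Python
# environment without the stubs or a Databricks runtime, neither
# import resolves and PIPELINE_API stays None.
try:
    from pyspark import pipelines as dp  # new Lakeflow SDP module path
    PIPELINE_API = "sdp"
except ImportError:
    try:
        import dlt  # legacy DLT module (or the local stub package)
        PIPELINE_API = "dlt"
    except ImportError:
        PIPELINE_API = None  # running outside any pipeline context

print(PIPELINE_API)
```

Table definitions can then be registered only when `PIPELINE_API` is not `None`, keeping the same source file importable by local test runners.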
Install
- pip
pip install databricks-dlt
Imports
- dlt
import dlt
- pipelines (as dp)
from pyspark import pipelines as dp
Quickstart
# This code is designed to run inside a Databricks DLT pipeline.
# The 'databricks-dlt' library only provides stubs for local development;
# actual execution requires a Databricks workspace.
import dlt
from pyspark.sql.functions import col, count

# Bronze layer: raw ingestion
@dlt.table
def raw_data():
    # In a real pipeline this would read from a streaming source such as Auto Loader, e.g.:
    # spark.readStream.format('cloudFiles').option('cloudFiles.format', 'json').load('/databricks-datasets/retail-org/sales_orders/')
    # For demonstration, a static batch read stands in:
    return spark.read.format('json').load('/databricks-datasets/retail-org/sales_orders/')

# Silver layer: cleansed table with a data-quality expectation
@dlt.table(comment='Cleansed sales orders with valid order numbers')
@dlt.expect_or_drop('valid_order_number', 'order_number IS NOT NULL')
def cleansed_data():
    return dlt.read('raw_data').select(col('customer_id'), col('order_number'), col('order_date'))

# Gold layer: daily aggregation
@dlt.table(name='daily_sales_summary')
def daily_sales():
    return (
        dlt.read('cleansed_data')
        .groupBy('order_date')
        .agg(count('order_number').alias('total_orders'))
    )
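Beyond `expect_or_drop`, the stubs also cover the other expectation decorators (`@dlt.expect`, which only records violations, and `@dlt.expect_or_fail`, which aborts the update) and incremental reads via `dlt.read_stream`. A hedged sketch, guarded so the file still imports outside a pipeline; the table and column names are illustrative:

```python
# Guard the imports so this module can be loaded in a local environment
# where neither the stubs nor a Databricks runtime is installed.
try:
    import dlt
    from pyspark.sql.functions import col
    HAVE_DLT = True
except ImportError:
    HAVE_DLT = False  # expected outside a pipeline / without the stubs

if HAVE_DLT:
    @dlt.table(comment='Orders audited for quality (illustrative table)')
    @dlt.expect('has_order_date', 'order_date IS NOT NULL')        # record violations only
    @dlt.expect_or_fail('has_customer', 'customer_id IS NOT NULL')  # abort the update on violation
    def audited_orders():
        # read_stream consumes the upstream table incrementally
        return dlt.read_stream('cleansed_data').select(col('customer_id'), col('order_date'))
```

As with the quickstart, these definitions only materialize when the pipeline runs on a Databricks workspace; locally the stubs serve autocompletion and type checking.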