DiscoverX - Lakehouse Mapping and Search

0.0.9 · active · verified Thu Apr 16

DiscoverX is a Python library developed under Databricks Labs and described as a "Swiss-Army-knife" for Lakehouse administration. It automates inspecting and operating on large numbers of Lakehouse assets, chiefly by running SQL templates across many tables at once. The current version is 0.0.9, released on May 2, 2025. It is provided for exploration only: Databricks does not formally support it and offers no Service Level Agreements (SLAs).

Quickstart

This quickstart demonstrates how to initialize DiscoverX, define a set of tables using a wildcard pattern, and then apply a SQL template (counting rows) concurrently across all matching tables in a Databricks environment.

from discoverx import DX

# Initialize DiscoverX. 'locale' can be set for region-specific rules.
dx = DX(locale="US")

# Select the tables to operate on with a wildcard pattern
# Example: all tables in 'my_catalog.my_schema'
table_pattern = "my_catalog.my_schema.*"

# Count the rows in every selected table.
# The '{full_table_name}' placeholder is replaced with each matching table's name.
table_counts = (
    dx.from_tables(table_pattern)
    .with_sql("SELECT COUNT(*) FROM {full_table_name}")
    .apply()
)

# Display the resulting DataFrame (display() is available in Databricks notebooks)
table_counts.display()
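
To make the pattern-plus-template idea above concrete outside a Databricks workspace, here is a minimal plain-Python sketch. It is not DiscoverX's implementation: the table inventory `ALL_TABLES` and the helpers `expand_pattern` and `render_sql` are hypothetical names invented for illustration, and the matching rule (shell-style wildcards applied separately to the catalog, schema, and table parts) is an assumption about how such a pattern could be resolved.

```python
from fnmatch import fnmatchcase

# Hypothetical table inventory; in a real workspace this would come from the catalog.
ALL_TABLES = [
    "my_catalog.my_schema.orders",
    "my_catalog.my_schema.customers",
    "my_catalog.other_schema.events",
]


def expand_pattern(pattern, tables):
    """Return tables whose catalog, schema, and table parts each match the pattern."""
    pattern_parts = pattern.split(".")
    matches = []
    for name in tables:
        parts = name.split(".")
        # Assumption: a name matches only if every dotted part matches its wildcard.
        if len(parts) == len(pattern_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            matches.append(name)
    return matches


def render_sql(template, tables):
    """Fill the {full_table_name} placeholder once per selected table."""
    return [template.format(full_table_name=name) for name in tables]


selected = expand_pattern("my_catalog.my_schema.*", ALL_TABLES)
statements = render_sql("SELECT COUNT(*) FROM {full_table_name}", selected)

for statement in statements:
    print(statement)
```

The sketch only generates the per-table statements. Running them against each table and combining the results into a single DataFrame is the part the quickstart leaves to the library.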
