{"id":1982,"library":"databricks-labs-dqx","title":"Databricks Data Quality eXtended (DQX)","description":"Data Quality eXtended (DQX) is a Python library for defining, executing, and monitoring data quality checks. It leverages Apache Spark and is designed to integrate seamlessly within the Databricks ecosystem, supporting features like Delta Lake, DLT, and Unity Catalog. The library is actively maintained with frequent releases, currently at version 0.13.0, introducing features like an enhanced data quality dashboard and AI-assisted rule generation.","status":"active","version":"0.13.0","language":"en","source_language":"en","source_url":"https://github.com/databrickslabs/dqx","tags":["databricks","data quality","dqx","pyspark","delta lake","data governance"],"install":[{"cmd":"pip install databricks-labs-dqx pyspark","lang":"bash","label":"Install DQX and PySpark (for local use)"}],"dependencies":[{"reason":"DQX is built on Apache Spark and requires an active SparkSession to run any data quality checks. 
PySpark is needed for local development outside of a Databricks environment.","package":"pyspark","optional":false}],"imports":[{"symbol":"DQEngine","correct":"from databricks.labs.dqx.engine import DQEngine"},{"note":"Concrete row-level rule; `DQRule` in the same module is the base class.","symbol":"DQRowRule","correct":"from databricks.labs.dqx.rule import DQRowRule"},{"note":"Introduced in v0.12.0 for AI-assisted rule generation.","symbol":"DQGenerator","correct":"from databricks.labs.dqx.profiler.generator import DQGenerator"}],"quickstart":{"code":"from unittest.mock import MagicMock\n\nfrom databricks.sdk import WorkspaceClient\nfrom pyspark.sql import SparkSession\n\nfrom databricks.labs.dqx import check_funcs\nfrom databricks.labs.dqx.engine import DQEngine\nfrom databricks.labs.dqx.rule import DQDatasetRule, DQRowRule\n\n# Initialize SparkSession (for local execution)\nspark = SparkSession.builder.appName(\"DQXQuickstart\") \\\n    .master(\"local[*]\") \\\n    .getOrCreate()\n\n# Create a sample DataFrame\ndata = [(\"A\", 1, \"2023-01-01\"), (\"B\", 2, \"2023-01-02\"), (\"C\", None, \"2023-01-03\"), (\"D\", 4, \"2023-01-01\")]\ncolumns = [\"id\", \"value\", \"event_date\"]\ndf = spark.createDataFrame(data, columns)\n\n# Define data quality rules\nrules = [\n    # value column should not be null\n    DQRowRule(name=\"value_not_null\", criticality=\"error\", check_func=check_funcs.is_not_null, column=\"value\"),\n    # id column should be unique (dataset-level check)\n    DQDatasetRule(name=\"id_is_unique\", criticality=\"error\", check_func=check_funcs.is_unique, columns=[\"id\"]),\n    # event_date should be recent\n    DQRowRule(name=\"event_date_freshness\", criticality=\"warn\", check_func=check_funcs.sql_expression, check_func_kwargs={\"expression\": \"event_date >= '2023-01-01'\"}),\n]\n\n# Initialize DQEngine. Outside Databricks there is no workspace to connect to,\n# so a mocked WorkspaceClient stands in (assumption: local, Spark-only usage).\nws = MagicMock(spec=WorkspaceClient)\ndq_engine = DQEngine(workspace_client=ws, spark=spark)\n\n# Apply checks; the result is the input DataFrame plus error/warning reporting columns\nresults = dq_engine.apply_checks(df, rules)\n\n# Print results\nprint(\"Data Quality Check Results:\")\nresults.show(truncate=False)\n\n# Stop SparkSession\nspark.stop()","lang":"python","description":"This quickstart demonstrates how to set up a local SparkSession, create a sample DataFrame, define data quality rules using `DQRowRule` and `DQDatasetRule` objects, and execute them with `DQEngine.apply_checks()`. 
The results are displayed and the SparkSession is stopped."},"warnings":[{"fix":"Ensure `pyspark` is installed (`pip install pyspark`) and a `SparkSession` is created and passed to the `DQEngine` constructor together with a `WorkspaceClient` (e.g., `DQEngine(workspace_client=ws, spark=my_spark_session)`).","message":"DQX requires an active SparkSession. When running outside a Databricks environment (e.g., locally), you must explicitly install `pyspark` and create a `SparkSession` instance before initializing `DQEngine`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Migrate dictionary-based rule definitions to explicit rule objects, e.g. `DQRowRule(criticality='error', check_func=check_funcs.is_not_null, column='value')`, or pass them to `apply_checks_by_metadata`, which accepts metadata (dictionary) checks. Refer to the `DQRule` subclasses for available parameters.","message":"Starting from v0.7.1, the `apply_checks` method enforces strict type validation for rules. Rules must be passed as a list of `DQRule` objects. Passing dictionaries or other types directly will raise a `TypeError`.","severity":"breaking","affected_versions":">=0.7.1"},{"fix":"Review the DQX documentation for the new dashboard structure and update any code or configurations related to dashboard generation or access. Check whether existing custom UI components are still compatible.","message":"The Data Quality Dashboard has been significantly enhanced and restructured in v0.13.0. Existing custom dashboard integrations or deployment scripts might require updates to align with the new three-tab structure and underlying APIs.","severity":"breaking","affected_versions":">=0.13.0"},{"fix":"For AI-assisted rules, use `from databricks.labs.dqx.profiler.generator import DQGenerator` and its methods like `generate_dq_rules_ai_assisted`. For ODCS integration, refer to the documentation on generating rules from data contracts.","message":"The `DQGenerator` class for AI-assisted rule generation (v0.12.0) and ODCS Data Contract rule generation (v0.11.0) introduces new APIs. 
Users who want generated rules should adopt these classes and methods rather than writing rules by hand.","severity":"gotcha","affected_versions":">=0.11.0"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}