{"id":7913,"library":"airflow-provider-great-expectations","title":"Apache Airflow Provider for Great Expectations","description":"The `airflow-provider-great-expectations` package provides Apache Airflow operators for running Great Expectations (GX) data validations directly in your DAGs. It supports validating in-memory DataFrames, data from external sources using BatchDefinitions, or triggering actions with Checkpoints. The current version is 1.0.0, released in January 2026, and it typically receives new features and maintenance updates periodically.","status":"active","version":"1.0.0","language":"en","source_language":"en","source_url":"https://github.com/great-expectations/airflow-provider-great-expectations","tags":["airflow","great-expectations","data-quality","data-validation","etl","data-orchestration"],"install":[{"cmd":"pip install \"airflow-provider-great-expectations<3.14,>3.9\"","lang":"bash","label":"Basic Install"},{"cmd":"pip install \"airflow-provider-great-expectations[snowflake]\" # Example for Snowflake","lang":"bash","label":"Install with optional data source"}],"dependencies":[{"reason":"Required for Airflow DAG orchestration. Version 1.0.0+ of the provider requires Apache Airflow 2.1+.","package":"apache-airflow","optional":false},{"reason":"The core data validation framework. 
Version 1.0.0+ of the provider requires Great Expectations 1.7.0+.","package":"great-expectations","optional":false}],"imports":[{"note":"The `GreatExpectationsOperator` is deprecated and replaced by specialized operators in 1.0.0.","wrong":"from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator","symbol":"GXValidateDataFrameOperator","correct":"from great_expectations_provider.operators.validate_dataframe import GXValidateDataFrameOperator"},{"symbol":"GXValidateBatchOperator","correct":"from great_expectations_provider.operators.validate_batch import GXValidateBatchOperator"},{"symbol":"GXValidateCheckpointOperator","correct":"from great_expectations_provider.operators.validate_checkpoint import GXValidateCheckpointOperator"}],"quickstart":{"code":"from __future__ import annotations\n\nimport pendulum\nimport pandas as pd\n\nfrom airflow.models.dag import DAG\nfrom airflow.operators.python import PythonOperator\n\nimport great_expectations as gx  # For defining the ExpectationSuite and expectations\nfrom great_expectations_provider.operators.validate_dataframe import GXValidateDataFrameOperator\n\n\ndef _get_dataframe():\n    # Simulate loading data into a Pandas DataFrame\n    data = {\n        'col_a': [1, 2, 3, 4, 5],\n        'col_b': ['a', 'b', 'c', 'd', 'e']\n    }\n    return pd.DataFrame(data)\n\n\ndef _get_expectations_suite(context):\n    # Define expectations. 'context' is the AbstractDataContext passed by the operator.\n    suite = context.suites.add_or_update(gx.ExpectationSuite(name='my_expectation_suite'))\n    suite.add_expectation(gx.expectations.ExpectColumnToExist(column='col_a'))\n    suite.add_expectation(gx.expectations.ExpectColumnValuesToBeOfType(column='col_a', type_='int64'))\n    return suite\n\n\nwith DAG(\n    dag_id=\"great_expectations_dataframe_validation_dag\",\n    start_date=pendulum.datetime(2023, 1, 1, tz=\"UTC\"),\n    catchup=False,\n    schedule=None,\n    tags=[\"great_expectations\", \"data_quality\"],\n) as dag:\n    validate_dataframe_task = GXValidateDataFrameOperator(\n        task_id=\"validate_my_dataframe\",\n        configure_dataframe=_get_dataframe,\n        configure_expectations=_get_expectations_suite,\n    )\n\n    # Example of a downstream task that runs only if validation passes\n    success_task = PythonOperator(\n        task_id=\"data_quality_passed\",\n        python_callable=lambda: print(\"Data quality checks passed!\"),\n    )\n\n    validate_dataframe_task >> success_task","lang":"python","description":"This quickstart demonstrates how to use the `GXValidateDataFrameOperator` to validate a Pandas DataFrame in an Airflow DAG. The `configure_dataframe` parameter takes a callable that returns the DataFrame, and `configure_expectations` takes a callable that returns an `ExpectationSuite` (or a single `Expectation`) to apply during validation. The expectations are defined with the class-based API (`gx.expectations.*`) used by Great Expectations 1.x, which the provider's required `great-expectations>=1.7.0` expects."},"warnings":[{"fix":"Rewrite your DAGs to use the new `GXValidate*` operators, choosing the one most appropriate for your data context and validation needs. 
Consult the official migration guide.","message":"Version 1.0.0 (and its alpha releases) introduced new specialized operators (`GXValidateDataFrameOperator`, `GXValidateBatchOperator`, `GXValidateCheckpointOperator`) which replace the legacy `GreatExpectationsOperator`. Existing DAGs using `GreatExpectationsOperator` must be migrated.","severity":"breaking","affected_versions":">=1.0.0a1"},{"fix":"Ensure your downstream tasks are prepared for potential failures. Review your DAG's error handling and retry mechanisms. This change makes validation failures more visible and prevents downstream tasks from processing bad data.","message":"As of version 1.0.0a5, Great Expectations validation failures within the provider's operators will now explicitly raise an AirflowException, causing the DAG task to fail. Previous versions might have allowed the DAG to continue without halting.","severity":"breaking","affected_versions":">=1.0.0a5"},{"fix":"Upgrade your Python environment to 3.10, 3.11, 3.12, or 3.13 to ensure compatibility with the latest provider versions.","message":"Support for Python versions prior to 3.8 was dropped in version 0.3.0. Additionally, version 1.0.0+ requires Python 3.10+ (specifically `<3.14, >3.9` as per PyPI metadata).","severity":"breaking","affected_versions":">=0.3.0"},{"fix":"Always align your provider version with the recommended `great-expectations` version. For `airflow-provider-great-expectations==1.0.0`, ensure `great-expectations>=1.7.0` is installed. Upgrade both libraries if encountering `ModuleNotFoundError` related to GX components.","message":"Older versions of the `airflow-provider-great-expectations` (e.g., pre-0.2.9) were not compatible with `great_expectations` version 1.0.0 and above due to API changes (e.g., removal of `CheckpointResult`). 
The current provider `1.0.0` requires `great-expectations>=1.7.0`.","severity":"gotcha","affected_versions":"<1.0.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Upgrade `airflow-provider-great-expectations` to version `0.2.9` or later, or preferably to `1.0.0` which is designed for compatibility with newer Great Expectations versions (e.g., `great-expectations>=1.7.0`).","cause":"An older version of `airflow-provider-great-expectations` (e.g., pre-0.2.9) is being used with `great_expectations` version 1.0.0 or higher. The `CheckpointResult` class was removed in `great_expectations==1.0.0`.","error":"ModuleNotFoundError: No module named 'great_expectations.checkpoint.types.checkpoint_result'"},{"fix":"Upgrade to `airflow-provider-great-expectations` version `1.0.0a5` or newer. These versions are designed to fail the Airflow task upon a Great Expectations validation failure.","cause":"Using `airflow-provider-great-expectations` version older than `1.0.0a5`. In these versions, validation failures might have been logged but did not always explicitly raise an Airflow exception to halt the DAG.","error":"Great Expectations validation failed but Airflow DAG continued to run."},{"fix":"Ensure that `configure_dataframe` returns a `pandas.DataFrame` or `pyspark.sql.DataFrame`, and `configure_expectations` returns a `great_expectations.core.ExpectationSuite` or `great_expectations.expectations.Expectation`. Verify the callable functions are correctly defined and return the expected objects.","cause":"The functions provided to `configure_dataframe` or `configure_expectations` parameters in operators like `GXValidateDataFrameOperator` are either not callable, or they return `None` or an unexpected type.","error":"TypeError: 'NoneType' object is not callable (or similar errors related to `configure_dataframe`/`configure_expectations`)"}]}