Apache Airflow Provider for Great Expectations
The `airflow-provider-great-expectations` package provides Apache Airflow operators for running Great Expectations (GX) data validations directly in your DAGs. It supports validating in-memory DataFrames, validating data from external sources via BatchDefinitions, and triggering actions with Checkpoints. The current version is 1.0.0, released in January 2026, and the package receives periodic feature and maintenance updates.
Common errors
- `ModuleNotFoundError: No module named 'great_expectations.checkpoint.types.checkpoint_result'`
  - Cause: An older version of `airflow-provider-great-expectations` (e.g., pre-0.2.9) is being used with `great_expectations` version 1.0.0 or higher. The `CheckpointResult` class was removed in `great_expectations==1.0.0`.
  - Fix: Upgrade `airflow-provider-great-expectations` to version `0.2.9` or later, or preferably to `1.0.0`, which is designed for compatibility with newer Great Expectations versions (e.g., `great-expectations>=1.7.0`).
- Great Expectations validation failed but the Airflow DAG continued to run.
  - Cause: Using an `airflow-provider-great-expectations` version older than `1.0.0a5`. In those versions, validation failures might be logged but did not always raise an Airflow exception to halt the DAG.
  - Fix: Upgrade to `airflow-provider-great-expectations` version `1.0.0a5` or newer. These versions fail the Airflow task upon a Great Expectations validation failure.
- `TypeError: 'NoneType' object is not callable` (or similar errors related to `configure_dataframe`/`configure_expectations`)
  - Cause: The functions passed to the `configure_dataframe` or `configure_expectations` parameters of operators like `GXValidateDataFrameOperator` are not callable, or they return `None` or an unexpected type.
  - Fix: Ensure that `configure_dataframe` returns a `pandas.DataFrame` or `pyspark.sql.DataFrame`, and that `configure_expectations` returns a `great_expectations.core.ExpectationSuite` or `great_expectations.expectations.Expectation`. Verify the callables are correctly defined and return the expected objects.
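The first error above is purely a version mismatch, so it can be caught at DAG-parse time before any task runs. A minimal sketch of such a guard, using only the stdlib and the version thresholds quoted above (the helper names are illustrative, and pre-release suffixes like `1.0.0a5` are out of scope):

```python
def _parse(version: str) -> tuple[int, ...]:
    # Handle plain X.Y.Z versions only; pre-release suffixes (e.g. 1.0.0a5)
    # would need a full PEP 440 parser such as packaging.version.
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def is_known_bad_combo(provider_version: str, gx_version: str) -> bool:
    # Provider releases before 0.2.9 break against great_expectations>=1.0.0,
    # where the CheckpointResult class was removed.
    return _parse(provider_version) < (0, 2, 9) and _parse(gx_version) >= (1, 0, 0)
```

For example, `is_known_bad_combo("0.2.8", "1.0.0")` flags the `ModuleNotFoundError` scenario, while `is_known_bad_combo("1.0.0", "1.7.0")` does not.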
Warnings
- breaking Version 1.0.0 (and its alpha releases) introduced new specialized operators (`GXValidateDataFrameOperator`, `GXValidateBatchOperator`, `GXValidateCheckpointOperator`) which replace the legacy `GreatExpectationsOperator`. Existing DAGs using `GreatExpectationsOperator` must be migrated.
- breaking As of version 1.0.0a5, Great Expectations validation failures within the provider's operators will now explicitly raise an AirflowException, causing the DAG task to fail. Previous versions might have allowed the DAG to continue without halting.
- breaking Support for Python versions prior to 3.8 was dropped in version 0.3.0. Additionally, version 1.0.0+ requires Python 3.10+ (specifically `<3.14, >3.9` as per PyPI metadata).
- gotcha Older versions of the `airflow-provider-great-expectations` (e.g., pre-0.2.9) were not compatible with `great_expectations` version 1.0.0 and above due to API changes (e.g., removal of `CheckpointResult`). The current provider `1.0.0` requires `great-expectations>=1.7.0`.
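The Python floor in the last breaking change can likewise be checked up front. A sketch, assuming the `>3.9,<3.14` requires-python range quoted from the PyPI metadata (the function name is illustrative):

```python
import sys


def provider_supports_python(version_info: tuple = sys.version_info) -> bool:
    # The 1.0.0 provider's PyPI metadata declares requires-python ">3.9,<3.14",
    # i.e. CPython 3.10 through 3.13.
    major_minor = (version_info[0], version_info[1])
    return (3, 9) < major_minor < (3, 14)
```

Calling this with no arguments checks the running interpreter; it returns `False` on, e.g., Python 3.9 or 3.14.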
Install
- pip install airflow-provider-great-expectations
- pip install "airflow-provider-great-expectations[snowflake]" # Example extra for Snowflake
Imports
- GXValidateDataFrameOperator
from great_expectations_provider.operators.validate_dataframe import GXValidateDataFrameOperator
- GXValidateBatchOperator
from great_expectations_provider.operators.validate_batch import GXValidateBatchOperator
- GXValidateCheckpointOperator
from great_expectations_provider.operators.validate_checkpoint import GXValidateCheckpointOperator
Quickstart
from __future__ import annotations

import pandas as pd
import pendulum
from airflow.models.dag import DAG
from airflow.operators.python import PythonOperator

import great_expectations.expectations as gxe
from great_expectations.core import ExpectationSuite
from great_expectations_provider.operators.validate_dataframe import GXValidateDataFrameOperator


def _get_dataframe() -> pd.DataFrame:
    # Simulate loading data into a pandas DataFrame.
    data = {
        "col_a": [1, 2, 3, 4, 5],
        "col_b": ["a", "b", "c", "d", "e"],
    }
    return pd.DataFrame(data)


def _get_expectations_suite(context):
    # Define expectations. 'context' is the AbstractDataContext passed by the operator.
    suite = context.suites.add_or_update(ExpectationSuite(name="my_expectation_suite"))
    # great_expectations>=1.0 takes Expectation instances here; the 0.x-style
    # ExpectationConfiguration(expectation_type=..., kwargs=...) API was removed.
    suite.add_expectation(gxe.ExpectColumnToExist(column="col_a"))
    suite.add_expectation(gxe.ExpectColumnValuesToBeOfType(column="col_a", type_="int64"))
    return suite


with DAG(
    dag_id="great_expectations_dataframe_validation_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["great_expectations", "data_quality"],
) as dag:
    validate_dataframe_task = GXValidateDataFrameOperator(
        task_id="validate_my_dataframe",
        configure_dataframe=_get_dataframe,
        configure_expectations=_get_expectations_suite,
    )

    # Downstream task that runs only if validation passes.
    success_task = PythonOperator(
        task_id="data_quality_passed",
        python_callable=lambda: print("Data quality checks passed!"),
    )

    validate_dataframe_task >> success_task