Great Expectations

Version 1.15.1, verified Tue May 12 (pip install verified, no auth required)

Great Expectations (GX) is an open-source Python library for data quality. It helps data teams validate, document, and profile their data to ensure quality and consistency throughout data pipelines. It allows users to define 'Expectations' (assertions about data), run validation tests, and generate human-readable data quality reports called 'Data Docs'. The library is actively maintained with frequent releases and supports Python versions 3.10 through 3.13, with experimental support for 3.14.

pip install great-expectations
error ModuleNotFoundError: No module named 'great_expectations'
cause This error occurs when the 'great-expectations' package is not installed in the active Python environment or is not accessible by the interpreter being used.
fix
Ensure Great Expectations is installed using pip: pip install great-expectations. If using a virtual environment or IDE, verify that the correct Python interpreter linked to the installation is selected and restart the kernel if necessary.
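A quick, standard-library-only way to confirm which interpreter is active and whether the package is visible to it:

```python
import importlib.util
import sys

# Show the interpreter actually being used (compare against where pip installed)
print(sys.executable)

# Check whether great_expectations is importable from this environment
spec = importlib.util.find_spec("great_expectations")
print("installed" if spec is not None else "missing: run `pip install great-expectations`")
```

If the printed interpreter path differs from the environment where you ran pip, point your IDE or kernel at the correct interpreter.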
error AttributeError: module 'great_expectations' has no attribute 'get_context'
cause This error typically arises when the `great_expectations` package is either not fully or correctly installed, or there's a version mismatch where the `get_context` function, a primary entry point, isn't found at the module level. This can also happen if the Python interpreter caches old module states.
fix
First, uninstall and reinstall the package: pip uninstall great-expectations followed by pip install great-expectations. If the issue persists, ensure your IDE or environment is using the correct Python interpreter where the package is installed and restart your kernel or IDE.
error AttributeError: 'EphemeralDataContext' object has no attribute 'sources'
cause This error occurs in GX 1.0+ when code written for the 0.x Fluent API accesses data sources via `context.sources`. In GX 1.0 that accessor was renamed to `context.data_sources`, so older tutorials and snippets raise this `AttributeError` on current versions.
fix
Update your code to use the GX 1.0+ accessor, e.g. context.data_sources.add_pandas(...), followed by the appropriate asset method such as add_dataframe_asset(...). Refer to the Great Expectations 1.0 migration guide for the full list of renamed entry points.
error AttributeError: 'ExpectationSuite' object has no attribute 'add_expectation_configuration'
cause This `AttributeError` occurs because the `add_expectation_configuration` method has been deprecated or removed in newer versions of Great Expectations. The correct method to add expectations to an `ExpectationSuite` is `add_expectation()`, or directly appending to the `suite.expectations` list.
fix
Replace suite.add_expectation_configuration(expectation_configuration=config) with suite.add_expectation(...). Note that in GX 1.0+ add_expectation takes an Expectation instance (e.g. gx.expectations.ExpectColumnToExist(column="id")) rather than an ExpectationConfiguration. Consult the official documentation for the version of Great Expectations you are using.
error TypeError: 'Checkpoint' object is not subscriptable
cause This error typically arises when trying to access elements of a `Checkpoint` object using dictionary-like indexing (e.g., `checkpoint['batches']`), which is not supported for `Checkpoint` objects in current versions of Great Expectations. `Checkpoint` objects manage validation runs and return a `CheckpointResult` object, which then contains the validation results.
fix
Instead of subscripting the Checkpoint object directly, run the checkpoint to get a CheckpointResult object, then access its attributes or methods, such as checkpoint_result.run_results or checkpoint_result.describe().
breaking Breaking changes were introduced in the V2-to-V3 API transition and again in the 0.x-to-1.0 (GX Core) release, requiring significant updates to configuration files and code (e.g., `expectation_suite_name` became `name`, `evaluation_parameters` became `suite_parameters`, `ge_cloud_id` became `id`). Validation Operators were deprecated in V3, and the `great_expectations` CLI (including `great_expectations init`) was removed in 1.0.
fix Consult the official migration guides in the Great Expectations documentation for detailed steps on upgrading your configurations and API calls.
gotcha Windows support for the open-source Python version (GX OSS) is currently limited or unavailable. Users in Windows environments might encounter errors or performance issues.
fix Consider running Great Expectations in a Linux or macOS environment, or using a Linux-based Docker container on Windows.
gotcha When validating data from SQL data sources, it can be challenging to retrieve specific row identifiers (e.g., primary keys or row numbers) for failed expectations directly in the validation results. This often requires switching to a Pandas-based execution engine to obtain more granular details.
fix For detailed row-level failure information, consider using a Pandas-backed data source, or implement custom logic to extract identifying information from your SQL query results before validation.
gotcha In complex data pipelines, particularly when integrating with orchestrators like Airflow, users have reported issues with Expectations executing multiple times or experiencing slow performance.
fix Carefully review your Great Expectations and orchestrator configurations. Ensure checkpoints are correctly defined and that batch requests are optimized to prevent redundant computations. Consider isolated testing of expectation suites to diagnose performance bottlenecks.
gotcha When loading data from remote URLs (e.g., using `pandas.read_csv` with a URL), users may encounter `HTTP Error 404: Not Found` if the remote resource is unavailable, has moved, or the URL is incorrect. This prevents data from being loaded into the Great Expectations context.
fix Verify the accessibility and correctness of the remote URL pointing to your data source. If the URL refers to a resource within the Great Expectations project repository, ensure you are using a current and valid path or consider downloading the data locally.
| Python | OS (libc)     | Status | Wheel build | Install / import | Disk   |
|--------|---------------|--------|-------------|------------------|--------|
| 3.9    | alpine (musl) | wheel  | -           | 9.79s            | 373.0M |
| 3.9    | alpine (musl) | -      | -           | 8.89s            | 371.7M |
| 3.9    | slim (glibc)  | wheel  | 21.3s       | 9.03s            | 364M   |
| 3.9    | slim (glibc)  | -      | -           | 7.80s            | 363M   |
| 3.10   | alpine (musl) | wheel  | -           | 9.54s            | 372.8M |
| 3.10   | alpine (musl) | -      | -           | 8.57s            | 371.4M |
| 3.10   | slim (glibc)  | wheel  | 18.5s       | 7.40s            | 359M   |
| 3.10   | slim (glibc)  | -      | -           | 7.09s            | 357M   |
| 3.11   | alpine (musl) | wheel  | -           | 11.28s           | 398.9M |
| 3.11   | alpine (musl) | -      | -           | 11.65s           | 397.4M |
| 3.11   | slim (glibc)  | wheel  | 17.4s       | 10.45s           | 383M   |
| 3.11   | slim (glibc)  | -      | -           | 9.42s            | 381M   |
| 3.12   | alpine (musl) | wheel  | -           | 10.79s           | 380.3M |
| 3.12   | alpine (musl) | -      | -           | 11.33s           | 378.7M |
| 3.12   | slim (glibc)  | wheel  | 16.1s       | 11.12s           | 364M   |
| 3.12   | slim (glibc)  | -      | -           | 11.28s           | 363M   |
| 3.13   | alpine (musl) | wheel  | -           | 9.91s            | 378.4M |
| 3.13   | alpine (musl) | -      | -           | 10.50s           | 376.8M |
| 3.13   | slim (glibc)  | wheel  | 16.7s       | 10.13s           | 362M   |
| 3.13   | slim (glibc)  | -      | -           | 10.90s           | 361M   |

This quickstart demonstrates how to initialize a Data Context, connect to a Pandas DataFrame, define and save an Expectation Suite, run validation through a Checkpoint, and review the results. Note that the `great_expectations init` CLI was removed in GX 1.0; for a persistent, filesystem-backed setup, create the context with `gx.get_context(mode="file")` instead.

import great_expectations as gx
import pandas as pd

# 1. Initialize a Data Context (ephemeral/in-memory by default)
# For a persistent, filesystem-backed context use gx.get_context(mode="file")
context = gx.get_context()

# 2. Connect to data (a small inline Pandas DataFrame for simplicity)
# In a real scenario you would load your own data, e.g. from a file, database, or API
df = pd.DataFrame(
    {
        "passenger_count": [1, 2, 4, 6],
        "pickup_datetime": pd.to_datetime(
            ["2024-01-01 08:00", "2024-01-01 09:30", "2024-01-02 17:45", "2024-01-03 12:10"]
        ),
    }
)

# Add a Pandas Data Source, a DataFrame Asset, and a whole-table Batch Definition
data_source = context.data_sources.add_pandas("my_pandas_datasource")
data_asset = data_source.add_dataframe_asset(name="my_dataframe_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("my_batch_definition")

# 3. Create an Expectation Suite and add Expectations (assertions about your data)
suite = context.suites.add(gx.ExpectationSuite(name="my_suite"))
suite.add_expectation(gx.expectations.ExpectColumnToExist(column="passenger_count"))
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="passenger_count", min_value=1, max_value=6
    )
)
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column="pickup_datetime"))

# 4. Tie the data and the suite together in a Validation Definition
validation_definition = context.validation_definitions.add(
    gx.ValidationDefinition(name="my_validation", data=batch_definition, suite=suite)
)

# 5. Run validation through a Checkpoint
checkpoint = context.checkpoints.add(
    gx.Checkpoint(name="my_checkpoint", validation_definitions=[validation_definition])
)
checkpoint_result = checkpoint.run(batch_parameters={"dataframe": df})

# 6. Review validation results (e.g., in Data Docs)
# To open Data Docs in your browser, uncomment the lines below after a successful run
# context.build_data_docs()
# context.open_data_docs()

print("Validation successful:", checkpoint_result.success)
if not checkpoint_result.success:
    print("Validation failed. Check Data Docs for details.")