Great Expectations

1.15.1 · active · verified Sun Mar 29

Great Expectations (GX) is an open-source Python library for data quality. It helps data teams validate, document, and profile their data so that quality stays consistent throughout their pipelines. Users define 'Expectations' (assertions about data), run validations against them, and generate human-readable data quality reports called 'Data Docs'. The library is actively maintained with frequent releases and supports Python versions 3.10 through 3.13, with experimental support for 3.14.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to initialize a Data Context, connect to a sample Pandas DataFrame, define an Expectation Suite, run validation using a Checkpoint, and view the results. For persistent setups, create a filesystem-backed Data Context with `gx.get_context(mode="file")`; the `great_expectations init` CLI command belongs to pre-1.0 releases and is not part of GX 1.x.

import great_expectations as gx
import pandas as pd

# 1. Initialize a Data Context (or use an existing one)
# If no existing project is found, this returns a temporary in-memory (ephemeral) context
# For persistent configuration, use gx.get_context(mode="file") instead
context = gx.get_context()

# 2. Connect to data (using a Pandas DataFrame for simplicity)
# This example reads a CSV dataset over HTTP
# In a real scenario, you'd load your own data, e.g., from a file, database, or API
df = pd.read_csv("https://raw.githubusercontent.com/great-expectations/great_expectations/develop/tests/test_sets/taxi_trips.csv")

# Add a Pandas Data Source, a DataFrame Data Asset, and a Batch Definition
# In GX 1.x the DataFrame is supplied at run time rather than stored on the asset
data_source = context.data_sources.add_pandas("my_pandas_datasource")
data_asset = data_source.add_dataframe_asset(name="my_dataframe_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("my_batch_definition")

# 3. Create Expectations
# Define assertions about your data
expect_column = gx.expectations.ExpectColumnToExist(column="passenger_count")
expect_range = gx.expectations.ExpectColumnValuesToBeBetween(
    column="passenger_count", min_value=1, max_value=6
)
expect_not_null = gx.expectations.ExpectColumnValuesToNotBeNull(column="pickup_datetime")

# 4. Save the Expectations in an Expectation Suite registered with the context
suite = context.suites.add(gx.ExpectationSuite(name="my_suite"))
for expectation in (expect_column, expect_range, expect_not_null):
    suite.add_expectation(expectation)

# 5. Run validation
# A Validation Definition pairs the Batch Definition with the suite;
# a Checkpoint runs one or more Validation Definitions
validation_definition = context.validation_definitions.add(
    gx.ValidationDefinition(name="my_validation", data=batch_definition, suite=suite)
)
checkpoint = context.checkpoints.add(
    gx.Checkpoint(name="my_checkpoint", validation_definitions=[validation_definition])
)

checkpoint_result = checkpoint.run(batch_parameters={"dataframe": df})

# 6. Review validation results (e.g., in Data Docs)
# To open Data Docs in your browser, uncomment the lines below after a run
# context.build_data_docs()
# context.open_data_docs()

print("Validation successful:", checkpoint_result.success)
if not checkpoint_result.success:
    print("Validation failed. Check Data Docs for details.")
