Great Expectations Experimental Builds
`great-expectations-experimental` is a daily build of the Great Expectations library, a robust tool for data quality, validation, and documentation. Unlike the stable `great_expectations` package, this package provides access to the latest, potentially unstable, features and bug fixes directly from the main development branch. Its versioning is date-based, reflecting its continuous integration nature, and it is primarily intended for early testing and development, not for production environments.
Warnings
- breaking: APIs and internal implementations in `great-expectations-experimental` can change daily without notice, because the package is built directly from the `great_expectations` main branch.
- gotcha: This package is explicitly for development and for testing new features; it is not intended for production use. It may contain incomplete features, unannounced bugs, or performance issues.
- gotcha: Version numbers are date-based (e.g., `0.1.20240917055`) and do not follow standard semantic versioning (MAJOR.MINOR.PATCH). A higher version number therefore indicates only a later build, not compatibility with earlier ones.
- gotcha: Although the package name is `great-expectations-experimental`, its Python modules are exposed under the `great_expectations` namespace (e.g., `import great_expectations as gx`). There is no `great_expectations_experimental` module to import, which can confuse new users.
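Because the version string gives no compatibility signal and the install name differs from the import name, it can help to check both explicitly before running anything. The following is a minimal sketch using only the standard library; the helper names (`installed_version`, `matches_pin`, `describe_install`) are hypothetical and not part of Great Expectations.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional

# Name used with pip (the distribution) vs. name used with `import`.
EXPERIMENTAL_DIST = "great-expectations-experimental"
IMPORT_NAME = "great_expectations"


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(distribution: str, pinned: str) -> bool:
    """True only when the installed version string equals the pin exactly.

    Date-based builds are compared for equality, never ordered, because a
    later build number says nothing about compatibility.
    """
    return installed_version(distribution) == pinned


def describe_install() -> dict:
    """Summarize what is installed and whether the import name resolves."""
    return {
        "distribution": EXPERIMENTAL_DIST,
        "installed_version": installed_version(EXPERIMENTAL_DIST),
        "module_importable": find_spec(IMPORT_NAME) is not None,
    }


print(describe_install())
```

A test suite could call `matches_pin(EXPERIMENTAL_DIST, "0.1.20240917055")` (the example version from the warning above) and skip or fail fast when a different daily build is present.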
Install
pip install great-expectations-experimental
Imports
- great_expectations
import great_expectations as gx
- DataContext
wrong: from great_expectations_experimental.data_context import DataContext
correct: from great_expectations.data_context import DataContext
Quickstart
import great_expectations as gx
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": ["A", "B", "C", "D", "E"]
})
# Get a Data Context
# With cloud_mode=False and no existing GX project found on disk, this is expected
# to return an ephemeral (in-memory) context that does not write to the filesystem.
context = gx.get_context(cloud_mode=False)
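# Add a pandas Data Source and an in-memory DataFrame Data Asset
# Assumption: this follows the 0.18-style fluent API (context.sources);
# a given daily build may expose a different API.
# The name 'my_experimental_dataframe' is used to refer to this data within GX.
datasource = context.sources.add_pandas(name="my_pandas_datasource")
my_asset = datasource.add_dataframe_asset(name="my_experimental_dataframe")
# Build a batch request for validation, passing the DataFrame at request time
batch_request = my_asset.build_batch_request(dataframe=df)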
# Get a Validator for the specified batch
validator = context.get_validator(batch_request=batch_request)
# Add an expectation: expect values in 'col1' to be between 1 and 5 (inclusive)
validator.expect_column_values_to_be_between(column="col1", min_value=1, max_value=5)
# Add another expectation: expect 'col2' to contain distinct values from a set
validator.expect_column_distinct_values_to_contain_set(column="col2", value_set=["A", "C", "E"])
# Validate the data against the defined expectations
results = validator.validate()
print("\nValidation Results:")
print(f"Overall Validation Success: {results.success}")
# You can inspect individual expectation results
for result in results.results:
    print(f"  Expectation: {result.expectation_config.expectation_type}, Success: {result.success}")