TDDA: Test-Driven Data Analysis

2.2.17 · active · verified Wed Apr 15

TDDA (Test-Driven Data Analysis) is a Python library and set of command-line tools for improving the correctness and robustness of data analysis. It supports reference testing of data pipelines, automatic discovery and verification of data constraints, anomaly detection, and inference of regular expressions from text data (Rexpy). From version 2.0 onward it also includes automatic test generation for command-line programs (Gentest). It supports Python >=3.8 and is actively maintained; 2.2.17 is the latest stable release.

Warnings

Install

Imports

Quickstart

This quickstart uses `tdda.constraints` to discover constraints from a Pandas DataFrame with `discover_df`, save them to a `.tdda` JSON file, and then check a second DataFrame against that file with `verify_df`.

import os

import pandas as pd
from tdda.constraints import discover_df, verify_df

# Create a sample DataFrame
data = {
    'col1': [1, 2, 3, 4, 5, None],
    'col2': ['A', 'B', 'A', 'C', 'B', 'D'],
    'col3': [10.1, 11.2, 10.1, 13.4, 15.5, 12.3]
}
df = pd.DataFrame(data)

# 1. Discover constraints from the DataFrame
constraints = discover_df(df)

# Serialize the discovered constraints with to_json() and write them to a .tdda file
constraints_filename = 'my_dataframe_constraints.tdda'
with open(constraints_filename, 'w') as f:
    f.write(constraints.to_json())
print(f"Constraints discovered and saved to {constraints_filename}")

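# 2. Verify a new or modified DataFrame against the saved constraints.
# This one has col1 values (6, 7) above the maximum seen in the original data
# and a col2 value ('E') that the original data did not contain, so some
# constraints are expected to fail.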
df_to_verify = pd.DataFrame({
    'col1': [1, 2, 3, 6, 5, 7],
    'col2': ['A', 'B', 'A', 'C', 'B', 'E'],
    'col3': [10.1, 11.2, 10.1, 13.0, 15.5, 12.0]
})

verification_result = verify_df(df_to_verify, constraints_filename)

print("\nVerification Results:")
print(f"Passed constraints: {verification_result.passes}")
print(f"Failed constraints: {verification_result.failures}")
if verification_result.failures > 0:
    print("Details of failed constraints:")
    print(verification_result.to_frame())

# Clean up the generated constraints file
os.remove(constraints_filename)
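
To make the discover-then-verify idea concrete without installing anything, here is a minimal pure-Python sketch. It is not TDDA's implementation or API: the `discover` and `verify` helpers and the constraint names (`max_nulls`, `min`, `max`, `allowed_values`) are hypothetical, chosen only for illustration, and cover far less than the library does. It reuses the quickstart's data as plain dictionaries of lists.

```python
def discover(columns):
    """Infer simple per-column constraints from {column name: list of values}.

    Hypothetical helper for illustration only; not part of TDDA.
    """
    constraints = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        rule = {"max_nulls": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric column: record the observed range.
            rule["min"] = min(present)
            rule["max"] = max(present)
        elif present:
            # Non-numeric column: record the set of observed values.
            rule["allowed_values"] = sorted(set(present))
        constraints[name] = rule
    return constraints


def verify(columns, constraints):
    """Check data against constraints; return (column, constraint, passed) tuples."""
    results = []
    for name, rule in constraints.items():
        values = columns.get(name, [])
        present = [v for v in values if v is not None]
        nulls = len(values) - len(present)
        for kind, limit in rule.items():
            if kind == "max_nulls":
                ok = nulls <= limit
            elif kind == "min":
                ok = all(v >= limit for v in present)
            elif kind == "max":
                ok = all(v <= limit for v in present)
            else:  # allowed_values
                ok = all(v in limit for v in present)
            results.append((name, kind, ok))
    return results


reference = {
    "col1": [1, 2, 3, 4, 5, None],
    "col2": ["A", "B", "A", "C", "B", "D"],
    "col3": [10.1, 11.2, 10.1, 13.4, 15.5, 12.3],
}
candidate = {
    "col1": [1, 2, 3, 6, 5, 7],
    "col2": ["A", "B", "A", "C", "B", "E"],
    "col3": [10.1, 11.2, 10.1, 13.0, 15.5, 12.0],
}

constraints = discover(reference)
results = verify(candidate, constraints)
failures = [(column, kind) for column, kind, ok in results if not ok]

print(f"Passed: {sum(ok for _, _, ok in results)}, failed: {len(failures)}")
for column, kind in failures:
    print(f"  {column}: {kind}")
```

In this sketch the candidate data fails two checks: `col1` exceeds the observed maximum and `col2` contains a value outside the observed set. The real library discovers a richer set of constraints and reports results through its own verification object, so its pass and failure counts for the same data will generally differ.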
