whylogs

1.6.4 · active · verified Thu Apr 16

whylogs is an open-source Python library for logging, profiling, and monitoring ML data pipelines end-to-end. It generates lightweight, mergeable statistical summaries (profiles) of datasets, enabling data quality validation, drift detection, and exploratory data analysis. It integrates with the WhyLabs Platform for observability and alerting, but the core library is open source. The library is actively maintained with frequent patch releases.
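
To make the idea of a "mergeable" summary concrete, here is a minimal pure-Python sketch. It is not whylogs code: `ColumnSummary` and `summarize` are hypothetical names invented for this illustration, and real whylogs profiles track far richer metrics. The sketch only shows the property that matters: summaries built from separate batches can be combined into the same summary you would get from profiling all the data at once.

```python
from dataclasses import dataclass
import math


@dataclass
class ColumnSummary:
    """Hypothetical mergeable summary of one numeric column (not a whylogs class)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: float) -> None:
        # Update the running statistics with a single observation.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Combine two summaries without revisiting the raw data.
        return ColumnSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def summarize(values):
    # Build a summary from an iterable of numbers.
    summary = ColumnSummary()
    for v in values:
        summary.track(v)
    return summary


batch_a = summarize([1, 2, 3])        # e.g. data seen on one worker
batch_b = summarize([4, 5])           # e.g. data seen on another worker
merged = batch_a.merge(batch_b)       # combined without the raw rows
whole = summarize([1, 2, 3, 4, 5])    # summary of all data at once

print(merged == whole)
```

Because merging needs only the summaries, batches can be profiled independently (per worker, per hour, per file) and combined later, which is what keeps this style of logging lightweight.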

Quickstart

This quickstart logs a pandas DataFrame with whylogs to create a data profile, then prints the profile's summary statistics. In whylogs 1.x, profiling starts from why.log; the session-based API (get_or_create_session, session.logger) belongs to the 0.x releases and is not available in this version.

import pandas as pd
import whylogs as why

# Create a sample DataFrame
data = {
    'col_a': [1, 2, 3, 4, 5],
    'col_b': ['apple', 'banana', 'cherry', 'apple', 'date']
}
df = pd.DataFrame(data)

# Log the DataFrame to generate a profile; why.log returns a ResultSet
results = why.log(df)

# Get a read-only view of the profile
profile_view = results.view()

# Print the summary metrics, one row per column
print(profile_view.to_pandas())
