whylogs
whylogs is an open-source Python library for logging, profiling, and monitoring ML data pipelines end-to-end. It generates lightweight, mergeable statistical summaries (profiles) of datasets, enabling data quality validation, drift detection, and exploratory data analysis. It integrates with the WhyLabs Platform for observability and alerting, but the core library is open source. The library is actively maintained with frequent patch releases.
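To make "mergeable" concrete, here is a toy sketch of a summary that can be computed per batch and then combined. `ColumnSummary` and `summarize` are hypothetical names invented for this illustration; they are not whylogs classes, and whylogs itself tracks richer statistics than the four fields shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Toy mergeable summary of a numeric column (illustrative, not a whylogs class)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: float) -> None:
        # Update the running statistics with one observed value.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Combine two summaries without revisiting the raw data.
        return ColumnSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def summarize(values) -> ColumnSummary:
    summary = ColumnSummary()
    for value in values:
        summary.track(value)
    return summary


# Summaries built from separate batches merge into the summary of the whole dataset.
batch_1 = summarize([1, 2, 3])
batch_2 = summarize([4, 5])
merged = batch_1.merge(batch_2)
whole = summarize([1, 2, 3, 4, 5])
print(merged == whole)
```

Because merging only needs the summaries, batches can be profiled on different machines or at different times and combined later.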
Common errors
- ModuleNotFoundError: No module named 'whylogs.viz'
  cause: The visualization utilities are part of an optional extra installation ('viz') and are not included in the base `whylogs` package.
  fix: Install whylogs with the visualization extra: `pip install "whylogs[viz]"`
- ModuleNotFoundError: No module named 'whylogs.pyspark.experimental'
  cause: PySpark integration is an optional extra ('spark') and is not included in the base `whylogs` package.
  fix: Install whylogs with the PySpark extra: `pip install "whylogs[spark]"`
- Failed to deserialize profile: The profile was generated with an incompatible whylogs version.
  cause: Attempting to load a whylogs profile generated by a significantly different major version of whylogs (e.g., v0.x profile with v1.x library), which introduced breaking changes in the profile format.
  fix: Ensure the whylogs library version used for reading profiles is compatible with the version used for generating them. If migrating from v0.x to v1.x, you may need to re-profile your historical data or use the appropriate migration tools if available.
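One way to handle the optional-extra errors gracefully is to attempt the import and fall back with an install hint. The sketch below uses only the standard library; `import_optional` is a hypothetical helper name, not part of whylogs.

```python
import importlib


def import_optional(module_name: str, extra: str):
    """Return the imported module, or None (with an install hint) if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        print(f'{module_name} is unavailable; try: pip install "whylogs[{extra}]"')
        return None


# Only use the visualization features when the optional module can be imported.
viz = import_optional("whylogs.viz", extra="viz")
if viz is None:
    print("Continuing without visualization support.")
```

This keeps a pipeline running in environments where only the base package is installed, while still telling the user which extra to add.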
Warnings
- breaking whylogs v1 introduced significant breaking changes from v0.x, including API alterations and potential incompatibility with profiles generated by older versions. Users migrating from v0.x should consult the migration guide.
- gotcha Core visualization tools (`ProfileVisualizer`, `profile_viewer`) and PySpark integration require extra installations (`whylogs[viz]` and `whylogs[spark]`, respectively). A base `pip install whylogs` will not include these functionalities.
- deprecated The hosted WhyLabs Platform, used for advanced monitoring and observability of whylogs profiles, is being discontinued. While the whylogs library remains open source and the WhyLabs platform's source code is publicly available for self-hosting, the managed SaaS offering is no longer accessible.
- breaking whylogs version 1.1.2 was yanked from PyPI due to a bug that prevented it from correctly reading dataset profiles written with previous versions.
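The version warnings above can be turned into a simple pre-flight check before reading stored profiles. This is a sketch under stated assumptions: the "same major version" rule is a heuristic drawn from the warnings, not a documented whylogs guarantee, and it assumes you recorded the writer's version yourself alongside each profile. `can_read`, `major`, and `YANKED` are hypothetical names.

```python
from importlib import metadata

# Releases to avoid for reading profiles, per the warning above.
YANKED = {"1.1.2"}


def major(version: str) -> int:
    """Extract the major component of a dotted version string."""
    return int(version.split(".")[0])


def can_read(writer_version: str, reader_version: str) -> bool:
    """Heuristic: same major version, and the reader is not a yanked release."""
    return major(writer_version) == major(reader_version) and reader_version not in YANKED


def installed_whylogs_version():
    """Return the installed whylogs version string, or None if it is not installed."""
    try:
        return metadata.version("whylogs")
    except metadata.PackageNotFoundError:
        return None


reader = installed_whylogs_version()
if reader is not None and not can_read("0.7.1", reader):
    print(f"Profiles written by 0.7.1 may not load with whylogs {reader}.")
```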
Install
- pip install whylogs
- pip install "whylogs[viz]"
- pip install "whylogs[spark]"
Imports
- get_or_create_session (session API from whylogs v0.x; whylogs v1 uses `why.log` instead)
from whylogs import get_or_create_session
- whylogs as why
import whylogs as why
- ProfileVisualizer
from whylogs.viz import ProfileVisualizer
(not `from whylogs import ProfileVisualizer`; the class lives in the `whylogs.viz` module, which needs the `viz` extra)
- profile_viewer
from whylogs.viz import profile_viewer
Quickstart
import pandas as pd
import whylogs as why

# Create a sample DataFrame
data = {
    'col_a': [1, 2, 3, 4, 5],
    'col_b': ['apple', 'banana', 'cherry', 'apple', 'date'],
}
df = pd.DataFrame(data)

# Log the DataFrame with the whylogs v1 API; this returns a ResultSet
results = why.log(df)

# Inspect the per-column summary statistics as a pandas DataFrame
print(results.view().to_pandas())

# Note: whylogs v0.x used a session-based API instead
# (get_or_create_session() and session.logger(...)); its profile objects
# do not provide the v1 .view() method used above.