YData Profiling
ydata-profiling is a Python library that generates exploratory data analysis (EDA) reports for pandas DataFrames. Reports include per-column statistics, visualizations, and interactive widgets for assessing data quality and distributions. The library is actively maintained, with minor releases roughly every one to two months.
Warnings
- breaking The library was renamed from `pandas-profiling` to `ydata-profiling` starting with version 4.0.0. Update import statements (`pandas_profiling` becomes `ydata_profiling`) and the package name in `requirements.txt` or `pyproject.toml`.
- gotcha Profiling large datasets can be computationally intensive and consume significant memory. For very large datasets, consider sampling or using the Spark integration (`ydata-profiling[spark]`).
- gotcha Specific features like Jupyter Notebook widgets or Spark DataFrame profiling require installing optional dependencies. The base `pip install ydata-profiling` does not include these.
- gotcha The library has specific Python version requirements (currently `Python >=3.10, <3.14`). Ensure your environment matches these requirements to avoid installation or runtime errors.
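As a minimal sketch of the sampling advice above: the helper names `sample_for_profiling` and `build_light_report`, the 100,000-row cap, and the seed are illustrative assumptions, not part of the library. The sketch relies on `pandas.DataFrame.sample` and on the `minimal` option of `ProfileReport`, which turns off the more expensive computations.

```python
import pandas as pd


def sample_for_profiling(df: pd.DataFrame, max_rows: int = 100_000, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small enough, else a reproducible random sample.

    max_rows and seed are illustrative defaults, not library recommendations.
    """
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


def build_light_report(df: pd.DataFrame, title: str = "Sampled profile"):
    """Profile a sample of df in minimal mode."""
    # Imported lazily so the sampling helper works without ydata-profiling installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(sample_for_profiling(df), title=title, minimal=True)
```

A report built from a sample describes the sample, not the full dataset, so rare values and exact counts may differ from the source data.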
Install
- pip install ydata-profiling
- pip install ydata-profiling[notebook]
- pip install ydata-profiling[spark]
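One way to confirm that an optional extra is usable before calling a feature that needs it is to check whether its modules can be found. The sketch below uses only the standard library. The mapping from extra name to module (`ipywidgets` for `notebook`, `pyspark` for `spark`) is an assumption about what those extras provide, and `missing_extras` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> a module that extra is expected to provide.
EXTRA_MODULES = {"notebook": "ipywidgets", "spark": "pyspark"}


def missing_extras(extras=EXTRA_MODULES):
    """Return the names of extras whose marker module cannot be found."""
    return sorted(name for name, module in extras.items() if find_spec(module) is None)


if __name__ == "__main__":
    for name in missing_extras():
        print(f"Optional extra not available: {name}")
```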
Imports
- ProfileReport
from ydata_profiling import ProfileReport
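For code that must run in environments still pinned to the old package name, a guarded import is one possible transition pattern. This is a sketch, not a recommended long-term setup: it assumes the legacy `pandas_profiling` module exposes a compatible `ProfileReport`, and it sets the name to `None` when neither package is installed.

```python
try:
    # Current package name.
    from ydata_profiling import ProfileReport
except ImportError:
    try:
        # Legacy package name, used before the rename.
        from pandas_profiling import ProfileReport
    except ImportError:
        # Neither package is installed; callers must check for None.
        ProfileReport = None
```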
Quickstart
import pandas as pd
from ydata_profiling import ProfileReport
# Sample DataFrame
data = {
    'col1': [1, 2, 3, 4, 5],
    'col2': ['A', 'B', 'A', 'C', 'B'],
    'col3': [1.1, 2.2, None, 4.4, 5.5]
}
df = pd.DataFrame(data)
# Generate the profile report
profile = ProfileReport(df, title="My DataFrame Profile", explorative=True)
# Save report to an HTML file
profile.to_file("your_report.html")
# If running in a Jupyter Notebook, you can display widgets directly:
# profile.to_widgets()
print("Profile report saved to your_report.html")
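The same report can also be written as JSON for downstream tooling, since `to_file` picks the output format from the file extension (`.html` or `.json`). The helper below is a hypothetical wrapper, not a library function; it accepts any object with a `to_file` method, such as the `profile` built above.

```python
from pathlib import Path


def export_report(profile, out_dir, stem="report"):
    """Write the report as both HTML and JSON and return the two paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    html_path = out_dir / f"{stem}.html"
    json_path = out_dir / f"{stem}.json"
    profile.to_file(html_path)  # human-readable report
    profile.to_file(json_path)  # machine-readable summary
    return html_path, json_path
```

For example, `export_report(profile, "reports", stem="my_dataframe")` would produce `reports/my_dataframe.html` and `reports/my_dataframe.json`.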