PyCaret
PyCaret is an open-source, low-code machine learning library in Python that streamlines end-to-end machine learning workflows, from data preparation to model deployment. The current release is 3.3.2. The project ships frequent minor updates for bug fixes and dependency compatibility, alongside major releases that introduce new features and breaking API changes. [1, 13, 14, 15]
Warnings
- breaking PyCaret 3.0 introduced significant API changes, making code written for PyCaret 2.x largely incompatible without modifications. [4, 9]
- breaking Python 3.7 support was dropped in PyCaret 3.1.0, and Python 3.8 support was dropped in PyCaret 3.3.0. [4]
- breaking The `deep_check` and `eda` functions were removed in PyCaret 3.1.0 and will raise exceptions if called. [4]
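- gotcha In PyCaret 2.x, `setup()` was interactive by default and prompted the user to confirm inferred data types, which could halt execution in non-interactive environments unless `silent=True` was passed. PyCaret 3.x removed both the prompt and the `silent` parameter, so passing `silent=True` to `setup()` in 3.x raises an error. [1, 10]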
- gotcha PyCaret 3.x releases pin specific dependency versions. For instance, PyCaret 3.0.1 pinned `numpy<1.24` and 3.0.4 pinned `scikit-learn<1.3.0`. Newer 3.x releases (e.g., 3.3.0) support `scikit-learn 1.4` and `pandas 2.0`, but installing an older 3.x release alongside recent versions of these libraries can cause dependency conflicts. [4, 9]
- gotcha Results from `compare_models()` may differ noticeably between PyCaret 3.0.0 and 3.0.1 or later, because bug fixes in 3.0.1 changed how model performance is evaluated. [18]
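Several of these version constraints can be encoded in a small compatibility helper. The sketch below is an illustration, not part of PyCaret: `parse_version`, `min_python_for`, and `setup_kwargs` are hypothetical names. The Python thresholds come only from the notes above (Python 3.7 dropped in 3.1.0, Python 3.8 dropped in 3.3.0), and `silent` is added only for 2.x because that argument exists only in the 2.x `setup()` signature.

```python
def parse_version(text):
    """Turn '3.3.2' into (3, 3, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def min_python_for(pycaret_version):
    """Lowest supported Python (major, minor) per the notes above, or None if they don't say."""
    v = parse_version(pycaret_version)
    if v >= (3, 3, 0):
        return (3, 9)  # Python 3.8 support dropped in 3.3.0
    if v >= (3, 1, 0):
        return (3, 8)  # Python 3.7 support dropped in 3.1.0
    return None


def setup_kwargs(pycaret_version, session_id=123):
    """Non-interactive setup() keyword arguments for the given PyCaret version."""
    kwargs = {"session_id": session_id}
    if parse_version(pycaret_version) < (3, 0, 0):
        kwargs["silent"] = True  # 2.x only: skip the data type confirmation prompt
    return kwargs
```

Calling `setup(data=data, target='Class variable', **setup_kwargs(pycaret.__version__))` lets one script run unattended on either major version.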
Install
- core library
pip install pycaret
- with optional dependencies
pip install pycaret[full]
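Because individual 3.x releases pin their dependencies differently, it can help to confirm what was actually installed. This sketch uses only the standard-library `importlib.metadata` module. The helper name `installed_versions` and the set of packages it checks are illustrative choices.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists=("pycaret", "scikit-learn", "pandas", "numpy")):
    """Map each distribution name to its installed version string, or None if absent."""
    found = {}
    for dist in dists:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    return found


if __name__ == "__main__":
    for dist, ver in installed_versions().items():
        print(f"{dist}: {ver or 'not installed'}")
```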
Imports
- get_data
from pycaret.datasets import get_data
- setup
from pycaret.classification import setup
- ClassificationExperiment
from pycaret.classification import ClassificationExperiment
- compare_models
from pycaret.classification import compare_models
- predict_model
from pycaret.classification import predict_model
- save_model
from pycaret.classification import save_model
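`ClassificationExperiment` is the object-oriented alternative to the module-level functions. The sketch below assumes PyCaret 3.x and mirrors the Quickstart using methods on an experiment object, which keeps each experiment's state separate instead of relying on a module-global "current" experiment. The wrapper name `run_diabetes_experiment` is hypothetical. The PyCaret imports sit inside it so the function can be defined without PyCaret installed.

```python
def run_diabetes_experiment(session_id=123):
    """Train, compare, and save models on the diabetes data via the OOP API."""
    # Imported lazily so this function can be defined where PyCaret is absent.
    from pycaret.classification import ClassificationExperiment
    from pycaret.datasets import get_data

    data = get_data("diabetes", verbose=False)

    # Each ClassificationExperiment holds its own configuration and data splits.
    exp = ClassificationExperiment()
    exp.setup(data=data, target="Class variable", session_id=session_id, verbose=False)

    best = exp.compare_models()        # best model by the default sort metric (Accuracy)
    holdout = exp.predict_model(best)  # predictions on the hold-out split
    exp.save_model(best, "diabetes_best_pipeline")
    return best, holdout
```

Usage: `best_model, holdout_predictions = run_diabetes_experiment()`. Two experiment objects created this way can coexist in one session without overwriting each other's setup.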
Quickstart
from pycaret.datasets import get_data
from pycaret.classification import *
# Load a sample dataset
data = get_data('diabetes')
# Initialize the setup (classification experiment)
# session_id fixes the random seed for reproducibility. PyCaret 3.x setup() no longer
# prompts for data type confirmation, so the 2.x-only silent=True argument is omitted.
clf1 = setup(data=data, target='Class variable', session_id=123)
# Compare all available models and select the best one
best_model = compare_models()
# Make predictions on the hold-out set
predictions = predict_model(best_model)
# Save the whole pipeline (preprocessing + model) to diabetes_best_pipeline.pkl
save_model(best_model, 'diabetes_best_pipeline')
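A saved pipeline can later be reloaded and applied to new rows. This is a minimal sketch, assuming PyCaret 3.x and that the Quickstart has already written `diabetes_best_pipeline.pkl` to the working directory. `score_new_rows` is a hypothetical wrapper, and the first few feature rows of the same dataset stand in for genuinely unseen data.

```python
def score_new_rows(model_name="diabetes_best_pipeline", n_rows=5):
    """Reload a saved PyCaret pipeline and score feature rows passed in explicitly."""
    # Lazy imports: the function can be defined without PyCaret installed.
    from pycaret.classification import load_model, predict_model
    from pycaret.datasets import get_data

    # load_model() adds the ".pkl" extension itself and returns the full
    # preprocessing + estimator pipeline.
    pipeline = load_model(model_name)

    # Illustrative stand-in for new data: a few rows with the target column dropped.
    features = get_data("diabetes", verbose=False).drop(columns="Class variable")
    new_rows = features.head(n_rows)

    # With data= supplied, predict_model scores these rows instead of the hold-out set.
    return predict_model(pipeline, data=new_rows)
```

For example, `scored = score_new_rows()` followed by `scored[['prediction_label', 'prediction_score']]` shows the predicted class and its score. In PyCaret 3.x, those are the names of the columns appended to the input features.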