OptBinning
OptBinning is a Python library for optimal binning, a data preprocessing technique used in machine learning to transform continuous or categorical features into discrete bins. It supports various binning algorithms, including optimal, isotonic, and tree-based methods, and facilitates scorecard development. The current version is 0.21.0. New minor versions typically appear every 1-2 months and often include new features and bug fixes.
Common errors
-
ModuleNotFoundError: No module named 'ortools'
cause The `ortools` package, a required dependency for the underlying optimization solvers, is not installed or its version conflicts with OptBinning's requirements.
fix Install a compatible version of `ortools` by running `pip install 'ortools<9.12'` to satisfy OptBinning's dependency requirements.
-
AttributeError: 'Scorecard' object has no attribute 'transform'
cause The `transform` method was called on a `Scorecard` object in an OptBinning version older than 0.21.0, where this method did not exist.
fix Upgrade OptBinning to version 0.21.0 or newer (`pip install --upgrade optbinning`). Alternatively, use `Scorecard.decision_function` if you need the scores for a specific dataset.
-
ValueError: Binning process not fitted. Call 'fit' or 'fit_transform' first.
cause The `transform` method was called on an `OptimalBinning` or `BinningProcess` object before `fit` or `fit_transform` was executed, so the binning rules have not been learned yet.
fix Ensure that `binning_instance.fit(X, y)` or `binning_instance.fit_transform(X, y)` is called before `binning_instance.transform(X)`.
-
TypeError: OptimalBinning.fit() missing 1 required positional argument: 'y'
cause The `fit` method for optimal binning classes (e.g., `OptimalBinning`) requires both features (X) and target (y) arguments, but `y` was omitted.
fix Provide both the feature series/array (X) and the target series/array (y) to the `fit` method, e.g., `optimal_binning_instance.fit(X['feature_name'], y)`.
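Two of the errors above depend on installed versions, so a quick environment check can help before debugging further. The following is a minimal sketch using only the standard library. The helper names (`release_tuple`, `ortools_is_compatible`, `has_scorecard_transform`, `installed_version`) are hypothetical and not part of OptBinning, and the version thresholds are simply the ones quoted on this page.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def ortools_is_compatible(ortools_version):
    # Threshold taken from this page: ortools<9.12
    return release_tuple(ortools_version) < (9, 12)


def has_scorecard_transform(optbinning_version):
    # Threshold taken from this page: Scorecard.transform exists from 0.21.0
    return release_tuple(optbinning_version)[:2] >= (0, 21)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ortools_version = installed_version("ortools")
    if ortools_version is None:
        print("ortools is not installed")
    else:
        print("ortools", ortools_version,
              "compatible:", ortools_is_compatible(ortools_version))

    optbinning_version = installed_version("optbinning")
    if optbinning_version is None:
        print("optbinning is not installed")
    else:
        print("optbinning", optbinning_version,
              "has Scorecard.transform:", has_scorecard_transform(optbinning_version))
```

Running the script prints one line per package. A missing package is reported rather than raised, which makes the check safe to run in a fresh environment.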
Warnings
- breaking OptBinning v0.20.1 introduced a specific constraint for the `ortools` dependency (`ortools<9.12`) to avoid incompatible changes in the CP-SAT solver. Using `ortools` version 9.12 or higher will cause runtime errors.
- gotcha The `Scorecard.transform` method was added in OptBinning v0.21.0. If you are using an older version, this method will not exist, and you may need to manually apply transformations or update your library.
- gotcha Prior to OptBinning v0.19.0, the `transform` method might not preserve the `pandas.DataFrame` index. This can misalign rows when the transformed output is joined back to the original data.
- gotcha The handling and implementation of `sample_weight` have evolved across several versions (e.g., v0.17.0, v0.17.3, v0.21.0). Behavior might differ slightly or require specific checks depending on your OptBinning version.
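For the index gotcha above, one defensive option is to re-attach the original index explicitly instead of relying on the library version. The sketch below assumes a single feature passed as a `pandas.Series` and a fitted object whose `transform` returns one value per row. `transform_keep_index` and `StandInBinner` are hypothetical names used for illustration only; the stand-in mimics a binner that returns a bare NumPy array so the example runs without OptBinning installed.

```python
import numpy as np
import pandas as pd


def transform_keep_index(binner, feature):
    """Transform one feature and return a Series aligned to feature.index.

    `binner` is any fitted object exposing transform(x) that returns one
    value per input row (assumption: 1-D output).
    """
    values = binner.transform(feature)
    # np.asarray drops any index the result may carry, so the original
    # index is always the one re-attached.
    return pd.Series(np.asarray(values), index=feature.index, name=feature.name)


class StandInBinner:
    """Hypothetical stand-in: returns a plain array, discarding the index."""

    def transform(self, x):
        return np.asarray(x, dtype=float) * 2.0


if __name__ == "__main__":
    feature = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30], name="feature_1")
    print(transform_keep_index(StandInBinner(), feature))
```

With a real fitted `OptimalBinning` instance in place of the stand-in, the returned Series can be assigned back to the DataFrame and will align on the index labels rather than on row position.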
Install
-
pip install optbinning
Imports
- OptimalBinning
from optbinning import OptimalBinning
- BinningProcess
from optbinning import BinningProcess
- Scorecard
from optbinning import Scorecard
- OptimalBinningSklearn
from optbinning import OptimalBinningSklearn
from optbinning.optimal_binning import OptimalBinningSklearn
Quickstart
import numpy as np
import pandas as pd
from optbinning import OptimalBinning
# Create dummy data
np.random.seed(42)
X = pd.DataFrame({
    'feature_1': np.random.rand(100) * 100,
    'feature_2': np.random.randint(0, 5, 100),
    'feature_3': np.random.normal(50, 10, 100),
})
y = np.random.randint(0, 2, 100)  # Binary target
# Initialize and fit OptimalBinning for a continuous feature
optb_num = OptimalBinning(name="feature_1", dtype="numerical", dtype_target="binary")
optb_num.fit(X["feature_1"], y)
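# Transform the feature (with the default metric, transform returns
# Weight of Evidence values for each bin rather than bin labels)
X["feature_1_binned"] = optb_num.transform(X["feature_1"])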
# Print binning table
print(f"Binning Table for feature_1:\n{optb_num.binning_table.build()}\n")
# Example with a categorical feature
optb_cat = OptimalBinning(name="feature_2", dtype="categorical", dtype_target="binary")
optb_cat.fit(X["feature_2"], y)
X["feature_2_binned"] = optb_cat.transform(X["feature_2"])
print(f"Binning Table for feature_2:\n{optb_cat.binning_table.build()}\n")
# The transformed data
print("Transformed DataFrame head:")
print(X.head())