OptBinning

0.21.0 · active · verified Thu Apr 16

OptBinning is a Python library for optimal binning, a data preprocessing technique used in machine learning to transform continuous or categorical features into discrete bins. It supports various binning algorithms, including optimal, isotonic, and tree-based methods, and facilitates scorecard development. The current version is 0.21.0; new minor versions typically appear every 1-2 months and often include new features and bug fixes.

Common errors

Warnings

Install

Imports

Quickstart

This example demonstrates how to use `OptimalBinning` to discretize a numerical feature and a categorical feature against a binary target. It covers initialization, fitting, and transforming the data, then prints the generated binning tables.

import numpy as np
import pandas as pd
from optbinning import OptimalBinning

# Create dummy data
np.random.seed(42)
X = pd.DataFrame({
    'feature_1': np.random.rand(100) * 100,
    'feature_2': np.random.randint(0, 5, 100),
    'feature_3': np.random.normal(50, 10, 100),
})
y = np.random.randint(0, 2, 100) # Binary target

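# Initialize and fit OptimalBinning for a continuous feature.
# OptimalBinning itself targets a binary y; it takes no dtype_target argument.
optb_num = OptimalBinning(name="feature_1", dtype="numerical")
optb_num.fit(X["feature_1"], y)

# Transform the feature. transform() defaults to metric="woe", so the new
# column holds each row's Weight of Evidence value, not a bin label.
X["feature_1_binned"] = optb_num.transform(X["feature_1"])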

# Print binning table
print(f"Binning Table for feature_1:\n{optb_num.binning_table.build()}\n")

# Example with a categorical feature (again a binary target, no dtype_target)
optb_cat = OptimalBinning(name="feature_2", dtype="categorical")
optb_cat.fit(X["feature_2"], y)
# As above, the default transform metric is WoE
X["feature_2_binned"] = optb_cat.transform(X["feature_2"])
print(f"Binning Table for feature_2:\n{optb_cat.binning_table.build()}\n")

# The transformed data
print("Transformed DataFrame head:")
print(X.head())
