Category Encoders

2.9.0 · active · verified Fri Apr 10

Category Encoders is a Python library of scikit-learn-style transformers that encode categorical variables as numeric features. It accepts pandas DataFrames as input, returns them as output, and works inside scikit-learn pipelines. The library is actively maintained; the current version is 2.9.0, and releases add new encoders, features, and bug fixes.

Warnings

Install

Imports

Quickstart

This example uses `TargetEncoder` to replace the categorical columns 'city' and 'country' with numeric values derived from the 'target' variable. `fit_transform` is called on the training data and takes both the features (X) and the target (y).

import pandas as pd
import category_encoders as ce

# Sample Data
data = {
    'city': ['New York', 'London', 'Paris', 'New York', 'London', 'Berlin'],
    'country': ['USA', 'UK', 'France', 'USA', 'UK', 'Germany'],
    'target': [10, 20, 15, 12, 22, 18]
}
df = pd.DataFrame(data)

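# Separate the features from the target so the target column is not part of X.
X = df.drop(columns=['target'])
y = df['target']

# Initialize and fit the TargetEncoder.
# 'cols' limits encoding to the listed columns; if it is omitted, all string columns are encoded.
# For supervised encoders, 'y' is passed during fit_transform.
encoder = ce.TargetEncoder(cols=['city', 'country'])
encoded_df = encoder.fit_transform(X, y)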

print("Original DataFrame:")
print(df)
print("\nEncoded DataFrame:")
print(encoded_df)
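
To make the idea behind target encoding concrete, here is a minimal pure-Python sketch of the unsmoothed version: each category is replaced by the mean of the target values observed for it. The helper `mean_target_encode` is hypothetical and is not part of Category Encoders. The library's `TargetEncoder` additionally blends each category mean with the overall target mean, so the numbers it produces on a sample this small will differ from the ones computed here.

```python
from collections import defaultdict


def mean_target_encode(categories, targets):
    """Return a dict mapping each category to the mean of its target values.

    Hypothetical helper for illustration only; it applies no smoothing.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Same 'city' and 'target' values as the quickstart data.
cities = ['New York', 'London', 'Paris', 'New York', 'London', 'Berlin']
targets = [10, 20, 15, 12, 22, 18]

city_means = mean_target_encode(cities, targets)

# Replace each city with the mean target observed for that city.
encoded_cities = [city_means[city] for city in cities]

print(city_means)
print(encoded_cities)
```

Because a category seen only once is mapped to its own target value, this plain version leaks the target on small samples, which is one reason a smoothed encoder is usually preferred in practice.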
