{"id":2424,"library":"category-encoders","title":"Category Encoders","description":"Category Encoders is a Python library providing a comprehensive set of scikit-learn-style transformers for encoding categorical variables into numeric representations using various techniques. It offers first-class support for pandas DataFrames as input and output, and integrates seamlessly with scikit-learn pipelines. The library is actively maintained, with the current version being 2.9.0, and releases occur regularly to introduce new encoders, features, and bug fixes.","status":"active","version":"2.9.0","language":"en","source_language":"en","source_url":"https://github.com/scikit-learn-contrib/category_encoders","tags":["machine learning","feature engineering","categorical encoding","sklearn"],"install":[{"cmd":"pip install category-encoders","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Required for numerical operations.","package":"numpy","optional":false},{"reason":"Required for statistical models used in some encoders.","package":"statsmodels","optional":false},{"reason":"Required for scientific computing functions.","package":"scipy","optional":false},{"reason":"Required for DataFrame input/output; version >= 1.0 recommended for full compatibility.","package":"pandas","optional":false}],"imports":[{"symbol":"TargetEncoder","correct":"from category_encoders import TargetEncoder"},{"symbol":"OneHotEncoder","correct":"from category_encoders import OneHotEncoder"},{"symbol":"OrdinalEncoder","correct":"from category_encoders import OrdinalEncoder"},{"symbol":"BinaryEncoder","correct":"from category_encoders import BinaryEncoder"}],"quickstart":{"code":"import pandas as pd\nimport category_encoders as ce\n\n# Sample Data\ndata = {\n    'city': ['New York', 'London', 'Paris', 'New York', 'London', 'Berlin'],\n    'country': ['USA', 'UK', 'France', 'USA', 'UK', 'Germany'],\n    'target': [10, 20, 15, 12, 22, 18]\n}\ndf = pd.DataFrame(data)\n\n# Initialize 
and fit the TargetEncoder\n# It's crucial to specify 'cols' to encode specific columns.\n# For supervised encoders, 'y' is passed during fit_transform.\nencoder = ce.TargetEncoder(cols=['city', 'country'])\nencoded_df = encoder.fit_transform(df, df['target'])\n\nprint(\"Original DataFrame:\")\nprint(df)\nprint(\"\\nEncoded DataFrame:\")\nprint(encoded_df)","lang":"python","description":"This example demonstrates how to use the `TargetEncoder` to convert categorical columns ('city', 'country') into numerical representations based on the 'target' variable. The `fit_transform` method is used on the training data, taking both features (X) and the target (y)."},"warnings":[{"fix":"Ensure your environment meets the minimum requirements: Python >=3.11, pandas >=1.0. scikit-learn is a required dependency, so upgrade it to a compatible version (e.g., >=1.0).","message":"Releases in the 2.x series have progressively dropped support for older Python, pandas, and scikit-learn versions. The current release requires Python >=3.11 and pandas >=1.0, and no longer supports scikit-learn 0.x.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Explicitly set parameters like `handle_missing`, `handle_unknown`, or `smoothing` to match the desired behavior if migrating from an older version or to ensure consistent results.","message":"Default parameters for some encoders, such as `TargetEncoder` (issue 327) and `HelmertEncoder` (`handle_missing`, `handle_unknown`), changed in minor 2.x releases. This can subtly alter encoding behavior compared to earlier versions.","severity":"breaking","affected_versions":">=2.1.0, >=2.6.0"},{"fix":"Follow the scikit-learn API pattern: `encoder.fit_transform(X_train, y_train)` and then `encoder.transform(X_test)`.","message":"For supervised encoders (e.g., `TargetEncoder`, `LeaveOneOutEncoder`), always use `fit_transform(X_train, y_train)` for training data and `transform(X_test)` for test data. 
Using `fit().transform()` on training data can give different results, because some encoders (e.g., `LeaveOneOutEncoder`) use the target passed to `fit_transform` to exclude each row's own target value from its encoding, which reduces target leakage during training.","severity":"gotcha","affected_versions":"All"},{"fix":"Set `handle_unknown` (e.g., 'value', 'return_nan', or 'error') when instantiating the encoder to define explicitly how unseen categories are handled. For production, consider 'value' if a fallback encoding is acceptable, or 'return_nan' to make unseen categories detectable downstream.","message":"Unknown categories in new data (e.g., in a production environment) can lead to errors or unexpected values. The default behavior of `handle_unknown` varies by encoder; for `TargetEncoder`, it defaults to 'value', which substitutes the target mean.","severity":"gotcha","affected_versions":"All"},{"fix":"Always explicitly list the categorical columns to be encoded using the `cols` parameter to prevent accidental encoding of inappropriate features.","message":"If the `cols` parameter is not provided during encoder instantiation, `category-encoders` will attempt to encode *all* non-numeric columns (object or pandas categorical dtype). This can unintentionally encode ID columns or numerical columns that were loaded as strings.","severity":"gotcha","affected_versions":"All"},{"fix":"Reserve `OrdinalEncoder` for genuinely ordinal data. For nominal variables, consider `OneHotEncoder`, `BinaryEncoder`, or a contrast encoder such as `HelmertEncoder`.","message":"Using `OrdinalEncoder` for nominal (unordered) categorical variables can introduce an artificial, misleading order into the data, which may negatively impact models sensitive to numerical relationships (e.g., linear models).","severity":"gotcha","affected_versions":"All"},{"fix":"If you need the latest features and fixes, install via pip: `pip install category-encoders`. 
If using conda, check the available version carefully and consider creating a dedicated environment for pip installations if necessary.","message":"Installing `category-encoders` via `conda-forge` might provide an older version of the library (e.g., 1.x) that lacks recent features, bug fixes, or compatibility updates present in the latest pip release.","severity":"gotcha","affected_versions":"<=2.8.1 if installed via conda-forge"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}