Automatic Piecewise Linear Regression
APLR (Automatic Piecewise Linear Regression) is a Python library for building predictive, interpretable machine learning models for regression and classification. It implements the Automatic Piecewise Linear Regression methodology, often achieving predictive accuracy comparable to tree-based methods while producing smoother, more interpretable predictions. The library is actively maintained with frequent releases, currently at version 10.22.0, and supports Python 3.8 and above.
Common errors
- AttributeError: 'APLRRegressor' object has no attribute '...' (when loading a saved model)
  - cause: A regression in `aplr` version `10.20.0` broke loading of models saved with older versions (10.18.0-10.19.3), particularly those that relied on Python-based preprocessing.
  - fix: Upgrade `aplr` to `>=10.20.1`, which restores backward compatibility, or retrain the model with `aplr>=10.20.0`. If you must stay on 10.20.0, any model saved with 10.18.0-10.19.3 needs retraining.
- ValueError: X_names must be list-like (or similar error when passing X_names as a NumPy array)
  - cause: In versions prior to `10.19.3`, input validation for the `X_names` parameter of the `fit` method was strict and raised a `ValueError` if a NumPy array was provided.
  - fix: Upgrade `aplr` to `>=10.19.3`, which accepts any list-like iterable for `X_names`. Alternatively, pass `X_names` as a `list` or `tuple`.
- RuntimeWarning: Mean of empty slice (during model fitting or preprocessing)
  - cause: In versions prior to `10.19.2`, median imputation on a column containing only missing values attempted to compute the mean of an empty slice.
  - fix: Upgrade `aplr` to `>=10.19.2`, which makes preprocessing robust to this case. Alternatively, drop columns in which all values are missing before passing data to `aplr`.
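On older versions, the last two issues can also be avoided by cleaning the inputs first. A minimal sketch with pandas; the helper name `prepare_inputs` is hypothetical, not part of the `aplr` API:

```python
import numpy as np
import pandas as pd

def prepare_inputs(df: pd.DataFrame):
    # Drop columns that contain only missing values, which triggered the
    # empty-slice warning during median imputation before 10.19.2.
    cleaned = df.dropna(axis=1, how="all")
    # Pass feature names as a plain list rather than a NumPy array, which
    # versions before 10.19.3 rejected for X_names.
    x_names = list(cleaned.columns)
    return cleaned, x_names
```

The returned `cleaned` frame and `x_names` list can then be passed to `fit` as usual.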
Warnings
- breaking Models saved with `aplr` versions `10.18.0` through `10.19.3` are not compatible with `10.20.0` or newer if they were trained on data that triggered Python-based preprocessing (e.g., `pandas.DataFrame` with categorical features or missing values). These models must be retrained.
- gotcha The `validation_ratio` parameter (introduced in 10.21.0) for `APLRRegressor` and `APLRClassifier` now takes precedence over `cv_folds`. If `validation_ratio` is specified, `cv_folds` is ignored for internal hyperparameter tuning.
- gotcha The behavior of the `min_observations_in_split` and `predictor_min_observations_in_split` parameters changed in version 10.22.0. Users relying on the previous behavior should re-evaluate these parameters' impact on their models.
- gotcha Automatic data preprocessing (`preprocess=True` by default) in `APLRRegressor` and `APLRClassifier` intelligently handles missing values (imputation) and one-hot encodes categorical features for `pandas.DataFrame` inputs. While convenient, this might mask custom preprocessing needs or introduce overhead if manual preprocessing is preferred.
- gotcha When using `APLRTuner`, enabling `sequential_tuning=True` can significantly speed up hyperparameter search compared to a full grid search: parameters are tuned sequentially and duplicate combinations are not re-tested.
Install
- pip install aplr
- pip install aplr[plots]
Imports
- APLRRegressor
from aplr import APLRRegressor
- APLRClassifier
from aplr import APLRClassifier
- APLRTuner
from aplr import APLRTuner
Quickstart
```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from aplr import APLRRegressor

# Generate synthetic data
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=42)
X_df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
y_series = pd.Series(y)

# Add a categorical feature to exercise APLR's automatic preprocessing
X_df['categorical_feature'] = np.random.choice(['A', 'B', 'C'], size=1000)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_df, y_series, test_size=0.2, random_state=42)

# Initialize and train the APLRRegressor model.
# preprocess=True (the default) enables automatic handling of
# categorical features and missing values.
model = APLRRegressor(random_state=42, m=2000, n_jobs=-1, validation_ratio=0.1)
model.fit(X_train, y_train)

# Make predictions and evaluate with R-squared
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2:.3f}")
```