{"id":6992,"library":"aplr","title":"Automatic Piecewise Linear Regression","description":"APLR (Automatic Piecewise Linear Regression) is a Python library for building predictive and interpretable regression or classification machine learning models. It implements the Automatic Piecewise Linear Regression methodology, often achieving predictive accuracy comparable to tree-based methods while offering smoother, more interpretable predictions. The library is actively maintained with frequent releases, currently at version 10.22.0, and supports Python versions 3.8 and above.","status":"active","version":"10.22.0","language":"en","source_language":"en","source_url":"https://github.com/ottenbreit-data-science/aplr","tags":["machine learning","regression","classification","interpretable ML","piecewise linear","sklearn-compatible"],"install":[{"cmd":"pip install aplr","lang":"bash","label":"Basic Installation"},{"cmd":"pip install aplr[plots]","lang":"bash","label":"Installation with Plotting Support"}],"dependencies":[{"reason":"Required Python version.","package":"python","optional":false},{"reason":"Fundamental package for numerical computing.","package":"numpy","optional":false},{"reason":"Used for DataFrame input handling and preprocessing.","package":"pandas","optional":false},{"reason":"Common machine learning utilities and data handling.","package":"scikit-learn","optional":false}],"imports":[{"symbol":"APLRRegressor","correct":"from aplr import APLRRegressor"},{"symbol":"APLRClassifier","correct":"from aplr import APLRClassifier"},{"note":"Used for hyperparameter tuning, supporting sequential and grid search.","symbol":"APLRTuner","correct":"from aplr import APLRTuner"}],"quickstart":{"code":"import numpy as np\nimport pandas as pd\nfrom sklearn.datasets import make_regression\nfrom sklearn.model_selection import train_test_split\nfrom aplr import APLRRegressor\n\n# Generate synthetic data\nX, y = make_regression(n_samples=1000, n_features=10, n_informative=5, 
random_state=42)\nX_df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])\ny_series = pd.Series(y)\n\n# Add a categorical feature for testing APLR's auto-preprocessing\nX_df['categorical_feature'] = np.random.choice(['A', 'B', 'C'], size=1000)\n\n# Split data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X_df, y_series, test_size=0.2, random_state=42)\n\n# Initialize and train the APLRRegressor model\n# preprocess=True (default) enables automatic handling of categorical features and missing values\nmodel = APLRRegressor(random_state=42, m=2000, n_jobs=-1, validation_ratio=0.1)\nmodel.fit(X_train, y_train)\n\n# Make predictions\ny_pred = model.predict(X_test)\n\n# Evaluate the model (e.g., using R-squared from scikit-learn)\nfrom sklearn.metrics import r2_score\nr2 = r2_score(y_test, y_pred)\nprint(f\"R-squared: {r2:.3f}\")","lang":"python","description":"This quickstart shows how to initialize, train, and predict with an `APLRRegressor` model on synthetic data. It includes a categorical feature to highlight APLR's automatic preprocessing of `pandas.DataFrame` inputs. The model sets `validation_ratio` for faster internal hyperparameter tuning; when specified, `validation_ratio` takes precedence over `cv_folds`."},"warnings":[{"fix":"Upgrade to `aplr` version `>=10.20.0` and retrain affected models. Alternatively, downgrade to the exact version the model was trained with.","message":"Models saved with `aplr` versions `10.18.0` through `10.19.3` are not compatible with `10.20.0` or newer if they were trained on data that triggered Python-based preprocessing (e.g., `pandas.DataFrame` with categorical features or missing values). These models must be retrained.","severity":"breaking","affected_versions":"10.18.0 - 10.19.3 when loading with >=10.20.0"},{"fix":"Review existing code using `cv_folds` for hyperparameter tuning. 
If `validation_ratio` is also present, ensure it aligns with the desired validation strategy or remove it if cross-validation is intended.","message":"The `validation_ratio` parameter (introduced in 10.21.0) for `APLRRegressor` and `APLRClassifier` now takes precedence over `cv_folds`. If `validation_ratio` is specified, `cv_folds` is ignored for internal hyperparameter tuning.","severity":"gotcha","affected_versions":">=10.21.0"},{"fix":"Consult the latest documentation for `min_observations_in_split` and `predictor_min_observations_in_split` and adjust hyperparameters accordingly. Larger values are generally more robust for larger datasets.","message":"The `min_observations_in_split` and `predictor_min_observations_in_split` parameters were changed in version 10.22.0. Users relying on specific previous behavior for these parameters should re-evaluate their impact.","severity":"gotcha","affected_versions":">=10.22.0"},{"fix":"To disable automatic preprocessing, initialize the model with `preprocess=False`. In this case, ensure `X` is a purely numeric `numpy.ndarray` or `pandas.DataFrame` with all preprocessing handled externally.","message":"Automatic data preprocessing (`preprocess=True` by default) in `APLRRegressor` and `APLRClassifier` intelligently handles missing values (imputation) and one-hot encodes categorical features for `pandas.DataFrame` inputs. 
While convenient, this might mask custom preprocessing needs or introduce overhead if manual preprocessing is preferred.","severity":"gotcha","affected_versions":"All versions >=10.18.0 (when auto-preprocessing was introduced)"},{"fix":"Consider setting `sequential_tuning=True` in `APLRTuner` for faster hyperparameter optimization, especially with many parameters or complex search spaces.","message":"When using `APLRTuner`, enabling `sequential_tuning=True` can significantly speed up hyperparameter search by tuning parameters sequentially and avoiding re-testing duplicate combinations, compared to a full grid search.","severity":"gotcha","affected_versions":">=10.22.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Upgrade `aplr` to version `>=10.20.1` (which includes a fix for backward compatibility) or retrain the model with `aplr>=10.20.0`. If using 10.20.0, any model saved with 10.18.0-10.19.3 needs retraining.","cause":"A regression in `aplr` version `10.20.0` caused incompatibility when loading models saved with older versions (10.18.0-10.19.3), particularly those that used Python-based preprocessing.","error":"AttributeError: 'APLRRegressor' object has no attribute '...' (when loading a saved model)"},{"fix":"Upgrade `aplr` to version `>=10.19.3`, which improved validation to accept any list-like iterable. Alternatively, ensure `X_names` is a `list` or `tuple`.","cause":"In versions prior to `10.19.3`, the input validation for the `X_names` parameter in the `fit` method was overly strict and raised a `ValueError` if a NumPy array was provided.","error":"ValueError: X_names must be list-like (or similar error when passing X_names with NumPy array)"},{"fix":"Upgrade `aplr` to version `>=10.19.2`, which includes a fix for this warning along with memory-optimization and preprocessing-robustness improvements. 
Alternatively, preprocess the data to handle columns in which every value is missing (e.g., by dropping them) before passing it to `aplr`.","cause":"This warning occurred in versions prior to `10.19.2` during median imputation if a column contained only missing values, leading to an attempt to calculate the mean of an empty slice.","error":"RuntimeWarning: Mean of empty slice (during model fitting or preprocessing)"}]}