Scikit-Survival
Scikit-survival is a Python library for survival analysis built on top of scikit-learn. It provides survival models such as Cox proportional hazards, random survival forests, and gradient boosting, along with utility functions for data preparation and evaluation. The current version is 0.27.0. Releases are frequent and regularly update the supported versions of scikit-learn, NumPy, and pandas.
Common errors
-
TypeError: A structured array must be used to define the response. Found array of dtype <class 'numpy.int64'>
cause The target variable `y` was passed as a plain NumPy array or pandas Series instead of the required structured array (e.g., `dtype=[('fstat', 'bool'), ('lenfol', 'float')]`).
fix Convert your event and time data into a structured NumPy array. Example: `y_structured = np.array(list(zip(events, times)), dtype=[('event', 'bool'), ('time', 'float')])`.
-
ModuleNotFoundError: No module named 'sksurv'
cause The `scikit-survival` package is not installed in the current Python environment, or the environment is not active.
fix Install the package using pip: `pip install scikit-survival`. If using a virtual environment, ensure it is activated.
-
ImportError: cannot import name '...' from 'sklearn.tree._criterion'
cause This usually indicates an incompatibility between your `scikit-survival` version and your installed `scikit-learn` version. `sksurv` relies heavily on scikit-learn's internal APIs, which can change between releases.
fix Check the `scikit-survival` release notes for the exact `scikit-learn` version range it supports. Upgrade or downgrade `scikit-learn` (and potentially `sksurv`) to a compatible version. For example, `pip install 'scikit-learn>=1.8.0,<1.9.0' scikit-survival` for v0.27.0.
-
OSError: scikit-survival failed to load its C++ extension module.
cause This error occurs when the underlying C++/Cython extensions of `scikit-survival` could not be built or loaded correctly. This can happen due to missing build tools (e.g., a C++ compiler), incompatible Python versions, or corrupted installations.
fix Ensure you have a C++ compiler installed (e.g., Build Tools for Visual Studio on Windows, `build-essential` on Debian/Ubuntu, Xcode Command Line Tools on macOS). Reinstall `scikit-survival` with `pip install --no-cache-dir --force-reinstall scikit-survival` in a clean environment.
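The structured-array fix for the first error above can be sketched with NumPy alone. The data values and the field names `event` and `time` are made up for illustration; what matters is that the boolean event field comes first and the time field second.

```python
import numpy as np

# Hypothetical example data: one event indicator and one observed time per subject.
events = [True, False, True, True]
times = [12.0, 30.5, 7.2, 19.0]

# Field names are illustrative; the event field is listed first, the time field second.
y_structured = np.array(
    list(zip(events, times)),
    dtype=[("event", "bool"), ("time", "float")],
)

print(y_structured.dtype.names)  # ('event', 'time')
print(y_structured["event"])     # boolean event indicators
print(y_structured["time"])      # observed times as floats
```

scikit-survival also provides a helper, `Surv.from_arrays` in `sksurv.util`, that builds this kind of array from separate event and time arrays, which avoids spelling out the dtype by hand.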
Warnings
- breaking Scikit-survival frequently updates its minimum required versions for core dependencies like scikit-learn, pandas, NumPy, and Python. Using an older `sksurv` version with a newer dependency (or vice versa) can lead to `ImportError`, `AttributeError`, or `TypeError` due to API mismatches.
- gotcha The target variable `y` in scikit-survival models must be a structured NumPy array with two fields: first a boolean event indicator (e.g., 'fstat'), then a float time (e.g., 'lenfol'). Passing a plain NumPy array or pandas Series will raise a `TypeError`.
- breaking Version 0.24.1 restricted the `osqp` dependency to versions less than 1.0.0 (`osqp<1.0.0`). However, subsequent versions (e.g., 0.26.0 and later) explicitly support and require `osqp>=1.0.2`. Installing `osqp` with the wrong version for your `sksurv` release can lead to runtime errors when using models that rely on it.
- gotcha Some models, particularly tree-based ones like `SurvivalTree` or `RandomSurvivalForest`, gained missing value support in recent `sksurv` releases (e.g., v0.22.0 for `SurvivalTree`, v0.23.0 for `RandomSurvivalForest`) due to underlying scikit-learn updates. Older `sksurv` versions or incompatible scikit-learn versions might not handle `np.nan` values correctly.
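Since several of the warnings above come down to version mismatches, it can help to print what is actually installed before debugging further. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is simply the set of dependencies mentioned in this section.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as used on PyPI (not the import names).
report = installed_versions(
    ["scikit-survival", "scikit-learn", "numpy", "pandas", "osqp"]
)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Compare the printed versions against the ranges given in the scikit-survival release notes for your release; the sketch only reports versions and does not decide compatibility itself.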
Install
-
pip install scikit-survival
Imports
- RandomSurvivalForest
from sksurv.ensemble import RandomSurvivalForest
- CoxPHSurvivalAnalysis
from sksurv.linear_model import CoxPHSurvivalAnalysis
- load_whas500
from sksurv.datasets import load_whas500
- concordance_index_censored
from sksurv.metrics import concordance_index_censored
Quickstart
from sksurv.datasets import load_whas500
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored

# Load the WHAS500 data; y is a structured array with fields 'fstat' and 'lenfol'
X, y = load_whas500()
# Cast features to float so that every column is numeric
X = X.astype(float)

# Split data (simple for quickstart)
X_train, X_test = X.iloc[:300], X.iloc[300:]
y_train, y_test = y[:300], y[300:]

# Initialize and fit a Random Survival Forest model
rsf = RandomSurvivalForest(
    n_estimators=100,
    min_samples_leaf=20,
    random_state=42,
)
rsf.fit(X_train, y_train)

# Predict survival functions (one row per sample) and risk scores
surv_fns = rsf.predict_survival_function(X_test, return_array=True)
preds = rsf.predict(X_test)

# The first element of the returned tuple is the concordance index
c_index = concordance_index_censored(y_test['fstat'], y_test['lenfol'], preds)[0]
print(f"Predicted survival for first test sample: {surv_fns[0, :5].round(2)}")
print(f"Concordance Index (C-index): {c_index:.3f}")
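To make the C-index printed above easier to interpret, here is a simplified pure-NumPy illustration of the idea behind it. This is not scikit-survival's implementation: the function `naive_concordance` is hypothetical, runs in quadratic time, and ignores the special handling of tied event times. It counts pairs in which the subject with the earlier observed event was also assigned the higher risk score.

```python
import numpy as np


def naive_concordance(event, time, risk):
    """Simplified Harrell's C: O(n^2), no special handling of tied times."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    risk = np.asarray(risk, dtype=float)
    concordant = tied = comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1  # earlier event got the higher risk
                elif risk[i] == risk[j]:
                    tied += 1  # tied risk scores count as half
    return (concordant + 0.5 * tied) / comparable


# Made-up toy data: the third subject is censored.
event = [True, True, False, True]
time = [2.0, 5.0, 6.0, 9.0]

perfect = naive_concordance(event, time, risk=[4.0, 3.0, 2.0, 1.0])
inverted = naive_concordance(event, time, risk=[1.0, 2.0, 3.0, 4.0])
constant = naive_concordance(event, time, risk=[1.0, 1.0, 1.0, 1.0])
print(perfect, inverted, constant)  # 1.0 0.0 0.5
```

A value near 1.0 means higher predicted risk tends to go with earlier events, 0.5 is what uninformative scores give, and values below 0.5 suggest the ranking is inverted. The sketch divides by the number of comparable pairs, so it assumes the data contain at least one such pair.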