Scikit-Survival

0.27.0 · active · verified Thu Apr 16

scikit-survival is a Python library for survival analysis built on top of scikit-learn. It provides survival models such as Cox proportional hazards, random survival forests, and gradient boosting, along with utilities for data preparation and model evaluation. The current version is 0.27.0. The project is actively maintained, and releases frequently add support for newer versions of scikit-learn, NumPy, and pandas.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart loads the WHAS500 dataset, splits it into training and test sets, trains a RandomSurvivalForest model, predicts survival functions and risk scores for the test set, and computes the concordance index. Note that the `y` target is a structured NumPy array rather than a plain vector: each record holds an event indicator (`fstat`) and an observed time (`lenfol`), which is the target format scikit-survival estimators expect.
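As a brief aside on the target format, the sketch below builds a small structured array by hand with plain NumPy. The toy values and the field names `event` and `time` are illustrative assumptions, not taken from WHAS500; the convention followed is a boolean event field first and a float time field second. In practice the helper `sksurv.util.Surv.from_arrays(event, time)` is likely the more convenient way to build such an array.

```python
import numpy as np

# Hypothetical toy data: three subjects, the second one is right-censored.
event = np.array([True, False, True])
time = np.array([12.0, 30.5, 7.0])

# First field: boolean event indicator. Second field: observed time.
# The field names are illustrative; WHAS500 uses 'fstat' and 'lenfol'.
y_toy = np.empty(len(time), dtype=[("event", bool), ("time", float)])
y_toy["event"] = event
y_toy["time"] = time

# Fields are accessed by name, and records can be sliced like any array.
observed_times = y_toy["time"][y_toy["event"]]
print(y_toy.dtype.names)
print(observed_times)
```

Because each record keeps the indicator and the time together, slicing `y_toy[:2]` keeps both fields aligned, which is why the quickstart can split `y` with ordinary slices.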

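from sksurv.datasets import load_whas500
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored

# X is a pandas DataFrame of covariates; y is a structured array with the
# fields 'fstat' (event indicator) and 'lenfol' (follow-up time)
X, y = load_whas500()
X = X.astype(float)  # cast categorical columns to numeric values

# Split data (simple positional split for the quickstart, not a random split)
X_train, X_test = X.iloc[:300], X.iloc[300:]
y_train, y_test = y[:300], y[300:]

# Initialize and fit a Random Survival Forest model
rsf = RandomSurvivalForest(
    n_estimators=100,
    min_samples_leaf=20,
    random_state=42,
)
rsf.fit(X_train, y_train)

# Predict survival functions: one row per test sample, one column per time point
surv_fns = rsf.predict_survival_function(X_test, return_array=True)
# Predict risk scores (a higher score means a higher predicted risk)
preds = rsf.predict(X_test)

# concordance_index_censored returns a tuple; its first element is the C-index
c_index = concordance_index_censored(y_test['fstat'], y_test['lenfol'], preds)[0]

print(f"Survival probabilities for first test sample (first 5 time points): {surv_fns[0, :5].round(2)}")
print(f"Concordance Index (C-index): {c_index:.3f}")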
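To make the concordance index less of a black box, here is a simplified, dependency-free sketch of the idea behind it. It is not scikit-survival's implementation: the function name `simple_c_index` and the toy data are hypothetical, and pairs with tied times are simply skipped, which the library handles more carefully. A pair counts as comparable when the subject with the shorter time had an observed event, and as concordant when that subject also received the higher risk score.

```python
def simple_c_index(event, time, risk):
    """Simplified concordance index for right-censored data (illustrative only)."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            # A censored subject cannot be the earlier member of a comparable pair.
            continue
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event got the higher risk score
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied risk scores count as half
    return concordant / comparable


# Hypothetical toy data: the third subject is censored at time 5.
event = [True, True, False, True]
time = [2, 4, 5, 7]
risk = [0.9, 0.6, 0.7, 0.1]

c = simple_c_index(event, time, risk)
print(c)  # 4 concordant pairs out of 5 comparable pairs -> 0.8
```

A value of 1.0 would mean every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.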
