{"id":7705,"library":"scikit-survival","title":"Scikit-Survival","description":"Scikit-survival is a Python library for survival analysis built on top of scikit-learn. It provides survival models such as Cox proportional hazards, random survival forests, and gradient boosting, along with utility functions for data preparation and evaluation. The current version is 0.27.0, and releases are frequent, often to add support for newer versions of scikit-learn, NumPy, and pandas.","status":"active","version":"0.27.0","language":"en","source_language":"en","source_url":"https://github.com/sebp/scikit-survival","tags":["survival-analysis","scikit-learn-compatible","machine-learning","healthcare","statistics"],"install":[{"cmd":"pip install scikit-survival","lang":"bash","label":"Install latest stable version"}],"dependencies":[{"reason":"Core dependency, providing base estimators and utilities.","package":"scikit-learn","optional":false},{"reason":"Used for data handling and structured array creation.","package":"pandas","optional":false},{"reason":"Fundamental package for numerical computing.","package":"numpy","optional":false},{"reason":"Used for scientific computing, particularly optimization and statistics.","package":"scipy","optional":false},{"reason":"Required by models that solve quadratic programs (e.g., the survival SVMs `MinlipSurvivalAnalysis` and `HingeLossSurvivalSVM`).","package":"osqp","optional":false}],"imports":[{"symbol":"RandomSurvivalForest","correct":"from sksurv.ensemble import RandomSurvivalForest"},{"symbol":"CoxPHSurvivalAnalysis","correct":"from sksurv.linear_model import CoxPHSurvivalAnalysis"},{"symbol":"load_whas500","correct":"from sksurv.datasets import load_whas500"},{"symbol":"concordance_index_censored","correct":"from sksurv.metrics import concordance_index_censored"}],"quickstart":{"code":"import numpy as np\nfrom sksurv.datasets import load_whas500\nfrom sksurv.ensemble import RandomSurvivalForest\n\nX, y = load_whas500()\n\n# Split data (simple for 
quickstart)\nX_train, X_test = X.iloc[:300], X.iloc[300:]\ny_train, y_test = y[:300], y[300:]\n\n# Initialize and fit a Random Survival Forest model\nrsf = RandomSurvivalForest(\n    n_estimators=100,\n    min_samples_leaf=20,\n    random_state=42\n)\nrsf.fit(X_train, y_train)\n\n# Predict survival functions and calculate concordance index\nsurv_fns = rsf.predict_survival_function(X_test, return_array=True)\npreds = rsf.predict(X_test)\n\nfrom sksurv.metrics import concordance_index_censored\nc_index = concordance_index_censored(y_test['fstat'], y_test['lenfol'], preds)[0]\n\nprint(f\"Predicted survival for first test sample: {surv_fns[0, :5].round(2)}\")\nprint(f\"Concordance Index (C-index): {c_index:.3f}\")","lang":"python","description":"This quickstart loads the WHAS500 dataset, prepares it for survival analysis, trains a RandomSurvivalForest model, and demonstrates prediction of survival functions and calculation of the concordance index. Note the use of a structured NumPy array for the `y` target, which is characteristic of survival analysis in scikit-survival."},"warnings":[{"fix":"Always check the release notes for the `sksurv` version you are using or planning to use. Ensure all your dependencies (`scikit-learn`, `pandas`, `numpy`, `scipy`, `python`) meet the minimum requirements for your `sksurv` version. Upgrading `sksurv` often requires upgrading its dependencies simultaneously.","message":"Scikit-survival frequently updates its minimum required versions for core dependencies like scikit-learn, pandas, numpy, and python. Using an older `sksurv` version with a newer dependency (or vice-versa) can lead to `ImportError`, `AttributeError`, or `TypeError` due to API mismatches.","severity":"breaking","affected_versions":"All versions, specifically when updating adjacent libraries."},{"fix":"Ensure `y` is a structured array, typically created from event/time columns. 
Example: `y = np.array(list(zip(events, times)), dtype=[('fstat', 'bool'), ('lenfol', 'float')])`, or use the helper `sksurv.util.Surv.from_arrays(events, times)`. Datasets such as `sksurv.datasets.load_whas500()` already return `y` in the correct format.","message":"The target variable `y` in scikit-survival models must be a structured NumPy array containing two fields, in this order: a boolean 'event' indicator (e.g., 'fstat') followed by a numeric 'time' (e.g., 'lenfol'). Passing a plain NumPy array, pandas Series, or pandas DataFrame raises a `ValueError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For `sksurv>=0.26.0`, ensure `osqp>=1.0.2` is installed. If using `sksurv==0.24.1`, ensure `osqp<1.0.0`. It's best to let `pip` resolve dependencies or explicitly install compatible versions: `pip install 'osqp>=1.0.2'` for current `sksurv`.","message":"Version 0.24.1 restricted the `osqp` dependency to versions less than 1.0.0 (`osqp<1.0.0`). However, subsequent versions (e.g., 0.26.0 and later) explicitly support and require `osqp>=1.0.2`. Installing an `osqp` version that does not match your `sksurv` release can lead to runtime errors when using models that rely on it.","severity":"breaking","affected_versions":"Versions 0.24.1, 0.26.0, 0.27.0 (and potentially others around this range)."},{"fix":"If working with missing values in `X`, ensure you are using `sksurv>=0.23.0` (with compatible `scikit-learn`) for robust support across tree-based models. Otherwise, explicitly handle missing values (imputation, dropping rows) before passing `X` to the model.","message":"Some models, particularly tree-based ones like `SurvivalTree` or `RandomSurvivalForest`, gained missing value support in recent `sksurv` releases (e.g., v0.22.0 for `SurvivalTree`, v0.23.0 for `RandomSurvivalForest`) due to underlying scikit-learn updates. 
Older `sksurv` versions or incompatible scikit-learn versions might not handle `np.nan` values correctly.","severity":"gotcha","affected_versions":"Prior to v0.23.0 for `RandomSurvivalForest`, prior to v0.22.0 for `SurvivalTree`."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Convert your event and time data into a structured NumPy array. Example: `y_structured = np.array(list(zip(events, times)), dtype=[('event', 'bool'), ('time', 'float')])`.","cause":"The target variable `y` was passed as a plain NumPy array, pandas Series, or pandas DataFrame instead of the required structured array format (e.g., `dtype=[('fstat', 'bool'), ('lenfol', 'float')]`).","error":"ValueError: y must be a structured array with the first field being a binary class event indicator and the second field the time of the event/censoring"},{"fix":"Install the package using pip: `pip install scikit-survival`. If using a virtual environment, ensure it is activated.","cause":"The `scikit-survival` package is not installed in the current Python environment, or the environment is not active.","error":"ModuleNotFoundError: No module named 'sksurv'"},{"fix":"Check the `scikit-survival` release notes for the exact `scikit-learn` version range it supports. Upgrade or downgrade `scikit-learn` (and potentially `sksurv`) to a compatible version. For example, `pip install 'scikit-learn>=1.8.0,<1.9.0' scikit-survival` for v0.27.0.","cause":"This usually indicates an incompatibility between your `scikit-survival` version and your installed `scikit-learn` version. `sksurv` relies heavily on scikit-learn's internal APIs, which can change between releases.","error":"ImportError: cannot import name '...' from 'sklearn.tree._criterion'"},{"fix":"Ensure you have a C++ compiler installed (e.g., Build Tools for Visual Studio on Windows, `build-essential` on Debian/Ubuntu, Xcode Command Line Tools on macOS). 
Reinstall `scikit-survival` with `pip install --no-cache-dir --force-reinstall scikit-survival` in a clean environment.","cause":"This error occurs when the underlying C++/Cython extensions of `scikit-survival` could not be built or loaded correctly. This can happen due to missing build tools (e.g., C++ compiler), incompatible Python versions, or corrupted installations.","error":"OSError: scikit-survival failed to load its C++ extension module."}]}