scikit-surprise

1.1.4 verified Mon Apr 27 auth: no python

scikit-surprise (Surprise) is a Python scikit for building and analyzing recommender systems that work with explicit rating data. Version 1.1.4 is the current release. It provides baseline, neighborhood-based (k-NN), and matrix-factorization (SVD, SVD++, NMF) collaborative-filtering algorithms, along with cross-validation tools and accuracy metrics such as RMSE and MAE. Releases are infrequent (1.1.1 in 2020, 1.1.3 in 2022, 1.1.4 in 2024).

pip install scikit-surprise
error ModuleNotFoundError: No module named 'surprise'
cause Package not installed in the active environment. The distribution is named 'scikit-surprise', but the module is imported as 'surprise'.
fix
Run pip install scikit-surprise in the environment that runs your code, then use 'import surprise' or 'from surprise import ...'.
error ValueError: `rating_scale` must be a tuple (low, high).
cause Reader not initialized with rating_scale when using custom dataset.
fix
reader = Reader(rating_scale=(1, 5))
error AttributeError: module 'surprise' has no attribute 'cross_validate'
cause cross_validate is in surprise.model_selection, not top-level surprise.
fix
from surprise.model_selection import cross_validate
error FileNotFoundError: [Errno 2] No such file or directory: '~/.surprise_data/ml-100k/...'
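cause The built-in dataset has not been downloaded yet (network problem, or the data folder is missing or not writable).
fix
Run Dataset.load_builtin('ml-100k') with an internet connection and accept the download prompt, or set SURPRISE_DATA_FOLDER to an existing, writable directory.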
gotcha Dataset.load_builtin() downloads data to ~/.surprise_data by default. If the disk is full or the directory is not writable, the download fails with an error. Make sure there is enough space, or point the SURPRISE_DATA_FOLDER environment variable at another location.
fix Set the SURPRISE_DATA_FOLDER environment variable to a writable directory.
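A minimal sketch of setting the data folder from Python rather than from the shell. The helper name use_data_folder and the temporary path are hypothetical. Surprise appears to resolve the built-in dataset paths when the package is imported, so the sketch assumes the variable is set before the first import of surprise.

```python
import os
import tempfile


def use_data_folder(path):
    """Create `path` if needed, check it is writable, and export it
    as SURPRISE_DATA_FOLDER (hypothetical helper, not part of Surprise)."""
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{path} is not writable")
    os.environ["SURPRISE_DATA_FOLDER"] = path
    return path


# Illustrative location; any writable directory works
data_dir = use_data_folder(os.path.join(tempfile.gettempdir(), "surprise_data"))
print(f"Surprise data folder: {data_dir}")

# Import surprise only after this point, e.g.:
# from surprise import Dataset
# data = Dataset.load_builtin('ml-100k')
```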
gotcha The default similarity measure for KNN algorithms such as KNNBasic is 'msd' (mean squared difference), not 'cosine'. If your results depend on a particular measure, specify it explicitly in sim_options rather than relying on the default.
fix sim_options = {'name': 'msd', 'user_based': True}
gotcha When using custom datasets with Reader, the rating_scale must match the actual range of the ratings. Reader defaults to (1, 5), and predictions are clipped to the configured scale, so a mismatch silently distorts estimates and error metrics.
fix Always set rating_scale=(min_rating, max_rating) in Reader.
deprecated The surprise.dump module (a thin wrapper around pickle) is listed here as deprecated and may be removed in future versions; it still ships in 1.1.4. Algorithm objects can also be saved directly with pickle or joblib.
fix import joblib; joblib.dump(algo, 'model.pkl')
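gotcha Raw user and item IDs can be strings or integers; Surprise maps them to consecutive inner IDs when it builds a trainset, so non-consecutive IDs need no special handling. Raw IDs read from a file are always strings, so pass strings (for example '196', not 196) to algo.predict() in that case.
fix Use Dataset.load_from_df() with a DataFrame whose columns are user, item, rating, in that order.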
conda install -c conda-forge scikit-surprise

Loads the MovieLens 100k dataset, trains SVD on a 75% training split, and reports RMSE on the held-out 25%.

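from surprise import Dataset, SVD, accuracy
from surprise.model_selection import train_test_split

# Load the built-in MovieLens 100k dataset (prompts to download on first use)
data = Dataset.load_builtin('ml-100k')

# Hold out 25% of the ratings for testing
trainset, testset = train_test_split(data, test_size=0.25)

# Train SVD on the training split and predict the held-out ratings
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)

# verbose=False keeps accuracy.rmse from printing its own "RMSE: ..." line
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse:.4f}")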