scikit-surprise
1.1.4, verified Mon Apr 27, auth: no, python
scikit-surprise (Surprise) is a Python scikit for building and analyzing recommender systems that work with explicit rating data. Version 1.1.4 is the current release. It provides baseline predictors, neighborhood-based (k-NN) collaborative filtering, matrix factorization (SVD, SVD++, NMF), and evaluation tools such as cross-validation and accuracy metrics. Releases are infrequent: 1.1.1 came out in 2020, followed by 1.1.3 in 2022 and 1.1.4 in 2024.
pip install scikit-surprise
Common errors
error ModuleNotFoundError: No module named 'surprise'
cause The package is not installed in the active environment. The distribution is named 'scikit-surprise' on PyPI, but the module is imported as 'surprise'.
fix
pip install scikit-surprise, then use 'import surprise' or 'from surprise import ...'.
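To tell the two causes apart, a short check can report whether the module is importable and which version of the distribution is installed in the current environment. This is a minimal sketch using only the standard library; the helper name surprise_status is hypothetical.

```python
import importlib.util
from importlib import metadata


def surprise_status():
    """Return (importable, version) for the current environment.

    importable: True if 'import surprise' would find a module.
    version: installed 'scikit-surprise' version string, or None.
    """
    importable = importlib.util.find_spec("surprise") is not None
    try:
        version = metadata.version("scikit-surprise")
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


importable, version = surprise_status()
print(f"importable={importable}, scikit-surprise version={version}")
```

If importable is False while pip reports the package as installed, pip and the interpreter are probably pointing at different environments.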
error ValueError: `rating_scale` must be a tuple (low, high).
cause Reader was given a rating_scale that is not a (low, high) tuple, or no rating_scale was set for a custom dataset whose ratings fall outside the default (1, 5) range.
fix
reader = Reader(rating_scale=(1, 5))
error AttributeError: module 'surprise' has no attribute 'cross_validate'
cause cross_validate is in surprise.model_selection, not top-level surprise.
fix
from surprise.model_selection import cross_validate
error FileNotFoundError: [Errno 2] No such file or directory: '~/.surprise_data/ml-100k/...'
cause The built-in dataset has not been downloaded yet (network issue or missing data folder).
fix
Run Dataset.load_builtin('ml-100k') with internet access and accept the download prompt, or set SURPRISE_DATA_FOLDER to an existing, writable directory.
Warnings
gotcha Dataset.load_builtin() downloads data to ~/.surprise_data by default and raises an error if the disk is full or the directory is not writable.
fix Set the SURPRISE_DATA_FOLDER environment variable to a writable directory with sufficient free space.
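The variable can also be set from Python. The sketch below uses only the standard library; the directory name is hypothetical. It assumes the variable is set before surprise is imported, because the package appears to resolve its data paths at import time.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical location; any writable directory with enough space will do.
data_dir = Path(tempfile.gettempdir()) / "surprise_data"
data_dir.mkdir(parents=True, exist_ok=True)

# Set this BEFORE 'import surprise' (assumption: paths are read at import).
os.environ["SURPRISE_DATA_FOLDER"] = str(data_dir)

print(f"Surprise datasets will be stored under {data_dir}")
```

Setting the variable in the shell or service configuration has the same effect and avoids the ordering concern.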
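gotcha KNN-based algorithms such as KNNBasic take their similarity measure from sim_options. The Surprise documentation lists 'MSD' as the default, and a switch to 'cosine' in 1.1.0+ is not documented there. If your results depend on a particular measure, name it explicitly instead of relying on the default.
fix sim_options = {'name': 'msd', 'user_based': True}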
gotcha When loading a custom dataset with Reader, rating_scale must match the actual range of the ratings. A mismatch leads to inaccurate predictions or errors, for example because estimates are clipped to the declared scale.
fix Always set rating_scale=(min_rating, max_rating) in Reader.
gotcha The surprise.dump module is a thin wrapper around pickle and is not marked as deprecated in the Surprise documentation. Standard pickle or joblib also work directly on algorithm objects.
fix import joblib; joblib.dump(algo, 'model.pkl')
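gotcha User and item IDs (raw ids) may be integers or strings and do not need to be consecutive: Surprise maps them to consecutive inner ids internally. IDs read from a file are strings, while Dataset.load_from_df() keeps the DataFrame's types, so predict() must receive raw ids of the same type that was used for training.
fix Use Dataset.load_from_df() with a DataFrame whose columns are, in order, user, item, rating, and pass raw ids of the same type to predict().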
Install
conda install -c conda-forge scikit-surprise
Imports
- Dataset
from surprise import Dataset
- Reader
from surprise import Reader
- SVD
from surprise import SVD
- accuracy
from surprise import accuracy
- cross_validate
wrong: from surprise import cross_validate
correct: from surprise.model_selection import cross_validate
- GridSearchCV
wrong: from surprise import GridSearchCV
correct: from surprise.model_selection import GridSearchCV
Quickstart
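from surprise import Dataset, SVD, accuracy
from surprise.model_selection import train_test_split

# Load the built-in MovieLens 100k dataset (prompts to download on first use)
data = Dataset.load_builtin('ml-100k')

# Hold out 25% of the ratings for testing
trainset, testset = train_test_split(data, test_size=0.25)

# Train SVD on the training split
algo = SVD()
algo.fit(trainset)

# Predict the held-out ratings and compute RMSE without the built-in printout
predictions = algo.test(testset)
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse}")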