RecBole
RecBole is a unified, comprehensive, and efficient Python library for recommendation systems, built on PyTorch. It provides a wide array of state-of-the-art recommendation models and datasets, featuring standardized data processing, training, and evaluation pipelines. The library undergoes active development, with major updates and new versions typically released every few months, incorporating user feedback and architectural improvements.
Common errors
-
ModuleNotFoundError: No module named 'torch'
cause: PyTorch is not installed, or it is installed in a different environment from the one running your script. RecBole is built on PyTorch and cannot be imported without it.
fix: Install PyTorch with `pip install torch` for a CPU build, or use the install command from the official PyTorch website for a GPU build (for example, `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` for CUDA 11.8).
-
FileNotFoundError: [Errno 2] No such file or directory: 'dataset/your_dataset_name/your_dataset_name.inter'
cause: The dataset's interaction file is not where RecBole expects it. By default RecBole looks in a `dataset` folder in the working directory, in a subfolder named after the dataset.
fix: Either place the dataset files (e.g., `ml-100k.inter`, `ml-100k.item`) in `dataset/ml-100k/`, or set `data_path` in your configuration to the directory that contains the dataset's folder (e.g., `/path/to/datasets` when the files are in `/path/to/datasets/ml-100k/`).
-
RuntimeError: CUDA error: device-side assert triggered
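cause: A CUDA kernel received invalid input, most often an out-of-range index (for example, a user or item ID outside the bounds of an embedding table). A mismatched PyTorch-CUDA installation can also surface as CUDA runtime errors.
fix: Re-run on CPU (`use_gpu: False`) or with the environment variable `CUDA_LAUNCH_BLOCKING=1` to get a traceback that points at the failing operation. Ensure PyTorch and your CUDA driver are compatible, and if needed reinstall `recbole[gpu]` and PyTorch using the correct `pip` command for your CUDA version. If the underlying problem turns out to be GPU memory, reduce the batch size.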
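The first two errors above can be caught before a run starts. The following is a minimal sketch of such a check, not part of RecBole: `preflight` and its parameters are hypothetical names, and the `<data_path>/<dataset>/<dataset>.inter` layout is assumed from the error message shown above.

```python
import importlib.util
from pathlib import Path
from typing import List


def preflight(dataset: str, data_path: str = "dataset") -> List[str]:
    """Return a list of problems found; an empty list means nothing was detected.

    Hypothetical helper, not a RecBole API.
    """
    problems: List[str] = []

    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed in this environment")

    # Assumed layout: <data_path>/<dataset>/<dataset>.inter
    inter_file = Path(data_path) / dataset / f"{dataset}.inter"
    if not inter_file.is_file():
        problems.append(f"missing interaction file: {inter_file}")

    return problems
```

Calling `preflight("ml-100k")` before training returns the problems as strings that can be printed or logged. A missing file is not necessarily fatal for datasets that RecBole is able to download on its own.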
Warnings
- breaking Major architectural overhauls in versions like v1.0.0, v1.1.0, and v1.2.0 introduced significant changes to the framework, especially affecting the data module and configuration API.
- gotcha RecBole often requires specific versions of PyTorch. Installing `recbole[gpu]` ensures GPU dependencies are pulled, but manual CUDA setup and matching PyTorch-CUDA versions are critical.
- gotcha Configuring datasets and data preprocessing can be complex. Incorrect paths, invalid split ratios, or missing necessary dataset files (like `.inter`, `.item`, `.user` files) are common sources of errors.
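Malformed split settings are easier to spot when they are built and checked in one place. The sketch below assumes the RecBole 1.x evaluation settings, where the split lives under `eval_args` as `split: {'RS': [...]}` (random split by ratio) alongside `group_by`, `order`, and `mode`; verify these keys against the version you have installed. `build_eval_args` is a hypothetical helper, and requiring exactly three non-negative ratios is a conservative choice made here, not a documented RecBole rule.

```python
from typing import Any, Dict, Sequence


def build_eval_args(ratios: Sequence[float] = (0.8, 0.1, 0.1)) -> Dict[str, Any]:
    """Build an eval_args dict for a random ratio split (hypothetical helper)."""
    if len(ratios) != 3:
        raise ValueError("expected three ratios: train, valid, test")
    if any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("ratios must be non-negative with a positive sum")

    return {
        "split": {"RS": list(ratios)},  # random split by ratio
        "group_by": "user",             # split each user's interactions separately
        "order": "RO",                  # random ordering before the split
        "mode": "full",                 # rank against all items at evaluation time
    }
```

The returned dict can be passed as the `eval_args` entry of a `config_dict`.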
Install
-
pip install recbole
-
pip install recbole[gpu]
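After installing, it can help to record which versions actually ended up in the environment, since RecBole and PyTorch versions need to match. This is a small sketch using only the standard library; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("recbole", "torch"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing `installed_versions()` shows `None` for any package that is missing, without importing the packages themselves.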
Imports
- run_recbole
from recbole.quick_start import run_recbole
- Config
from recbole.config import Config
- AbstractRecommender
from recbole.model.abstract_recommender import AbstractRecommender
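`Config` is the entry point when you want to run the pipeline step by step instead of through `run_recbole`. The sketch below follows the usage pattern from the RecBole 1.x documentation as best I can reconstruct it (`create_dataset`, `data_preparation`, `Trainer`); `train_bpr` is a hypothetical wrapper, and the exact signatures should be checked against your installed version.

```python
from typing import Any, Dict, Optional


def train_bpr(dataset: str = "ml-100k",
              config_dict: Optional[Dict[str, Any]] = None):
    """Train and evaluate BPR step by step (illustrative sketch)."""
    # Imported here so that defining this function does not require RecBole.
    from recbole.config import Config
    from recbole.data import create_dataset, data_preparation
    from recbole.model.general_recommender import BPR
    from recbole.trainer import Trainer
    from recbole.utils import init_seed

    # Merge defaults, model settings, and any overrides into one config.
    config = Config(model="BPR", dataset=dataset, config_dict=config_dict or {})
    init_seed(config["seed"], config["reproducibility"])

    # Load the dataset, then split it into train/valid/test loaders.
    data = create_dataset(config)
    train_data, valid_data, test_data = data_preparation(config, data)

    # Build the model on the configured device and train it.
    model = BPR(config, train_data.dataset).to(config["device"])
    trainer = Trainer(config, model)
    best_valid_score, _ = trainer.fit(train_data, valid_data)

    # Evaluate on the held-out test split.
    test_result = trainer.evaluate(test_data)
    return best_valid_score, test_result
```

A call such as `train_bpr("ml-100k", {"epochs": 5})` would then return the best validation score and the test metrics.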
Quickstart
import torch
from recbole.quick_start import run_recbole

# Ensure you have a 'dataset' folder in the current directory
# with the 'ml-100k' dataset prepared, or RecBole will download it.

# Basic configuration for running a BPR model on the ml-100k dataset.
# Runs on the GPU if one is available, otherwise on the CPU.
config_dict = {
    'model': 'BPR',
    'dataset': 'ml-100k',
    'eval_args': {
        'split': {'RS': [0.8, 0.1, 0.1]},  # Random split, train:valid:test
        'group_by': 'user',  # Split each user's interactions separately
    },
    'use_gpu': torch.cuda.is_available(),  # Dynamically check for a GPU
}
print(f"Running RecBole with config: {config_dict}")

# Run the recommendation experiment
run_recbole(config_dict=config_dict)
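`run_recbole` also returns a summary of the run. The sketch below assumes the RecBole 1.x behaviour, where the return value is a dict with keys such as `best_valid_score` and `test_result` (a mapping from metric name to a float); treat those key names as an assumption to verify. `summarize` is a hypothetical helper.

```python
from typing import Any, Mapping


def summarize(result: Mapping[str, Any]) -> str:
    """Format a run_recbole-style result dict as plain text (hypothetical helper)."""
    # Key names are assumed from RecBole 1.x; missing keys are tolerated.
    test_metrics = result.get("test_result") or {}
    lines = [f"best valid score: {result.get('best_valid_score')}"]
    for name, value in test_metrics.items():
        lines.append(f"{name}: {value:.4f}")
    return "\n".join(lines)
```

For example, `print(summarize(run_recbole(config_dict=config_dict)))` would print one line per test metric.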