RAPIDS cuML
RAPIDS cuML (CUDA-accelerated Machine Learning) is a suite of GPU-accelerated machine learning libraries and algorithms whose estimators closely mirror scikit-learn's API, letting users move from CPU to GPU with minimal code changes. It is part of the broader RAPIDS data science ecosystem and is optimized for CUDA 12. The current version is 26.4.0, following a monthly release cadence aligned with the RAPIDS project.
Common errors
- `RuntimeError: CUDA out of memory. Tried to allocate X GiB`
  cause: Attempting to allocate more GPU memory than is available on the device.
  fix: Reduce the dataset size, decrease batch sizes in iterative algorithms, or use Dask with cuML for out-of-core processing if your dataset exceeds single-GPU memory. Consider a GPU with more VRAM.
- `AttributeError: 'numpy.ndarray' object has no attribute 'get_nrows'`
  cause: cuML estimators primarily expect `cudf.DataFrame` or `cupy.ndarray` inputs, not `numpy.ndarray`.
  fix: Convert NumPy arrays to cuDF DataFrames or CuPy arrays before passing them to cuML estimators, e.g., `X_gdf = cudf.DataFrame(X_numpy)`.
- `ModuleNotFoundError: No module named 'cudf'`
  cause: The `cudf` library, which provides GPU-accelerated DataFrames and underpins most cuML workflows, is not installed in the environment.
  fix: Install `cudf` for the matching CUDA version, e.g., `pip install cudf-cu12` for CUDA 12, or follow the full RAPIDS installation guide for your environment.
- `ImportError: libcudart.so.12: cannot open shared object file: No such file or directory`
  cause: The CUDA Runtime library for the expected CUDA version (e.g., v12 for `libcuml-cu12`) is either not installed or its directory is missing from `LD_LIBRARY_PATH`.
  fix: Install the matching CUDA Toolkit and ensure its `lib` directory is on `LD_LIBRARY_PATH`. Using `conda` to manage RAPIDS installations often handles this automatically.
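A minimal sketch of the NumPy-conversion fix above. It falls back to pandas when `cudf` is absent, which is an assumption made purely so the snippet runs on CPU-only machines; `cudf.DataFrame` mirrors the pandas constructor, so the code shape is identical either way:

```python
import numpy as np

# Prefer GPU DataFrames; fall back to pandas (same constructor API) on CPU.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

X_np = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# cuML accepts cudf.DataFrame input (or cupy.ndarray via cupy.asarray(X_np)).
X_df = xdf.DataFrame(X_np)  # integer column labels 0, 1 by default
print(X_df.shape)
```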
Warnings
- breaking Sparse input validation now raises `TypeError` if sparse input is not supported by the algorithm.
- breaking `check_is_fitted` validation and `feature_names_in_` support have been added, requiring estimators to conform more strictly to scikit-learn's API.
- breaking The `handle` object has been deprecated from public APIs, affecting low-level GPU resource management.
- breaking `output_type=None` in estimator `__init__` will no longer implicitly coerce to a global `output_type` setting.
- breaking `dask` is now an optional dependency. Dask-related features will require `dask` and `distributed` to be installed explicitly.
- deprecated The `convert_to_*` methods (e.g., `convert_to_cudf`) in `cuml.ensemble` have been deprecated in favor of `as_*` methods (e.g., `as_cudf`).
Install
- pip install cuml-cu12
- pip install cudf-cu12 # Recommended for data handling
- pip install dask distributed # Recommended for Dask integration
Imports
- KMeans
from cuml.cluster import KMeans
- RandomForestClassifier
from cuml.ensemble import RandomForestClassifier
- LinearRegression
from cuml.linear_model import LinearRegression
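Because cuML estimators mirror scikit-learn's interface, the imports above can be swapped for their scikit-learn counterparts without touching the rest of the code. A sketch of that drop-in pattern (the scikit-learn fallback is an assumption for CPU-only machines, not part of cuML itself):

```python
import numpy as np

# Prefer the GPU implementation; fall back to scikit-learn's CPU one.
try:
    from cuml.linear_model import LinearRegression
except ImportError:
    from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # y = 2x + 1

# The fit/predict workflow is identical on both backends.
model = LinearRegression()
model.fit(X, y)
print(float(model.coef_[0]), float(model.intercept_))
```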
Quickstart
import cuml
import cudf
from sklearn.datasets import make_blobs
# Generate synthetic data on CPU
X, _ = make_blobs(n_samples=1000, n_features=10, centers=5, random_state=42)
# Convert to cuDF DataFrame for GPU processing
X_gdf = cudf.DataFrame(X)
# Initialize and fit a cuML KMeans model
kmeans = cuml.cluster.KMeans(n_clusters=5, random_state=42)
kmeans.fit(X_gdf)
# Predict cluster labels
labels = kmeans.predict(X_gdf)
print("Cluster labels (first 5):\n", labels.head())
print("Cluster centers (first 5 rows):\n", kmeans.cluster_centers_.head())
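Results like `labels` above are GPU-resident cuDF objects; to hand them to CPU-side code (NumPy, plotting), copy them back to host memory with `to_numpy()`. A sketch, again using pandas as a CPU stand-in when `cudf` is absent, since both expose the same method:

```python
try:
    import cudf as xdf   # GPU-resident Series/DataFrame
except ImportError:
    import pandas as xdf  # same .to_numpy() API on CPU

# Stand-in for predicted cluster labels from the quickstart.
labels = xdf.Series([4, 0, 4, 2, 1])

# Copy device memory back to a host NumPy array.
labels_np = labels.to_numpy()
print(labels_np)
```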