cuML - RAPIDS ML Algorithms

26.4.0 · active · verified Thu Apr 16

cuML is a suite of GPU-accelerated machine learning algorithms from the RAPIDS ecosystem, with an API that closely mirrors scikit-learn's. It runs on NVIDIA GPUs through CUDA, which can substantially speed up clustering, regression, classification, and dimensionality reduction on large datasets. Releases follow the broader RAPIDS schedule, roughly every two months, with date-based version numbers such as 26.4.0. The `cuml-cu12` package targets CUDA 12.
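
Because the API mirrors scikit-learn's, one common pattern is to choose the estimator's package at run time. The sketch below is illustrative only: `pick_backend` and `load_kmeans` are hypothetical helpers, not part of cuML, and the check only tells you whether a package is importable, not whether a usable GPU is present.

```python
import importlib
import importlib.util


def pick_backend(candidates=("cuml", "sklearn")):
    """Return the name of the first installed top-level package, or None."""
    for name in candidates:
        # find_spec locates the package without importing it.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_kmeans(backend):
    """Import KMeans from `<backend>.cluster` (both packages use this path)."""
    return importlib.import_module(f"{backend}.cluster").KMeans


backend = pick_backend()
print(f"Selected backend: {backend}")
```

Calling `load_kmeans(backend)` then returns a `KMeans` class that can be constructed with the same `n_clusters` and `random_state` arguments in either library.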

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates basic GPU-accelerated K-Means clustering using cuML. It generates sample data on the CPU, transfers it to the GPU using CuPy, performs the clustering, and then retrieves the results back to the CPU. Ensure `cupy-cuda12x` is installed alongside `cuml-cu12`.

import cuml
import cupy as cp
import numpy as np

# For reproducibility
np.random.seed(0)

# Generate some random data on the CPU
n_samples = 1000
n_features = 2
n_clusters = 3
X_host = np.random.rand(n_samples, n_features) * 10

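# Simulate clusters: shift each contiguous third of the rows by 0, 2, and 4.
# The base data spans [0, 10), so the resulting groups overlap.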
for i in range(n_clusters):
    X_host[i*n_samples//n_clusters:(i+1)*n_samples//n_clusters] += i * 2

# Transfer data to GPU using CuPy
X_gpu = cp.asarray(X_host, dtype=cp.float32)

# Initialize and train KMeans on the GPU
kmeans_cuml = cuml.cluster.KMeans(n_clusters=n_clusters, random_state=0)
kmeans_cuml.fit(X_gpu)

# Get cluster centers and labels (still on GPU)
cluster_centers_gpu = kmeans_cuml.cluster_centers_
labels_gpu = kmeans_cuml.labels_

print("cuML KMeans fitted successfully.")
print(f"Cluster Centers (on GPU):\n{cluster_centers_gpu}")
print(f"First 10 Labels (on GPU): {labels_gpu[:10]}")

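# Copy results back to host memory if needed. .get() is a CuPy array method;
# cuML returns CuPy arrays here because the input was a CuPy array.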
cluster_centers_cpu = cluster_centers_gpu.get()
labels_cpu = labels_gpu.get()

print(f"Cluster Centers (on CPU):\n{cluster_centers_cpu}")
print(f"First 10 Labels (on CPU): {labels_cpu[:10]}")
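
The cluster-simulation loop in the quickstart uses integer division to split the rows into contiguous blocks. The pure-Python sketch below prints the row range and offset each block receives; `block_bounds` is a hypothetical helper introduced only for illustration. It also shows why the simulated groups overlap: the offsets are small compared with the spread of the base data.

```python
def block_bounds(n_samples, n_clusters):
    """Return (start, stop) row indices for each contiguous block."""
    return [
        (i * n_samples // n_clusters, (i + 1) * n_samples // n_clusters)
        for i in range(n_clusters)
    ]


for i, (start, stop) in enumerate(block_bounds(1000, 3)):
    offset = i * 2
    # Base data lies in [0, 10), so the shifted block spans [offset, offset + 10).
    print(f"block {i}: rows {start}:{stop}, values in [{offset}, {offset + 10})")
```

A larger offset per block, or a narrower base range, would give K-Means more clearly separated groups to recover.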
