{"id":9075,"library":"libcuml-cu12","title":"RAPIDS cuML","description":"RAPIDS cuML (CUDA-accelerated Machine Learning) is a suite of GPU-accelerated machine learning libraries and algorithms whose APIs closely follow scikit-learn's, so existing CPU code can move to the GPU with few changes. It is part of the broader RAPIDS data science ecosystem, and this package targets CUDA 12. The current version is 26.4.0; releases follow the RAPIDS project's schedule, roughly every two months.","status":"active","version":"26.4.0","language":"en","source_language":"en","source_url":"https://github.com/rapidsai/cuml","tags":["GPU","Machine Learning","RAPIDS","CUDA","Scikit-learn API"],"install":[{"cmd":"pip install libcuml-cu12","lang":"bash","label":"Install core cuML for CUDA 12"},{"cmd":"pip install cudf-cu12 # Recommended for data handling","lang":"bash","label":"Install cuDF (GPU DataFrames)"},{"cmd":"pip install dask distributed # Recommended for Dask integration","lang":"bash","label":"Install Dask for distributed computing"}],"dependencies":[{"reason":"Required for CUDA interoperability.","package":"cuda-python","optional":false},{"reason":"General numerical computations.","package":"numpy","optional":false},{"reason":"For NVIDIA Management Library interaction.","package":"pynvml","optional":false},{"reason":"Utilities for Dask integration.","package":"rapids-dask-tools","optional":false},{"reason":"For API compatibility and sometimes used in conjunction.","package":"scikit-learn","optional":false},{"reason":"Scientific computing utilities.","package":"scipy","optional":false},{"reason":"Practically essential for GPU DataFrame operations, which are the common input format for cuML.","package":"cudf-cu12","optional":true},{"reason":"For distributed computing capabilities.","package":"dask","optional":true},{"reason":"Dask scheduler and worker components for distributed 
computing.","package":"distributed","optional":true}],"imports":[{"symbol":"KMeans","correct":"from cuml.cluster import KMeans"},{"symbol":"RandomForestClassifier","correct":"from cuml.ensemble import RandomForestClassifier"},{"symbol":"LinearRegression","correct":"from cuml.linear_model import LinearRegression"}],"quickstart":{"code":"import cuml\nimport cudf\nfrom sklearn.datasets import make_blobs\n\n# Generate synthetic data on CPU\nX, _ = make_blobs(n_samples=1000, n_features=10, centers=5, random_state=42)\n\n# Convert to cuDF DataFrame for GPU processing\nX_gdf = cudf.DataFrame(X)\n\n# Initialize and fit a cuML KMeans model\nkmeans = cuml.cluster.KMeans(n_clusters=5, random_state=42)\nkmeans.fit(X_gdf)\n\n# Predict cluster labels\nlabels = kmeans.predict(X_gdf)\n\nprint(\"Cluster labels (first 5):\\n\", labels.head())\nprint(\"Cluster centers (first 5 rows):\\n\", kmeans.cluster_centers_.head())","lang":"python","description":"This example demonstrates how to perform k-means clustering using cuML. It generates synthetic data with scikit-learn, converts it to a cuDF DataFrame for GPU processing, and then fits a KMeans model to find clusters. 
It requires `cudf` and `scikit-learn`."},"warnings":[{"fix":"Ensure that sparse input data is only provided to cuML algorithms that explicitly support it, or convert sparse data to a dense format before passing it to unsupported algorithms.","message":"Sparse input validation now raises `TypeError` if sparse input is not supported by the algorithm.","severity":"breaking","affected_versions":">=26.04.00"},{"fix":"If using custom or wrapped estimators, ensure they implement `_validate_data` and set `feature_names_in_` after fitting, consistent with scikit-learn's guidelines.","message":"`check_is_fitted` validation and `feature_names_in_` support have been added, requiring estimators to conform more strictly to scikit-learn's API.","severity":"breaking","affected_versions":">=26.04.00"},{"fix":"Review existing code for direct usage of `handle` objects and refactor to use higher-level cuML APIs. cuML now manages GPU resources internally for most use cases.","message":"The `handle` object has been deprecated from public APIs, affecting low-level GPU resource management.","severity":"breaking","affected_versions":">=26.02.00"},{"fix":"Explicitly specify `output_type` in the constructor for cuML estimators if you rely on a particular output type (e.g., `output_type='cudf'`).","message":"`output_type=None` in estimator `__init__` will no longer implicitly coerce to a global `output_type` setting.","severity":"breaking","affected_versions":">=26.02.00"},{"fix":"If using cuML with Dask, ensure you install `dask` and `distributed` separately (e.g., `pip install dask distributed`).","message":"`dask` is now an optional dependency. 
Dask-related features will require `dask` and `distributed` to be installed explicitly.","severity":"breaking","affected_versions":">=25.12.00"},{"fix":"Update method calls to use the `as_*` prefix (e.g., `model.as_cudf(X)` instead of `model.convert_to_cudf(X)`).","message":"The `convert_to_*` methods (e.g., `convert_to_cudf`) in `cuml.ensemble` have been deprecated in favor of `as_*` methods (e.g., `as_cudf`).","severity":"deprecated","affected_versions":">=25.10.00"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Reduce the size of the dataset, decrease batch sizes in iterative algorithms, or use Dask with cuML for out-of-core processing if your dataset exceeds single-GPU memory. Consider using a GPU with more VRAM.","cause":"Attempting to allocate more GPU memory than is available on the device.","error":"RuntimeError: CUDA out of memory. Tried to allocate X GiB"},{"fix":"Convert your input data from NumPy arrays to cuDF DataFrames or CuPy arrays before passing them to cuML estimators, e.g., `X_gdf = cudf.DataFrame(X_numpy)`.","cause":"cuML estimators primarily expect `cudf.DataFrame` or `cupy.ndarray` objects as input, not `numpy.ndarray`.","error":"AttributeError: 'numpy.ndarray' object has no attribute 'get_nrows'"},{"fix":"Install `cudf` explicitly with the correct CUDA version, e.g., `pip install cudf-cu12` for CUDA 12, or follow the full RAPIDS installation guide for your environment.","cause":"The `cudf` library, which provides GPU-accelerated DataFrames and is essential for most cuML workflows, is not installed in the environment.","error":"No module named 'cudf'"},{"fix":"Ensure that the appropriate CUDA Toolkit is installed for your system and that its `lib` directory is included in your `LD_LIBRARY_PATH` environment variable. 
Installing RAPIDS with `conda` often handles this automatically.","cause":"The CUDA Runtime library for the expected CUDA version (`libcudart.so.12` for `libcuml-cu12`) is either not installed or its directory is not listed in `LD_LIBRARY_PATH`.","error":"ImportError: libcudart.so.12: cannot open shared object file: No such file or directory"}]}