RMM - RAPIDS Memory Manager for CUDA 12
RMM (RAPIDS Memory Manager) is a C++ and Python library for efficient GPU memory management. It provides a highly optimized allocation and deallocation framework tailored for NVIDIA GPUs, often used within the RAPIDS ecosystem to improve performance of data science workloads. The current version is 26.4.0, and it generally follows a monthly release cadence.
Common errors
- `ModuleNotFoundError: No module named 'rmm._lib'`
  Cause: importing from the internal `rmm._lib` module, which was removed in RMM v25.02.00.
  Fix: update your code to use the public APIs directly from `rmm` or `rmm.mr`, e.g. `rmm.DeviceBuffer` for GPU memory allocation.
- `RuntimeError: RMM failure: CUDA error at /path/to/rmm/src/.../detail/aligned_allocator.hpp:145: cudaErrorNoDevice`
  Cause: usually a CUDA version mismatch. RMM v25.08.00 and later require CUDA 12.0 or newer; otherwise there is an issue with the system's CUDA setup (e.g., `LD_LIBRARY_PATH` not pointing at the CUDA libraries).
  Fix: verify that your CUDA Toolkit is 12.0 or higher, install the `rmm-cu12` package (or the `rmm-cuXX` variant matching your CUDA installation), and check your `LD_LIBRARY_PATH` and `PATH` environment variables.
- `AttributeError: module 'rmm.mr' has no attribute 'HostMemoryResource'`
  Cause: `HostMemoryResource` and the related host memory interfaces were removed in RMM v26.02.00.
  Fix: migrate to `cuda_async_memory_resource` (`rmm.mr.CudaAsyncMemoryResource` in Python) or another device memory resource; host memory resources are no longer part of RMM.
Warnings
- breaking RMM requires CUDA 12.0+ starting from v25.08.00. Using RMM with older CUDA Toolkits (e.g., 11.x) will lead to runtime errors or compilation failures.
- breaking The internal `rmm._lib` module was removed. Direct imports from this module are no longer supported.
- breaking The Python/Cython `memory_resource` interface underwent a significant refactor, affecting how custom or experimental memory resources are defined and used.
- breaking Host memory resources (`HostMemoryResource`) and related interfaces were removed; the legacy memory resource interface was also dropped in favor of the CCCL interface.
- breaking Zero-value special casing was removed in `set_element_async` to preserve IEEE 754 -0.0.
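The `-0.0` warning matters because negative zero compares equal to positive zero yet carries a different sign bit, so a fast path that special-cases zero fills (e.g., via memset) would silently change it. A pure-Python illustration of the distinction:

```python
import math
import struct

neg_zero = -0.0
pos_zero = 0.0

# The two zeros compare equal under IEEE 754...
print(neg_zero == pos_zero)               # True
# ...but their bit patterns differ in the sign bit.
print(struct.pack(">d", neg_zero).hex())  # 8000000000000000
print(struct.pack(">d", pos_zero).hex())  # 0000000000000000
print(math.copysign(1.0, neg_zero))       # -1.0
```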
Install
-
pip install rmm-cu12
Imports
- rmm
import rmm
- PoolMemoryResource
from rmm.mr import PoolMemoryResource
- CudaMemoryResource
from rmm.mr import CudaMemoryResource
- _lib (removed in v25.02.00)
from rmm._lib import ...
No direct import; use the public `rmm` or `rmm.mr` APIs instead.
Quickstart
import rmm
from rmm.mr import PoolMemoryResource, CudaMemoryResource, set_current_device_resource
# Create an upstream resource (e.g., CudaMemoryResource) for the pool
upstream = CudaMemoryResource()
# Create a PoolMemoryResource with an initial size and an optional maximum size
initial_pool_size = 128 * 1024 * 1024 # 128 MiB
maximum_pool_size = 1024 * 1024 * 1024 # 1 GiB
pool_mr = PoolMemoryResource(
    upstream,  # the upstream resource is the first positional argument
    initial_pool_size=initial_pool_size,
    maximum_pool_size=maximum_pool_size,
)
# Set the default RMM memory resource for the current device
set_current_device_resource(pool_mr)
print(f"RMM current device resource: {rmm.mr.get_current_device_resource()}")
# Allocate a DeviceBuffer using the default RMM memory resource
# This buffer resides on the GPU
buffer_size = 64 * 1024 * 1024 # 64 MiB
device_buffer = rmm.DeviceBuffer(size=buffer_size)
print(f"Successfully allocated rmm.DeviceBuffer of {device_buffer.size / (1024 * 1024):.2f} MiB on the GPU.")
# Memory is automatically freed when device_buffer goes out of scope or program exits.