NVIDIA NVSHMEM (nvshmem4py) - CUDA 12
NVIDIA NVSHMEM is an implementation of the OpenSHMEM specification for NVIDIA GPUs, providing a Partitioned Global Address Space (PGAS) for efficient and scalable communication in GPU clusters. The `nvidia-nvshmem-cu12` package provides the official Python bindings (NVSHMEM4Py) for CUDA 12.x compatible environments, enabling Python applications to leverage NVSHMEM's high-performance communication model. The current version is 3.6.5, with releases typically occurring several times a year to align with NVSHMEM and CUDA toolkit updates.
Warnings
- breaking Internal layout changes to RC-connected Queue Pairs (QPs), introduced in NVSHMEM 3.5.19, broke ABI compatibility when InfiniBand GPUDirect Async (IBGDA) is enabled. This affects custom builds and configurations that use IBGDA.
- gotcha NVSHMEM (including its device-side APIs) and libraries that use NVSHMEM can typically only be built and linked as static libraries, because CUDA does not support linking device symbols across shared-library boundaries.
- gotcha With CUDA drivers older than 460.106.00 (or 470-series drivers predating the same fix), NVSHMEM may be unable to allocate the full device memory because BAR1 space cannot be reused. This can lead to memory allocation failures or unexpected behavior.
- gotcha NVSHMEM is not officially supported in virtualized environments (VMs). Using it in such environments may lead to unexpected behavior, performance degradation, or outright failures.
- gotcha When `pip install nvidia-nvshmem-cu12` has to compile Cython sources (e.g., when no pre-built wheel is available), the CUDA runtime API headers must be on the compiler's include path. If they are not, the build fails with errors such as 'Failed building wheel for nvshmem4py-cu12'.
- deprecated Support for the active set-based collectives interface in OpenSHMEM has been removed. Older applications relying on this interface will no longer function as expected.
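The driver-version gotcha above can be guarded against programmatically. A minimal sketch; the helper name and comparison logic are illustrative (not part of nvshmem4py), and the default threshold is the 460.106.00 version cited in the BAR1 note:

```python
def driver_meets_minimum(version: str, minimum: str = "460.106.00") -> bool:
    """Compare dotted NVIDIA driver version strings numerically.

    Illustrative helper, not an nvshmem4py API. The default threshold is
    the 460.106.00 driver mentioned in the BAR1-reuse warning above.
    """
    def parts(v: str) -> list:
        return [int(p) for p in v.split(".")]
    return parts(version) >= parts(minimum)

print(driver_meets_minimum("470.57.02"))   # a 470-series driver passes
print(driver_meets_minimum("460.91.03"))   # an older 460-series driver does not
```

How you obtain the installed driver version (e.g., from `nvidia-smi`) is environment-specific and left out of this sketch.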
Install
-
pip install nvidia-nvshmem-cu12
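After installing, you can check that the Python bindings are importable without initializing NVSHMEM. A minimal sketch; note that `find_spec` only confirms the package is installed, not that a compatible GPU or CUDA driver is present:

```python
import importlib.util

def nvshmem4py_available() -> bool:
    # True if the `nvshmem` package from nvidia-nvshmem-cu12 is importable.
    # This does not verify a working CUDA driver or GPU.
    return importlib.util.find_spec("nvshmem") is not None

print(nvshmem4py_available())
```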
Imports
- nvshmem
import nvshmem.core as nvshmem
Quickstart
import nvshmem.core as nvshmem

def main():
    # Initialize NVSHMEM. This is a collective operation.
    # In a real scenario, this script would be launched with `mpiexec` or `nvshmrun`.
    nvshmem.init()

    # Query PE information
    my_pe = nvshmem.my_pe()
    n_pes = nvshmem.n_pes()
    print(f"Hello from PE {my_pe} of {n_pes}")

    # Perform a simple collective (e.g., a barrier).
    # This ensures all PEs reach this point before proceeding.
    nvshmem.barrier_all()

    # Finalize NVSHMEM. This is also a collective operation.
    nvshmem.finalize()

if __name__ == '__main__':
    # Note: this script must be run with an MPI launcher (e.g., mpiexec -n 2 python your_script.py)
    # or NVSHMEM's own launcher (nvshmrun). Running 'python your_script.py' directly
    # will error or hang, since NVSHMEM expects multiple processes.
    try:
        main()
    except Exception as e:
        # Catch errors from a non-collective launch, for a more graceful exit
        print(f"Error: {e}")
        print("Please ensure the script is launched collectively, e.g., 'mpiexec -n 2 python quickstart.py'")
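Because a bare `python quickstart.py` run can hang, one defensive option is to check for a launcher before calling `nvshmem.init()`. A hedged sketch, assuming common launcher environment variables (Open MPI sets `OMPI_*`, MPICH/Hydra sets `PMI_*`, PMIx sets `PMIX_*`; other launchers may differ):

```python
import os

def launched_collectively() -> bool:
    # Heuristic only: these variables are set by common MPI launchers
    # (OMPI_* by Open MPI, PMI_* by MPICH/Hydra, PMIX_* by PMIx).
    markers = ("OMPI_COMM_WORLD_SIZE", "PMI_SIZE", "PMIX_RANK")
    return any(m in os.environ for m in markers)

if not launched_collectively():
    print("Warning: no MPI launcher detected; nvshmem.init() may hang.")
```

This is a best-effort guard, not a guarantee: a launcher that sets none of these variables would still work with NVSHMEM but be flagged here.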