NVIDIA NVSHMEM (nvshmem4py) - CUDA 12

3.6.5 · active · verified Tue Mar 31

NVIDIA NVSHMEM is an implementation of the OpenSHMEM specification for NVIDIA GPUs. It provides a Partitioned Global Address Space (PGAS) for efficient, scalable communication in GPU clusters. The `nvidia-nvshmem-cu12` package provides the official Python bindings (NVSHMEM4Py) for CUDA 12.x-compatible environments, so Python applications can use NVSHMEM's high-performance communication model. The current version is 3.6.5; releases typically occur several times a year to align with NVSHMEM and CUDA toolkit updates.

Warnings

Install

Imports

Quickstart

This quickstart shows how to initialize and finalize NVSHMEM from a Python program using NVSHMEM4Py (imported as `nvshmem.core`), and how to query the current PE (processing element) ID and the total number of PEs. Initialization, finalization, and barriers are collective operations, so every PE must call them. The script must therefore be started through a parallel launcher, such as `mpiexec` (from MPI) or `nvshmrun` (provided with NVSHMEM), which creates the PEs and coordinates them across GPUs. Running it directly with `python` is likely to fail with an error or hang.

import nvshmem.core as nvshmem

def main():
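    # Initialize NVSHMEM. This is a collective operation: every PE must call it.
    # Launch the script with `mpiexec` or `nvshmrun` so that all PEs exist.
    # Depending on the NVSHMEM4Py version, init() may need explicit arguments
    # (for example a device or a bootstrap method); check the API reference.
    nvshmem.init()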

    # Query PE information
    my_pe = nvshmem.my_pe()
    n_pes = nvshmem.n_pes()

    print(f"Hello from PE {my_pe} of {n_pes}")

    # Run a simple collective: a barrier across all PEs.
    # No PE continues past this point until every PE has reached it.
    nvshmem.barrier_all()

    # Finalize NVSHMEM. This is also a collective operation.
    nvshmem.finalize()

if __name__ == '__main__':
    # Launch this script collectively, e.g. `mpiexec -n 2 python quickstart.py`,
    # or with NVSHMEM's own launcher (`nvshmrun`). Running `python quickstart.py`
    # directly may raise an error or hang while NVSHMEM waits for other processes.
    try:
        main()
    except Exception as e:
        # Report the failure and exit with a non-zero status so the launcher sees it.
        raise SystemExit(
            f"Error: {e}\n"
            "Please ensure the script is launched collectively, "
            "e.g., 'mpiexec -n 2 python quickstart.py'"
        )
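
Because a direct `python` launch can hang rather than fail cleanly, it may help to check for a parallel launcher before calling into NVSHMEM at all. The sketch below is a heuristic and not part of NVSHMEM4Py: the helper name `launcher_world_size` is hypothetical, and the list of environment variables is an assumption based on what Open MPI, Hydra/PMI-based launchers, and Slurm commonly export. A launcher not covered by the list would go undetected, and `SLURM_NTASKS` can also be set inside a batch job that runs plain `python`.

```python
import os

# Environment variables that common launchers set to the total process count.
# This list is an assumption for illustration; it is neither exhaustive nor official.
WORLD_SIZE_VARS = (
    "OMPI_COMM_WORLD_SIZE",  # Open MPI (mpiexec / mpirun)
    "PMI_SIZE",              # Hydra/PMI-based launchers such as MPICH's mpiexec
    "SLURM_NTASKS",          # Slurm (srun)
)


def launcher_world_size(env=None):
    """Return the process count advertised by a launcher, or None if none is found.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    for name in WORLD_SIZE_VARS:
        value = env.get(name, "")
        # Accept only positive integers; skip missing or malformed values.
        if value.isdigit() and int(value) > 0:
            return int(value)
    return None
```

A script could call `launcher_world_size()` before initialization and print a launch hint when it returns `None`, instead of relying on the exception handler alone.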
