{"id":667,"library":"nvidia-nccl-cu12","title":"NVIDIA Collective Communication Library (NCCL) Runtime for CUDA 12","description":"nvidia-nccl-cu12 (version 2.29.7) is the Python package providing the NVIDIA Collective Communication Library (NCCL) runtime specifically built for CUDA 12.x. NCCL is a foundational library for high-performance inter-GPU and inter-node communication primitives, such as all-reduce, all-gather, broadcast, and point-to-point operations, crucial for accelerating distributed deep learning workloads. It features a rapid release cadence, often synchronized with CUDA toolkit and major deep learning framework updates.","status":"active","version":"2.29.7","language":"python","source_language":"en","source_url":"https://github.com/NVIDIA/nccl","tags":["GPU","CUDA","Deep Learning","Distributed Training","Collective Communication","NVIDIA"],"install":[{"cmd":"pip install nvidia-nccl-cu12","lang":"bash","label":"Default Install"},{"cmd":"pip install \"nccl4py[cu12]\" # Official Python bindings","lang":"bash","label":"With NCCL4Py Bindings"}],"dependencies":[{"reason":"Often used implicitly or explicitly for CUDA Python bindings, especially with nccl4py.","package":"cuda-python","optional":false},{"reason":"Commonly used as a backend for PyTorch's distributed training module (torch.distributed).","package":"torch","optional":true},{"reason":"Commonly used as a backend for TensorFlow's distributed strategies (tf.distribute).","package":"tensorflow","optional":true}],"imports":[{"note":"Primary high-level API for official nccl4py bindings.","symbol":"NcclCommunicator","correct":"from nccl.core import NcclCommunicator"},{"note":"Low-level API for official nccl4py bindings, less common for direct user interaction.","symbol":"lib","correct":"from nccl.bindings import lib"},{"note":"nvidia-nccl-cu12 is the runtime. 
Python users typically interact via higher-level frameworks like PyTorch's distributed module, or dedicated bindings like nccl4py.","wrong":"import nccl # Directly importing nvidia-nccl-cu12 for API calls","symbol":"dist","correct":"import torch.distributed as dist"},{"note":"nvidia-nccl-cu12 is the runtime. TensorFlow utilizes NCCL as a backend for its distribution strategies.","wrong":"import nccl # Directly importing nvidia-nccl-cu12 for API calls","symbol":"tf.distribute.NcclAllReduce","correct":"import tensorflow as tf\nstrategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce())"}],
"quickstart":{"code":"import os\nimport torch\nimport torch.distributed as dist\n\n# This quickstart assumes a multi-process setup, typically launched\n# via torchrun or mpirun, where each process runs this script with\n# a unique rank and world_size.\n\n# Provide defaults so a bare single-process run can still initialize:\nos.environ.setdefault('MASTER_ADDR', 'localhost')\nos.environ.setdefault('MASTER_PORT', '29500')\n\ndef run_distributed_example(rank, world_size):\n    # Initialize the process group with the NCCL backend\n    print(f\"Initializing process group for rank {rank}/{world_size - 1}...\")\n    dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)\n    print(f\"Process group initialized on rank {rank}.\")\n\n    # Bind this process to its GPU (assumes one GPU per rank on a single node)\n    torch.cuda.set_device(rank)\n\n    # Create a tensor on this process's GPU\n    tensor = torch.ones(10, device=f'cuda:{rank}') * (rank + 1)\n    print(f\"Rank {rank}: Initial tensor value: {tensor}\")\n\n    # Perform an all_reduce operation (summing tensors across all GPUs)\n    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)\n    print(f\"Rank {rank}: Tensor after all_reduce: {tensor}\")\n\n    # Clean up the process group\n    dist.destroy_process_group()\n    print(f\"Rank {rank}: Process group destroyed.\")\n\n# To run this across two GPUs, you would typically use:\n# torchrun --nproc_per_node=2 your_script.py\n# Or set environment variables and run one process per GPU:\n# MASTER_ADDR=localhost MASTER_PORT=29500 RANK=0 WORLD_SIZE=2 python your_script.py\n# MASTER_ADDR=localhost MASTER_PORT=29500 RANK=1 WORLD_SIZE=2 python your_script.py\n\nif __name__ == '__main__':\n    # In a real scenario, RANK and WORLD_SIZE are set by the launcher;\n    # without one, this runs as a trivial single-process \"group\".\n    try:\n        rank = int(os.environ.get('RANK', '0'))\n        world_size = int(os.environ.get('WORLD_SIZE', '1'))\n        if torch.cuda.is_available():\n            run_distributed_example(rank, world_size)\n        else:\n            print(\"CUDA not available. Cannot run distributed example.\")\n    except RuntimeError as e:\n        print(f\"Error initializing distributed environment: {e}. This often happens if not run with a proper distributed launcher like torchrun.\")\n","lang":"python","description":"This quickstart demonstrates how NCCL is typically used indirectly via PyTorch's `torch.distributed` module for multi-GPU collective communication, specifically an `all_reduce` operation. NCCL provides the underlying high-performance backend. A distributed launcher (e.g., `torchrun` or `mpirun`) is required to run this code across multiple processes/GPUs; note that `torch.distributed.launch` is deprecated in favor of `torchrun`. For direct Python bindings, consider `nccl4py` for explicit NCCL API calls."},
"warnings":[{"fix":"Ensure that the `nvidia-nccl-cu12` package, your system's CUDA Toolkit, and the CUDA version used by your deep learning framework are all compatible. 
Consult the NVIDIA documentation or framework-specific guides for compatibility matrices. For PyTorch, `torch.cuda.is_available()` and `torch.version.cuda` can help verify. For `nccl4py`, use `pip install \"nccl4py[cu12]\"` to ensure correct CUDA 12 support.","message":"NCCL versions are tightly coupled with CUDA Toolkit versions and the CUDA version used to compile deep learning frameworks (like PyTorch or TensorFlow). Mismatches can lead to runtime errors, silent performance degradation, or unexpected behavior.","severity":"breaking","affected_versions":"All versions"},{"fix":"To use NCCL directly from Python, install and import `nccl4py`. If using with a deep learning framework, configure its distributed module to use the NCCL backend. Avoid `import nccl` for direct API calls, as this package is a runtime provider.","message":"The `nvidia-nccl-cu12` package itself primarily provides the `libnccl.so` shared library. Direct Python API calls are not exposed through this package. Instead, Python users interact with NCCL through higher-level libraries like `nccl4py` (official bindings) or as a backend to distributed training modules in frameworks like PyTorch (`torch.distributed`) or TensorFlow (`tf.distribute`).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Prefer using `nvidia-nccl-cu12` installed via pip for consistency within Python environments. If system-wide NCCL is necessary, carefully manage `LD_LIBRARY_PATH` to ensure the correct `libnccl.so` is prioritized. Frameworks like PyTorch often statically link NCCL, mitigating some of these issues, but custom builds might need `USE_SYSTEM_NCCL` flags.","message":"Conflicts can arise if multiple NCCL installations are present on the system (e.g., `nvidia-nccl-cu12` from PyPI, a system-wide `apt`/`dnf` installed NCCL, or one bundled with a deep learning framework). 
The linker's search path (`LD_LIBRARY_PATH`) can affect which `libnccl.so` is loaded, potentially leading to incorrect versions being used.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify the availability of `nccl4py[cu12]` for your specific Python version and OS on PyPI or the official `nccl4py` documentation. If pre-built wheels are not available, you might need to compile `nccl4py` from source (which requires a CUDA Toolkit installation and potentially other build dependencies) or consider using a deep learning framework's distributed module, which often bundles NCCL or manages its own bindings.","message":"The `nccl4py[cu12]` package, while recommended for direct Python interaction with NCCL CUDA 12, may not always have pre-built wheels available for all Python versions, operating systems, or architectures on PyPI. This can lead to `ERROR: Could not find a version that satisfies the requirement` during installation.","severity":"breaking","affected_versions":"All versions of `nccl4py` with `[cu12]` extra"},{"fix":"Install on a glibc-based Linux environment: `pip install nvidia-nccl-cu12`. The package can also be pulled from the NVIDIA Python Package Index: `pip install --extra-index-url https://pypi.ngc.nvidia.com nvidia-nccl-cu12`. On platforms without a matching wheel (Windows, macOS, musl-based distributions such as Alpine), installation will fail.","message":"`nvidia-nccl-cu12` is published on PyPI.org as Linux-only manylinux wheels, since NCCL itself supports only Linux. On platforms without a matching wheel, `pip install` fails with `ERROR: Could not find a version that satisfies the requirement`; older guides recommending `nvidia-pyindex` predate the package being published directly on PyPI.org.","severity":"breaking","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-05-12T17:37:52.007Z","next_check":"2026-06-26T00:00:00.000Z",
"problems":[{"fix":"Ensure your CUDA toolkit, GPU drivers, and deep learning framework (e.g., PyTorch) versions are compatible. Monitor GPU memory usage for out-of-memory issues. 
Rerun your application with `NCCL_DEBUG=INFO` or `NCCL_DEBUG=WARN` environment variables to get more detailed logs that can pinpoint the specific CUDA error.","cause":"This generic error often indicates an underlying issue with CUDA or GPU, such as out-of-memory errors, incompatible CUDA/driver/framework versions, or a temporary hardware problem during distributed training.","error":"RuntimeError: NCCL Error 1: unhandled cuda error"},{"fix":"Set the environment variable `NCCL_SHM_DISABLE=1` to prevent NCCL from using shared memory. Verify that `/dev/shm` is correctly mounted and has sufficient space, especially in containerized environments. For InfiniBand, check network connectivity and drivers.","cause":"This error typically arises from misconfigurations related to system resources, particularly Linux shared memory (e.g., `/dev/shm`) used for inter-process communication, or issues with InfiniBand network setup.","error":"RuntimeError: NCCL Error 2: unhandled system error"},{"fix":"Check the available versions on PyPI (`pypi.org/project/nvidia-nccl-cu12/#files`) for your specific environment (Python version, OS, architecture). Ensure your deep learning framework's CUDA version is compatible with the `nvidia-nccl-cu12` package you are trying to install. 
If using a framework like PyTorch or TensorFlow, sometimes they manage NCCL internally, and explicit installation of `nvidia-nccl-cu12` might not be necessary or can cause conflicts.","cause":"This installation error occurs when the specified version of `nvidia-nccl-cu12` is not available for your current Python version, operating system, or architecture on PyPI, or if there's a conflict with another library's NCCL dependency.","error":"ERROR: Could not find a version that satisfies the requirement nvidia-nccl-cu12==X.Y.Z (from versions: ...)"},{"fix":"Reinstall PyTorch making sure to specify a CUDA-enabled version, for example: `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`. Ensure that the `nvidia-nccl-cu12` package or the system-wide NCCL library is correctly installed and accessible in your environment's `LD_LIBRARY_PATH`.","cause":"This PyTorch-specific error indicates that your PyTorch installation was not compiled with NCCL support, or the NCCL library cannot be found or loaded by PyTorch at runtime.","error":"RuntimeError: Distributed package doesn't have NCCL built in"}],"ecosystem":"pypi","meta_description":null,"install_score":23,"install_tag":"stale","quickstart_score":0,"quickstart_tag":"stale","pypi_latest":"2.30.4","install_checks":{"last_tested":"2026-05-12","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine 
(musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":7.8,"import_time_s":null,"mem_mb":null,"disk_size":"412M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine 
(musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":7.6,"import_time_s":null,"mem_mb":null,"disk_size":"414M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine 
(musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":7.4,"import_time_s":null,"mem_mb":null,"disk_size":"406M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine 
(musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":7.1,"import_time_s":null,"mem_mb":null,"disk_size":"406M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine 
(musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":8.1,"import_time_s":null,"mem_mb":null,"disk_size":"412M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]},"quickstart_checks":{"last_tested":"2026-04-24","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":-1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":-1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":-1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":-1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":-1}]}}