NVIDIA Collective Communication Library (NCCL) Runtime for CUDA 11

2.21.5 · active · verified Sat Apr 11

The `nvidia-nccl-cu11` package provides the NVIDIA Collective Communication Library (NCCL) runtime binaries specifically compiled for CUDA 11. NCCL is a high-performance library for collective communication operations (e.g., all-reduce, all-gather, broadcast) across multiple GPUs, both within a single node and across multiple nodes. It is optimized for NVIDIA GPUs and high-speed interconnects like NVLink and InfiniBand. This package primarily serves as a backend dependency for deep learning frameworks (like PyTorch, TensorFlow) and other GPU-accelerated libraries that require NCCL's capabilities for distributed computing. The current version is 2.21.5, with frequent updates corresponding to new NCCL releases and CUDA versions.

Warnings

These binaries are built against CUDA 11; environments running CUDA 12 should install the companion `nvidia-nccl-cu12` package instead. NCCL itself is Linux-only, so this package is not usable on Windows or macOS.

Install
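The package is published on PyPI. A typical installation, pinning the version shown above (the pin is optional):

```shell
pip install nvidia-nccl-cu11==2.21.5
```

In practice it is usually pulled in automatically as a dependency of CUDA-enabled framework wheels such as PyTorch, so a manual install is only needed when building against NCCL directly.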

Imports
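The package ships no importable Python API; it installs shared libraries that frameworks load at runtime. A minimal sketch to locate the bundled library files, assuming the standard wheel layout (`site-packages/nvidia/nccl/lib/`) — the `find_nccl_libs` helper is illustrative, not part of the package:

```python
import importlib.util
import os

def find_nccl_libs():
    """Return the bundled libnccl files, or None if the wheel isn't installed."""
    # The nvidia-nccl-cu11 wheel installs under the 'nvidia' namespace package.
    spec = importlib.util.find_spec("nvidia")
    if spec is None or not spec.submodule_search_locations:
        return None
    for base in spec.submodule_search_locations:
        lib_dir = os.path.join(base, "nccl", "lib")
        if os.path.isdir(lib_dir):
            return sorted(os.listdir(lib_dir))
    return None

print(find_nccl_libs())
```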

Quickstart

This quickstart shows how to confirm that NCCL is detected, and to report its version, through a common deep learning framework such as PyTorch. The `nvidia-nccl-cu11` package provides only the backend binaries; frameworks expose its capabilities. The code below checks CUDA availability and retrieves the NCCL version via PyTorch's API, which is the usual way to confirm NCCL's presence and compatibility.

import torch

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    if hasattr(torch.cuda, 'nccl'):
        # Recent PyTorch builds return a tuple such as (2, 21, 5)
        print(f"NCCL version (via PyTorch): {torch.cuda.nccl.version()}")
    else:
        print("This PyTorch build does not expose an NCCL version; NCCL may not be linked.")
else:
    print("CUDA is not available. NCCL requires NVIDIA GPUs and CUDA.")
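The version can also be queried without any framework, by calling `ncclGetVersion` from the public NCCL C API via ctypes. A sketch, assuming `libnccl.so.2` is discoverable on the loader path (the wheel places it under `site-packages/nvidia/nccl/lib`, so the path may need to be added explicitly):

```python
import ctypes

def nccl_version():
    """Return (major, minor, patch), or None if libnccl cannot be loaded."""
    try:
        lib = ctypes.CDLL("libnccl.so.2")
    except OSError:
        return None
    version = ctypes.c_int()
    # C signature: ncclResult_t ncclGetVersion(int *version); 0 means ncclSuccess
    if lib.ncclGetVersion(ctypes.byref(version)) != 0:
        return None
    v = version.value
    # Since NCCL 2.9 the value is encoded as major*10000 + minor*100 + patch
    return (v // 10000, (v % 10000) // 100, v % 100)

print(nccl_version())
```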
