NVIDIA Collective Communication Library (NCCL) Runtime for CUDA 11
The `nvidia-nccl-cu11` package provides the NVIDIA Collective Communication Library (NCCL) runtime binaries specifically compiled for CUDA 11. NCCL is a high-performance library for collective communication operations (e.g., all-reduce, all-gather, broadcast) across multiple GPUs, both within a single node and across multiple nodes. It is optimized for NVIDIA GPUs and high-speed interconnects like NVLink and InfiniBand. This package primarily serves as a backend dependency for deep learning frameworks (like PyTorch, TensorFlow) and other GPU-accelerated libraries that require NCCL's capabilities for distributed computing. The current version is 2.21.5, with frequent updates corresponding to new NCCL releases and CUDA versions.
Warnings
- breaking NCCL versions are tightly coupled with CUDA toolkit versions. Installing `nvidia-nccl-cu11` requires a compatible CUDA 11.x installation. A mismatch between the installed NCCL version and the CUDA version expected by your deep learning framework can lead to runtime errors, particularly in distributed training scenarios.
- gotcha This package is a runtime library, not a direct Python API. Users often expect to `import nccl` and use its functions directly. However, `nvidia-nccl-cu11` provides the C/C++ shared library (`libnccl.so`) that other Python libraries or frameworks link against. Direct Python bindings (like NCCL4Py) are separate projects and may support different CUDA versions.
- gotcha When running multi-GPU or distributed workloads, NCCL relies on correct system configuration for GPU Direct, PCI topology, and network interfaces. Issues with BIOS settings, virtual machine/container configurations, or network setup can lead to `ncclUnhandledCudaError` or `ncclSystemError`, poor performance, or hangs.
- gotcha The `nvidia-nccl-cu11` package is specifically for CUDA 11. Trying to use it with a system configured for CUDA 12 or newer (e.g., if you have `nvidia-nccl-cu12` installed by another dependency) can lead to conflicts and runtime errors due to symbol mismatches or incompatible library versions.
- gotcha Windows support for `nvidia-nccl-cu11` is limited. Although the PyPI page lists Windows among supported operating systems, NCCL itself is developed and tested primarily on Linux, and users frequently report installation and runtime difficulties on Windows.
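Because the package ships a shared library rather than a Python API, the version it actually provides can be queried by loading `libnccl.so` with `ctypes` and calling the C API's `ncclGetVersion`. This is a hedged sketch: the `nvidia/nccl/lib` wheel layout is an assumption based on NVIDIA's other CUDA wheels, and the code degrades gracefully when the library is absent.

```python
import ctypes
import ctypes.util
from pathlib import Path
from typing import Optional

def find_libnccl() -> Optional[str]:
    """Search for the libnccl shared library shipped by the wheel.
    The site-packages/nvidia/nccl/lib/ location is an assumption based on
    the layout of NVIDIA's CUDA wheels; falls back to the system loader."""
    try:
        import nvidia.nccl  # namespace package installed by the wheel
        lib_dir = Path(nvidia.nccl.__path__[0]) / "lib"
        for so in lib_dir.glob("libnccl.so*"):
            return str(so)
    except ImportError:
        pass
    return ctypes.util.find_library("nccl")

def nccl_version() -> Optional[int]:
    """Load libnccl and call ncclGetVersion().
    NCCL encodes 2.21.5 as 22105 (major*10000 + minor*100 + patch)."""
    path = find_libnccl()
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    version = ctypes.c_int()
    if lib.ncclGetVersion(ctypes.byref(version)) != 0:  # 0 == ncclSuccess
        return None
    return version.value

if __name__ == "__main__":
    v = nccl_version()
    print(f"libnccl version code: {v}" if v else "libnccl not found or failed to load")
```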
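When diagnosing the system-configuration issues described above (hangs, `ncclSystemError`, poor performance), NCCL's own logging is the first tool to reach for. A minimal sketch setting NCCL's documented debug environment variables, which must be in place before any communicator is created:

```python
import os

# Enable NCCL's built-in diagnostics. NCCL_DEBUG and NCCL_DEBUG_SUBSYS are
# documented NCCL environment variables; set them before init, not after.
os.environ["NCCL_DEBUG"] = "INFO"             # levels: WARN, INFO, TRACE
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT,NET"  # limit output to init/network
# Optional: pin a network interface if autodetection picks the wrong one.
# os.environ["NCCL_SOCKET_IFNAME"] = "eth0"   # hypothetical interface name
```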
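To detect the cu11/cu12 conflict scenario, you can enumerate installed distributions and flag when more than one NCCL wheel is present. A sketch using only the standard library's `importlib.metadata`:

```python
from importlib import metadata

def installed_nccl_wheels():
    """Return (name, version) pairs for all installed nvidia-nccl-* wheels."""
    found = []
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name.startswith("nvidia-nccl"):
            found.append((name, dist.version))
    return found

wheels = installed_nccl_wheels()
if len(wheels) > 1:
    print(f"Warning: multiple NCCL wheels installed, possible conflict: {wheels}")
elif wheels:
    print(f"Installed: {wheels[0]}")
else:
    print("No nvidia-nccl wheel installed")
```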
Install
- Install from PyPI
pip install nvidia-nccl-cu11
Imports
- NCCL functions via framework
import torch; torch.cuda.nccl.version()
- NCCL4Py (for direct API access)
from nccl.core import Communicator
Quickstart
import torch

if torch.cuda.is_available():
    print(f"CUDA available: {torch.cuda.is_available()}")
    print(f"CUDA version: {torch.version.cuda}")
    if hasattr(torch.cuda, 'nccl'):
        print(f"NCCL version (via PyTorch): {torch.cuda.nccl.version()}")
    else:
        print("PyTorch's CUDA backend does not expose NCCL version directly, or NCCL not linked.")
else:
    print("CUDA is not available. NCCL requires NVIDIA GPUs and CUDA.")
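Building on the quickstart, a related check is whether PyTorch's distributed package can actually use the NCCL backend. The sketch below guards for environments without PyTorch or without a GPU, so it runs (and reports "unavailable") anywhere:

```python
# Sketch: check NCCL backend availability for torch.distributed.
# Guards against missing PyTorch or missing GPUs rather than assuming either.
try:
    import torch
    import torch.distributed as dist
except ImportError:
    torch = None

def nccl_backend_available() -> bool:
    """Return True if PyTorch is installed, sees a GPU, and was built with NCCL."""
    if torch is None or not torch.cuda.is_available():
        return False
    return dist.is_nccl_available()

if __name__ == "__main__":
    if nccl_backend_available():
        print("NCCL backend is available for torch.distributed")
    else:
        print("NCCL backend unavailable (no PyTorch build with NCCL, or no GPU)")
```

In a real multi-GPU job you would then call `dist.init_process_group(backend="nccl", ...)`, launching one process per GPU via `torchrun`.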