{"id":4142,"library":"nvidia-nccl-cu11","title":"NVIDIA Collective Communication Library (NCCL) Runtime for CUDA 11","description":"The `nvidia-nccl-cu11` package provides the NVIDIA Collective Communication Library (NCCL) runtime binaries compiled for CUDA 11. NCCL is a high-performance library for collective communication operations (e.g., all-reduce, all-gather, broadcast) across multiple GPUs, both within a single node and across multiple nodes. It is optimized for NVIDIA GPUs and high-speed interconnects such as NVLink and InfiniBand. This package primarily serves as a backend dependency for deep learning frameworks (such as PyTorch and TensorFlow) and other GPU-accelerated libraries that require NCCL for distributed computing. The current version is 2.21.5, with updates tracking new NCCL releases and CUDA versions.","status":"active","version":"2.21.5","language":"en","source_language":"en","source_url":"https://github.com/NVIDIA/nccl","tags":["GPU","deep learning","distributed computing","NVIDIA","CUDA","collective communication"],"install":[{"cmd":"pip install nvidia-nccl-cu11","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"NCCL is a CUDA-dependent library; it requires a compatible CUDA installation on the system.","package":"CUDA Toolkit (runtime)","optional":false},{"reason":"This package is typically consumed by deep learning frameworks for distributed training; direct Python API usage is rare and requires separate bindings.","package":"PyTorch or TensorFlow (or similar DL framework)","optional":true}],"imports":[{"note":"The `nvidia-nccl-cu11` package does not expose a direct Python API for NCCL functions. Instead, it provides the underlying C/C++ library that deep learning frameworks (like PyTorch) or dedicated Python bindings (e.g., NCCL4Py, pynccl) interface with. Users typically interact with NCCL indirectly through these frameworks' distributed training modules.","symbol":"NCCL functions via framework","correct":"import torch; torch.cuda.nccl.version()"},{"note":"While `nvidia-nccl-cu11` provides the runtime, dedicated Python bindings for NCCL (like NCCL4Py) are separate packages. Note that NCCL4Py explicitly lists `nvidia-nccl-cu12` or `nvidia-nccl-cu13` as dependencies, meaning it is tied to newer CUDA versions than this `cu11` runtime.","wrong":"import nccl","symbol":"NCCL4Py (for direct API access)","correct":"from nccl.core import Communicator"}],"quickstart":{"code":"import torch\n\nif torch.cuda.is_available():\n    print(f\"CUDA available: {torch.cuda.is_available()}\")\n    print(f\"CUDA version: {torch.version.cuda}\")\n    if hasattr(torch.cuda, 'nccl'):\n        print(f\"NCCL version (via PyTorch): {torch.cuda.nccl.version()}\")\n    else:\n        print(\"PyTorch's CUDA backend does not expose NCCL version directly, or NCCL not linked.\")\nelse:\n    print(\"CUDA is not available. NCCL requires NVIDIA GPUs and CUDA.\")\n","lang":"python","description":"This quickstart demonstrates how to verify that NCCL is detected and its version reported by a common deep learning framework such as PyTorch. The `nvidia-nccl-cu11` package provides the backend; frameworks then expose its capabilities. The code checks for CUDA availability and attempts to retrieve the NCCL version via PyTorch's API, which is a common way to confirm NCCL's presence and compatibility."},"warnings":[{"fix":"Ensure that your `nvidia-nccl-cu11` package version, your system's CUDA toolkit version, and your deep learning framework's CUDA compilation version are all compatible. Check your framework's documentation to identify the required NCCL and CUDA versions. You may need to pin an exact version during installation, e.g., `pip install nvidia-nccl-cu11==X.Y.Z`, or manage environments carefully with tools like Conda.","message":"NCCL versions are tightly coupled with CUDA toolkit versions. Installing `nvidia-nccl-cu11` requires a compatible CUDA 11.x installation. A mismatch between the installed NCCL version and the CUDA version expected by your deep learning framework can lead to runtime errors, particularly in distributed training scenarios.","severity":"breaking","affected_versions":"All versions"},{"fix":"If you need direct programmatic access to NCCL from Python, investigate dedicated Python bindings like `NCCL4Py` or `pynccl`, being mindful of their specific CUDA version requirements. Otherwise, leverage NCCL's capabilities through the distributed training modules of deep learning frameworks (e.g., `torch.distributed`).","message":"This package is a runtime library, not a direct Python API. Users often expect to `import nccl` and use its functions directly. However, `nvidia-nccl-cu11` provides the C/C++ shared library (`libnccl.so`) that other Python libraries or frameworks link against. Direct Python bindings (like NCCL4Py) are separate projects and may support different CUDA versions.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consult the NVIDIA NCCL documentation for detailed setup and troubleshooting guides. Ensure `/sys` is properly mounted in containers/VMs, verify GPU-to-GPU communication with `p2pBandwidthLatencyTest` from the CUDA samples, and set `NCCL_DEBUG=WARN` to get more explicit error messages from NCCL.","message":"When running multi-GPU or distributed workloads, NCCL relies on correct system configuration for GPUDirect, PCI topology, and network interfaces. Issues with BIOS settings, virtual machine/container configurations, or network setup can lead to `ncclUnhandledCudaError` or `ncclSystemError`, poor performance, or hangs.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Maintain consistent CUDA versions across all components of your deep learning stack. If you require CUDA 12+, use `nvidia-nccl-cu12`. If multiple dependencies inadvertently install NCCL builds for different CUDA versions, environment isolation (e.g., with Conda or virtual environments) is crucial. Prioritize the NCCL version compatible with your primary deep learning framework.","message":"The `nvidia-nccl-cu11` package is specifically for CUDA 11. Using it on a system configured for CUDA 12 or newer (e.g., if another dependency installs `nvidia-nccl-cu12`) can lead to conflicts and runtime errors due to symbol mismatches or incompatible library versions.","severity":"gotcha","affected_versions":"All versions when used with incorrect CUDA versions"},{"fix":"For Windows, ensure you have the correct NVIDIA drivers and CUDA Toolkit installed. Refer to framework-specific guides for installing deep learning frameworks with GPU support on Windows, as they often handle NCCL dependencies. Be prepared for manual library path configuration, or consider using WSL2 or Docker for a more consistent Linux-like environment.","message":"Windows support for `nvidia-nccl-cu11` can be challenging. Although the PyPI page lists Windows as a supported OS, many deep learning ecosystems and NCCL's underlying C/C++ implementation are primarily optimized for Linux, and users report installation difficulties.","severity":"gotcha","affected_versions":"All versions on Windows"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}