{"library":"nvidia-nccl-cu13","title":"NVIDIA Collective Communication Library (NCCL) Runtime","description":"The `nvidia-nccl-cu13` package provides the NVIDIA Collective Communication Library (NCCL) runtime specific to CUDA 13.x. NCCL is a library of standard routines for inter-GPU communication, optimized for NVIDIA GPUs. It is primarily used as a backend by deep learning frameworks like PyTorch and TensorFlow for distributed training on multi-GPU systems. This package does not expose a direct Python API for end-users but provides the necessary shared libraries. It's released in conjunction with NVIDIA CUDA Toolkit versions.","language":"python","status":"active","last_verified":"Wed May 13","install":{"commands":["pip install nvidia-nccl-cu13"],"cli":null},"imports":["This package primarily provides shared library files (e.g., libnccl.so) that deep learning frameworks (like PyTorch or TensorFlow) link against. It does NOT expose a direct Python API for end-user import."],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import os\nimport torch\nimport torch.distributed as dist\nfrom torch.nn.parallel import DistributedDataParallel as DDP\n\ndef setup(rank, world_size):\n    os.environ['MASTER_ADDR'] = os.environ.get('MASTER_ADDR', 'localhost')\n    os.environ['MASTER_PORT'] = os.environ.get('MASTER_PORT', '29500')\n    dist.init_process_group(\"nccl\", rank=rank, world_size=world_size)\n\ndef cleanup():\n    dist.destroy_process_group()\n\nclass ToyModel(torch.nn.Module):\n    def __init__(self):\n        super(ToyModel, self).__init__()\n        self.net1 = torch.nn.Linear(10, 10)\n        self.relu = torch.nn.ReLU()\n        self.net2 = torch.nn.Linear(10, 5)\n\n    def forward(self, x):\n        return self.net2(self.relu(self.net1(x)))\n\ndef demo_basic(rank, world_size):\n    print(f\"Running basic DDP example on rank {rank}.\")\n    setup(rank, world_size)\n\n    # Use a GPU if available, otherwise CPU (though NCCL requires GPUs)\n    device = 
torch.device(f'cuda:{rank}' if torch.cuda.is_available() else 'cpu')\n    model = ToyModel().to(device)\n    ddp_model = DDP(model, device_ids=[rank] if torch.cuda.is_available() else None)\n\n    loss_fn = torch.nn.MSELoss()\n    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.001)\n\n    for _ in range(3):\n        inputs = torch.randn(20, 10).to(device)\n        labels = torch.randn(20, 5).to(device)\n        optimizer.zero_grad()\n        outputs = ddp_model(inputs)\n        loss = loss_fn(outputs, labels)\n        loss.backward()\n        optimizer.step()\n        if rank == 0:  # Only print from rank 0 to avoid flooding the log\n            print(f\"Rank {rank}, Loss: {loss.item():.4f}\")\n\n    cleanup()\n\nif __name__ == \"__main__\":\n    # Launch one process per GPU with torchrun, which sets RANK and WORLD_SIZE:\n    #   torchrun --nproc_per_node=2 quickstart.py\n    # rank doubles as the CUDA device index here, so this assumes a single node.\n    if \"RANK\" in os.environ and \"WORLD_SIZE\" in os.environ:\n        demo_basic(int(os.environ[\"RANK\"]), int(os.environ[\"WORLD_SIZE\"]))\n    else:\n        print(\"This script demonstrates NCCL usage via PyTorch DDP.\")\n        print(\"To run, execute with `torchrun --nproc_per_node=<num_gpus> your_script.py`\")\n        print(\"e.g., `torchrun --nproc_per_node=2 quickstart.py`\")","lang":"python","description":"This quickstart demonstrates how PyTorch implicitly uses NCCL for distributed data parallel (DDP) training across multiple GPUs. The `nvidia-nccl-cu13` package provides the underlying `libnccl.so` library that `torch.distributed` uses when `dist.init_process_group` is called with the 'nccl' backend. The code sets up a minimal DDP training loop. 
You would run this script using `torchrun` (part of PyTorch) to launch multiple processes, each assigned to a GPU.","tag":null,"tag_description":null,"last_tested":"2026-04-24","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":1}]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-13","installed_version":"2.30.4","pypi_latest":"2.30.4","is_stale":false,"summary":{"python_range":"3.10–3.9","success_rate":50,"avg_install_s":4.4,"avg_import_s":null,"wheel_type":"wheel"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"17.8M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":4.8,"import_time_s":null,"mem_mb":null,"disk_size":"255M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim 
(glibc)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"19.6M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":4.3,"import_time_s":null,"mem_mb":null,"disk_size":"257M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"11.5M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim 
(glibc)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":4.3,"import_time_s":null,"mem_mb":null,"disk_size":"249M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"11.2M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":3.9,"import_time_s":null,"mem_mb":null,"disk_size":"249M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"17.3M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine 
(musl)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":4.6,"import_time_s":null,"mem_mb":null,"disk_size":"255M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu13","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]}}