{"library":"nvidia-nccl-cu12","title":"NVIDIA Collective Communication Library (NCCL) Runtime for CUDA 12","description":"nvidia-nccl-cu12 (version 2.29.7) is the Python package providing the NVIDIA Collective Communication Library (NCCL) runtime built specifically for CUDA 12.x. NCCL is a foundational library of high-performance inter-GPU and inter-node communication primitives, such as all-reduce, all-gather, broadcast, and point-to-point operations, which are crucial for accelerating distributed deep learning workloads. It is released frequently, often in step with CUDA toolkit and major deep learning framework updates.","language":"python","status":"active","last_verified":"Tue May 12","install":{"commands":["pip install nvidia-nccl-cu12","pip install \"nccl4py[cu12]\" # Official Python bindings"],"cli":null},"imports":["from nccl.core import NcclCommunicator","from nccl.bindings import lib","import torch.distributed as dist","import tensorflow as tf\nstrategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce())"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import os\nimport torch\nimport torch.distributed as dist\n\n# This quickstart assumes a multi-process launch, typically via torchrun\n# (or mpirun with equivalent variables), which sets RANK, LOCAL_RANK,\n# WORLD_SIZE, MASTER_ADDR and MASTER_PORT for each process.\n\ndef run_distributed_example(rank, local_rank, world_size):\n    # Bind this process to its GPU before the NCCL communicator is created.\n    # LOCAL_RANK (not the global rank) indexes the GPUs on the current node.\n    torch.cuda.set_device(local_rank)\n\n    # Initialize the process group with the NCCL backend (env:// rendezvous)\n    print(f\"Initializing process group for rank {rank} of {world_size}...\")\n    dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)\n    print(f\"Process group initialized on rank {rank}.\")\n\n    # Create a tensor on this process's GPU\n    tensor = torch.ones(10, device=f'cuda:{local_rank}') * (rank + 1)\n    print(f\"Rank {rank}: Initial tensor value: {tensor}\")\n\n    # Perform an in-place all_reduce (sum across all ranks); with N ranks,\n    # every element becomes 1 + 2 + ... + N = N * (N + 1) / 2\n    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)\n\n    print(f\"Rank {rank}: Tensor after all_reduce: {tensor}\")\n\n    # Clean up the process group\n    dist.destroy_process_group()\n    print(f\"Rank {rank}: Process group destroyed.\")\n\n# To run on two GPUs of a single node:\n# torchrun --nproc_per_node=2 your_script.py\n# Or set the environment variables manually, one process per GPU:\n# MASTER_ADDR=localhost MASTER_PORT=29500 RANK=0 LOCAL_RANK=0 WORLD_SIZE=2 python your_script.py\n# MASTER_ADDR=localhost MASTER_PORT=29500 RANK=1 LOCAL_RANK=1 WORLD_SIZE=2 python your_script.py\n\nif __name__ == '__main__':\n    # Without a launcher, fall back to a single-process, single-GPU group\n    # (world_size=1): the NCCL backend initializes, but no data is exchanged.\n    os.environ.setdefault('MASTER_ADDR', 'localhost')\n    os.environ.setdefault('MASTER_PORT', '29500')\n    rank = int(os.environ.get('RANK', '0'))\n    local_rank = int(os.environ.get('LOCAL_RANK', '0'))\n    world_size = int(os.environ.get('WORLD_SIZE', '1'))\n    if not torch.cuda.is_available():\n        print(\"CUDA is not available. The NCCL backend requires NVIDIA GPUs.\")\n    else:\n        try:\n            run_distributed_example(rank, local_rank, world_size)\n        except (RuntimeError, ValueError) as e:\n            print(f\"Error initializing distributed environment: {e}. This often happens when the script is not started with a distributed launcher such as torchrun.\")\n","lang":"python","description":"This quickstart demonstrates how NCCL is typically used indirectly through PyTorch's `torch.distributed` module for multi-GPU collective communication, here an `all_reduce` operation. NCCL provides the underlying high-performance backend. A distributed launcher (e.g., `torchrun` or `mpirun`) is required to run this code across multiple processes/GPUs; without one, the script falls back to a single-process group. For direct Python bindings, consider `nccl4py` for explicit NCCL API calls.","tag":"stale","tag_description":"widespread failures or data too old to trust","last_tested":"2026-04-24","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":-1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":-1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":-1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":-1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":-1}]},"compatibility":{"tag":"stale","tag_description":"widespread failures or data too old to trust","last_tested":"2026-05-12","installed_version":null,"pypi_latest":"2.30.4","is_stale":null,"summary":{"python_range":"3.9–3.13","success_rate":33,"avg_install_s":7.6,"avg_import_s":null,"wheel_type":"wheel"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine 
(musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":null,"install_time_s":7.8,"import_time_s":null,"mem_mb":null,"disk_size":"412M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine 
(musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":null,"install_time_s":7.6,"import_time_s":null,"mem_mb":null,"disk_size":"414M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine 
(musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":null,"install_time_s":7.4,"import_time_s":null,"mem_mb":null,"disk_size":"406M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim 
(glibc)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":null,"install_time_s":7.1,"import_time_s":null,"mem_mb":null,"disk_size":"406M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim 
(glibc)","variant":"cu12","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":"no_wheel","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim 
(glibc)","variant":"nvidia-nccl-cu12","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":null,"install_time_s":8.1,"import_time_s":null,"mem_mb":null,"disk_size":"412M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"nvidia-nccl-cu12","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]}}