UCX (Unified Communication X) for CUDA 12
libucx-cu12 is a Python package that provides the Unified Communication X (UCX) C library, specifically compiled with CUDA 12 support, as a distributable wheel. UCX is a low-level, high-performance communication framework for HPC applications, enabling optimized data exchange between CPUs, GPUs, and network interfaces (e.g., InfiniBand, RoCE). This package serves as a backend dependency for Python bindings like `ucx-py` (now `ucxx`), allowing Python applications to leverage UCX capabilities. The current version available on PyPI is `1.19.0`, with upstream UCX having a frequent release cadence including point releases and release candidates.
Common errors
-
ModuleNotFoundError: No module named 'ucxx'
cause The `libucx-cu12` package only provides the UCX C shared libraries; it does not include the Python bindings.fixInstall the Python bindings package: `pip install ucxx`. -
ImportError: libucx.so: cannot open shared object file: No such file or directory
cause The UCX shared libraries, which `libucx-cu12` is meant to provide, are not found by the system's dynamic linker or by the `ucxx` bindings during initialization. This could be due to an incomplete `libucx-cu12` installation, `LD_LIBRARY_PATH` issues, or other environment problems.fixVerify `libucx-cu12` is installed (`pip show libucx-cu12`). If it is, check your `LD_LIBRARY_PATH` if you're managing libraries manually, or ensure `ucxx` is correctly installed to link against it. Reinstalling `libucx-cu12` and `ucxx` can often resolve this. -
CUDA_ERROR_UNSUPPORTED_CPU_ARCHITECTURE
cause `libucx-cu12` is compiled for CUDA 12.x. The system's active CUDA runtime environment or GPU driver might be incompatible (e.g., an older CUDA 11.x or a very new 13.x/14.x with ABI changes).fixEnsure your system has CUDA 12.x installed and configured correctly, and that your GPU drivers support CUDA 12.x. Check `nvidia-smi` output for driver/CUDA version compatibility. -
UCX_ERROR : Failed to get network devices
cause UCX cannot detect or initialize the specified network interfaces (e.g., InfiniBand devices). This can be due to missing drivers, incorrect device names in `UCX_NET_DEVICES`, or insufficient permissions.fixVerify network interface drivers are installed and healthy (`ibdev2netdev` for InfiniBand). Check `UCX_NET_DEVICES` environment variable for correct device names. Ensure the user has appropriate permissions to access the network devices.
Warnings
- breaking The upstream UCX 1.20.0 release introduced a new GPU device API for direct GPU-to-GPU communication and related host/device management. While `libucx-cu12` is currently at 1.19.0, users planning to upgrade to `libucx-cu12` versions based on UCX 1.20.0 or later should be aware that their `ucxx` code interacting with GPU communication might need updates.
- gotcha `libucx-cu12` is specifically built against CUDA 12.x. Using it with significantly older or newer CUDA runtime versions on the system (e.g., CUDA 11.x or 13.x+) can lead to runtime errors or undefined behavior, as the underlying C libraries depend on specific CUDA ABI compatibility.
- gotcha UCX is a highly configurable library. Optimal performance, especially in HPC environments with InfiniBand or specific network fabrics, often requires setting various environment variables (e.g., `UCX_TLS`, `UCX_NET_DEVICES`, `UCX_MEMTYPE_CACHE_SIZE`). Default settings may not utilize hardware optimally.
- gotcha `libucx-cu12` provides the compiled UCX C library. To use it in Python, you *must* install a separate Python binding package, typically `ucx-py` (now `ucxx`). Installing `libucx-cu12` alone will not provide Python modules for import.
Install
-
pip install libucx-cu12 -
pip install ucxx
Imports
- No direct Python imports from 'libucx_cu12'
# libucx-cu12 provides the C library. For Python bindings, install 'ucxx' and import from it: from ucxx import init, get_version_info
Quickstart
import os
import subprocess
import sys
# Ensure libucx-cu12 is installed (it provides the C library)
# and then ucxx (for Python bindings)
try:
import ucxx
print(f"ucxx version: {ucxx.__version__}")
print(f"Underlying UCX version: {ucxx.get_version_info()}")
# A simple UCX initialization example via ucxx
# Note: UCX often requires specific environment variables for optimal performance.
# For a basic test, default initialization might work.
# For distributed setup, UCP_RNDV_THRESH is often set.
# os.environ['UCX_TLS'] = 'rc_x,sm,self'
# os.environ['UCX_NET_DEVICES'] = 'mlx5_0:1' # Example for InfiniBand
# Initialize UCX context (from ucxx)
ctx = ucxx.init(enable_delayed_start=False)
print("UCX context initialized successfully via ucxx.")
print(f"UCX context info: {ctx}")
# Additional UCX operations would follow here.
# For a complete example involving communication, see ucxx documentation.
except ImportError:
print("Error: 'ucxx' not found. Please install it after 'libucx-cu12'.")
print("Run: pip install ucxx")
except Exception as e:
print(f"An error occurred during UCX initialization: {e}")
print("Please check UCX environment variables and system dependencies (e.g., CUDA drivers).")
# You can also verify the installed UCX library version directly from the command line
# This would typically be from the C library provided by libucx-cu12
print("\nVerifying UCX shared library version (if ucx_info is in PATH):")
try:
# This might not always be in PATH depending on how libucx-cu12 is installed.
# For wheels, it typically installs to site-packages, not system PATH.
# However, ucxx.get_version_info() above is the primary way.
result = subprocess.run(['ucx_info', '-v'], capture_output=True, text=True, check=True)
print(result.stdout)
except FileNotFoundError:
print("`ucx_info` command not found. This is normal if UCX is only installed via Python wheels.")
print("The `ucxx.get_version_info()` call above is the recommended way to check.")
except subprocess.CalledProcessError as e:
print(f"Error running ucx_info: {e}\n{e.stderr}")