cuda-core (Pythonic CUDA Driver API)
The `cuda-core` library provides Pythonic access to the NVIDIA CUDA Driver API, enabling direct interaction with NVIDIA GPUs for tasks such as device querying, memory management, context creation, and kernel launches. It is part of the broader `cuda-python` project and is under active development, with releases typically tracking new CUDA Toolkit versions.
Warnings
- breaking `cuda-core` binds to the CUDA Driver API and *requires a compatible NVIDIA GPU driver to be installed on the system* (and, for some features, a matching CUDA Toolkit). It is not a standalone package: installing it via pip does not install CUDA itself.
- gotcha As a low-level Driver API binding, `cuda-core` requires explicit memory management. Users must manually allocate and deallocate GPU memory using functions like `drv.cuMemAlloc` and `drv.cuMemFree`. Failing to free allocated memory leads to GPU memory leaks and resource exhaustion.
- gotcha `cuda-core` binds directly to the CUDA Driver API, which is lower-level than the CUDA Runtime API (used implicitly by `nvcc`-compiled code and exposed by runtime bindings such as `cuda.cudart`). It does not provide high-level abstractions like automatic memory management, unified memory, or stream synchronization helpers out of the box.
- gotcha Mismatches between the `cuda-core` Python package version, the system's installed CUDA Toolkit version, and the NVIDIA GPU driver version can lead to `drv.CUException` errors (e.g., `CUDA_ERROR_NOT_INITIALIZED`, `CUDA_ERROR_NO_DEVICE`, `CUDA_ERROR_INVALID_DEVICE`).
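One way to make the free unconditional is a try/finally wrapper around each allocation. The sketch below is library-agnostic and illustrative (the `with_buffer` helper and its parameters are not part of `cuda-core`); with the API used in the Quickstart below, you would pass `drv.cuMemAlloc` and `drv.cuMemFree` as the allocator and deallocator.

```python
def with_buffer(alloc, free, nbytes, body):
    """Allocate nbytes via alloc(), run body(ptr), and always call free(ptr),
    even if body raises -- the pattern that prevents the leaks described above."""
    ptr = alloc(nbytes)
    try:
        return body(ptr)
    finally:
        free(ptr)  # runs on success and on exception alike

# With the Quickstart's drv alias this would be called as, e.g.:
#   result = with_buffer(drv.cuMemAlloc, drv.cuMemFree, data.nbytes, do_compute)
```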
Install
- pip install cuda-core
Imports
- cuda.core
import cuda.core as drv
- CUException
from cuda.core import CUException
Quickstart
import cuda.core as drv
import numpy as np
try:
    # Initialize the CUDA driver API
    drv.init()

    # Get device count
    device_count = drv.cuDeviceGetCount()
    if device_count == 0:
        print("No CUDA devices found. Ensure GPU drivers and CUDA Toolkit are installed.")
    else:
        print(f"Found {device_count} CUDA device(s).")

        # Get the first device and create a context (flags=0 for default behavior)
        device = drv.cuDeviceGet(0)
        context = drv.cuCtxCreate(0, device)
        print(f"Created CUDA context on device 0: {device}")

        # Prepare host data
        host_data = np.arange(10, dtype=np.int32)
        print(f"Host data: {host_data}")

        # Allocate memory on the device
        device_ptr = drv.cuMemAlloc(host_data.nbytes)
        print(f"Allocated {host_data.nbytes} bytes on device at address: {device_ptr}")

        # Copy host data to device
        drv.cuMemcpyHtoD(device_ptr, host_data.ctypes.data, host_data.nbytes)
        print("Copied host data to device.")

        # Copy device data back to host for verification
        retrieved_data = np.empty_like(host_data)
        drv.cuMemcpyDtoH(retrieved_data.ctypes.data, device_ptr, retrieved_data.nbytes)
        print(f"Retrieved data from device: {retrieved_data}")

        # Clean up: free device memory and destroy the context
        drv.cuMemFree(device_ptr)
        drv.cuCtxDestroy(context)
        print("Successfully freed device memory and destroyed context.")
except drv.CUException as e:
    print(f"CUDA Error: {e}. This often indicates issues with the CUDA Toolkit installation, drivers, or device availability.")
    print("Ensure you have a compatible NVIDIA GPU, up-to-date drivers, and a correctly installed CUDA Toolkit.")
except Exception as e:
    print(f"An unexpected Python error occurred: {e}")
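Note that the `cuda-python` project also ships separately distributed low-level bindings (the `cuda-bindings` package, `cuda.bindings.driver` module in recent releases) that follow a different convention: each driver call returns its status code as the first element of a result tuple instead of raising an exception. A small checker can convert those tuples into exceptions; the sketch below is illustrative and simplifies the real `CUresult` handling.

```python
def check(result_tuple):
    """Unpack a (status, *values) tuple from a driver call: raise on a
    nonzero status code, otherwise return the remaining value(s)."""
    err, *values = result_tuple
    code = getattr(err, "value", err)  # enum statuses expose .value; plain ints pass through
    if int(code) != 0:
        raise RuntimeError(f"CUDA driver call failed with status {code}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)

# Usage with the tuple-returning bindings would look like:
#   from cuda.bindings import driver
#   check(driver.cuInit(0))
#   device_count = check(driver.cuDeviceGetCount())
```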