NVIDIA CUDA CUPTI Runtime Libraries
The `nvidia-cuda-cupti-cu12` package provides the CUDA Profiling Tools Interface (CUPTI) runtime libraries for CUDA 12.x. CUPTI is a dynamic library that enables the creation of profiling and tracing tools for CUDA applications. This package primarily supplies the low-level C libraries, with Python bindings provided by the `cupti-python` package. The current version is 12.9.79 and it is actively maintained by NVIDIA.
Common errors
- CUPTI_ERROR_NOT_INITIALIZED
  cause The CUDA Profiling Tools Interface (CUPTI) failed to initialize, often due to an incompatible CUDA driver version or an incorrect initialization sequence for Activity API usage.
  fix Ensure your CUDA driver is compatible with your installed CUDA Toolkit and CUPTI version; update the CUDA driver if necessary. For Activity API usage, ensure CUPTI is initialized before any CUDA driver or runtime API calls.
- CUPTI_ERROR_NOT_COMPATIBLE
  cause There is a version mismatch between the installed CUPTI library and the CUDA driver, or an attempt to enable an activity kind not supported by the current CUPTI version.
  fix Verify that your CUDA Toolkit, CUDA driver, and CUPTI versions are compatible. Upgrade your CUDA driver to a version released with or newer than your CUPTI version. Consult the CUPTI documentation for the activity kinds supported by your specific version.
- libcupti.so not found
  cause The system's dynamic linker cannot find the `libcupti.so` shared library, typically because its installation directory is not included in the `LD_LIBRARY_PATH` environment variable, or because the library file is missing.
  fix Add the directory containing `libcupti.so` (e.g., `/usr/local/cuda/extras/CUPTI/lib64` or a similar path within your CUDA installation) to the `LD_LIBRARY_PATH` environment variable. For example: `export LD_LIBRARY_PATH=/path/to/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH`.
- ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. YourPackageName requires nvidia-cuda-cupti-cu12==X.Y.Z; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-cupti-cu12 A.B.C which is incompatible.
  cause When installing other Python packages (e.g., PyTorch) that have strict CUDA dependency requirements, `pip` detects a version conflict with the already installed `nvidia-cuda-cupti-cu12` package.
  fix Either install a version of `YourPackageName` that is compatible with your existing `nvidia-cuda-cupti-cu12`, or uninstall `nvidia-cuda-cupti-cu12` (if it was installed independently) and allow `pip` to install the compatible version required by `YourPackageName`. If using `conda`, ensure your `cudatoolkit` and `cudnn` versions align with all package requirements.
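When diagnosing the `libcupti.so not found` error above, it can help to check which directories on `LD_LIBRARY_PATH` actually contain the library. The following is a minimal sketch using only the Python standard library; the helper name `find_libcupti` is hypothetical and not part of any NVIDIA package, and it only inspects the directories it is given, not the full search order of the dynamic linker.

```python
import glob
import os


def find_libcupti(search_dirs=None):
    """Return sorted paths of libcupti shared objects found in search_dirs.

    If search_dirs is None, the directories listed in LD_LIBRARY_PATH are used.
    """
    if search_dirs is None:
        raw = os.environ.get("LD_LIBRARY_PATH", "")
        search_dirs = [d for d in raw.split(os.pathsep) if d]
    matches = []
    for directory in search_dirs:
        # The trailing wildcard also matches versioned names such as libcupti.so.12.
        matches.extend(glob.glob(os.path.join(directory, "libcupti.so*")))
    return sorted(set(matches))


if __name__ == "__main__":
    found = find_libcupti()
    if found:
        for path in found:
            print(path)
    else:
        print("libcupti.so not found in LD_LIBRARY_PATH")
```

An empty result suggests that the CUPTI directory still needs to be added to `LD_LIBRARY_PATH`, as described in the fix above.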
Warnings
- gotcha The `nvidia-cuda-cupti-cu12` package provides the underlying C/C++ libraries. For Python-level interaction and APIs, the `cupti-python` package must also be installed. Direct Python imports are typically from `cupti` (the `cupti-python` module), not `nvidia_cuda_cupti_cu12`.
- gotcha CUPTI Python relies on the `libcupti.so` C library. If `nvidia-cuda-cupti-cu12` is uninstalled or if `libcupti.so` cannot be found automatically, you may need to explicitly set the `LD_LIBRARY_PATH` environment variable to the directory containing `libcupti.so` (e.g., `$CUDA_TOOLKIT_INSTALL_PATH/extras/CUPTI/lib64`).
- breaking In CUDA Toolkit 12.0, the activity record `CUpti_ActivityKernel8` was deprecated and replaced by `CUpti_ActivityKernel9` to accommodate new fields for devices with compute capability 9.0 and higher. This impacts users of the low-level CUPTI C API, and potentially `cupti-python` users working with older code that explicitly references the deprecated record structure.
- gotcha Older versions of `nvidia-cuda-cupti-cu12` (e.g., 12.4.127, 12.3.101) have been flagged with severe vulnerabilities. While the current version 12.9.79 should address these, always ensure you are running the latest stable version and keep your CUDA Toolkit and drivers updated.
- breaking The Quickstart script imports `numba`, which is not installed by any of the packages listed under Install. `numba` is a separate Python package and must be installed explicitly (for example, `pip install numba`) if your application depends on it.
- breaking The `nvidia-cuda-cupti-cu12` package is hosted on the NVIDIA Python Package Index, not directly on PyPI. Attempting to install it directly via `pip install` without configuring the NVIDIA index will lead to a 'placeholder project' error during installation.
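Several of the warnings above come down to knowing which distributions are installed and at what version. The sketch below reports this with the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are the ones used on this page.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # numba is only needed by the Quickstart; the other two provide CUPTI itself.
    names = ["nvidia-cuda-cupti-cu12", "cupti-python", "numba"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the reported `nvidia-cuda-cupti-cu12` version against the versions named in the warnings is left to the reader; this sketch does not attempt to judge compatibility.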
Install
- pip install nvidia-cuda-cupti-cu12
- pip install --extra-index-url https://pypi.ngc.nvidia.com nvidia-cuda-runtime-cu12
- pip install cupti-python
Imports
- cupti
from cupti import cupti
Quickstart
import numpy as np
from numba import cuda
from cupti import cupti


@cuda.jit
def vector_add(A, B, C):
    idx = cuda.grid(1)
    if idx < A.size:
        C[idx] = A[idx] + B[idx]


def func_buffer_requested():
    buffer_size = 8 * 1024 * 1024  # 8MB buffer
    max_num_records = 0
    return buffer_size, max_num_records


def func_buffer_completed(activities: list):
    for activity in activities:
        if activity.kind == cupti.ActivityKind.CONCURRENT_KERNEL:
            print(f"Kernel Name: {activity.name}")
            print(f"Kernel Duration (ns): {activity.end - activity.start}")


# Initialize data
vector_length = 1024 * 1024
A = np.random.rand(vector_length)
B = np.random.rand(vector_length)
C = np.zeros_like(A)
threads_per_block = 128
blocks_per_grid = (vector_length + (threads_per_block - 1)) // threads_per_block

# Register CUPTI callbacks
cupti.activity_register_callbacks(func_buffer_requested, func_buffer_completed)

# Enable CUPTI activity collection for concurrent kernels
cupti.activity_enable(cupti.ActivityKind.CONCURRENT_KERNEL)

# Launch kernel
vector_add[blocks_per_grid, threads_per_block](A, B, C)
cuda.synchronize()

# Flush and disable CUPTI activity
cupti.activity_flush()
cupti.activity_disable(cupti.ActivityKind.CONCURRENT_KERNEL)
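
The Quickstart prints one line per kernel record. When a program launches many kernels, aggregating per kernel name is often more useful. The sketch below is one possible way to do that, assuming only that each record exposes `name`, `start`, and `end` as used in `func_buffer_completed` above. The helper `summarize_kernels` is hypothetical, and the sample records are stand-in objects with made-up timestamps so the example runs without a GPU.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_kernels(records):
    """Return {kernel name: (launch count, total duration in ns)}."""
    totals = defaultdict(lambda: [0, 0])
    for record in records:
        entry = totals[record.name]
        entry[0] += 1                          # one more launch of this kernel
        entry[1] += record.end - record.start  # accumulate its duration
    return {name: (count, total) for name, (count, total) in totals.items()}


# Stand-in records (hypothetical values), not real CUPTI activity objects.
sample = [
    SimpleNamespace(name="vector_add", start=100, end=350),
    SimpleNamespace(name="vector_add", start=400, end=600),
    SimpleNamespace(name="scale", start=700, end=760),
]
print(summarize_kernels(sample))
# prints {'vector_add': (2, 450), 'scale': (1, 60)}
```

In a real run, the kernel records collected inside `func_buffer_completed` could be appended to a list and passed to `summarize_kernels` after the flush.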