NVIDIA CUPTI Python Library
The `cupti-python` library provides Python bindings for the NVIDIA CUDA Profiling Tools Interface (CUPTI). It exposes CUPTI's low-level C functions for detailed instrumentation and profiling of CUDA applications. The same distribution also ships a higher-level module, `cupy_cupti.profiler`, which simplifies starting and stopping profiling sessions. The current version is 13.2.0, and releases track major CUDA Toolkit versions.
Common errors
-
ModuleNotFoundError: No module named 'cupti_python'
cause The PyPI package `cupti-python` installs a Python package named `cupy_cupti`, not `cupti_python`.
fix Change your import statement from `import cupti_python` to `import cupy_cupti`.
-
OSError: libcupti.so: cannot open shared object file: No such file or directory
cause The NVIDIA CUPTI shared library (`libcupti.so`) is not found in the system's library search paths.
fix Verify that the CUDA Toolkit is installed and that its `lib64` directory (e.g., `/usr/local/cuda/lib64`) is on your `LD_LIBRARY_PATH` environment variable (on Linux). For example: `export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH`.
-
RuntimeError: CUDA driver version is insufficient for CUDA runtime version.
cause There is a mismatch between your installed NVIDIA display driver and the CUDA Toolkit version being used by `cupti-python` (and any other CUDA-dependent libraries like CuPy).
fix Update your NVIDIA GPU drivers to a version compatible with your CUDA Toolkit, or install a CUDA Toolkit version that is compatible with your current drivers. Refer to NVIDIA's CUDA compatibility matrix.
-
Nsight Systems reports no GPU activity or missing CUPTI callbacks when using `cupy_cupti.profiler.start/stop`.
cause While `start()` and `stop()` mark regions, Nsight Systems needs to launch and monitor the Python process directly to capture all profiling data and callbacks.
fix Execute your Python script by launching it with Nsight Systems: `nsys profile python your_script.py`. The `profiler.start()` and `stop()` calls will then serve to define specific regions within the Nsight Systems timeline.
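For the `libcupti.so` error above, it can help to see what the dynamic loader is likely to find before touching the environment. The following is a minimal diagnostic sketch using only the standard library; `find_libcupti` is a hypothetical helper name, and it only inspects `LD_LIBRARY_PATH` plus any directories you pass in, not the full loader search order.

```python
import ctypes.util
import os
from pathlib import Path


def find_libcupti(extra_dirs=()):
    """Return libcupti candidates found in extra_dirs and LD_LIBRARY_PATH."""
    search_dirs = list(extra_dirs)
    # LD_LIBRARY_PATH is a separator-delimited list; empty entries are skipped.
    raw = os.environ.get("LD_LIBRARY_PATH", "")
    search_dirs += [d for d in raw.split(os.pathsep) if d]

    found = []
    for directory in search_dirs:
        path = Path(directory)
        if path.is_dir():
            # Matches libcupti.so and versioned names such as libcupti.so.<version>.
            found.extend(sorted(str(p) for p in path.glob("libcupti.so*")))
    return found


if __name__ == "__main__":
    # find_library consults the system's usual lookup mechanisms and may return None.
    print("ctypes.util.find_library:", ctypes.util.find_library("cupti"))
    print("LD_LIBRARY_PATH matches:", find_libcupti())
```

Note that `LD_LIBRARY_PATH` is read when the process starts, so export it in the shell before launching Python rather than setting it from inside the script.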
Warnings
- gotcha The PyPI package name is `cupti-python`, but the actual Python package name you import is `cupy_cupti`. Importing `cupti_python` directly will result in a `ModuleNotFoundError`.
- breaking The CUPTI API can change significantly across major CUDA Toolkit versions. While `cupti-python` aims for compatibility, using a version that does not match your installed CUDA Toolkit or NVIDIA drivers can lead to runtime errors or incorrect profiling data.
- gotcha `libcupti.so` (the NVIDIA CUPTI shared library) must be discoverable by your system. If not found, you'll encounter `OSError: libcupti.so: cannot open shared object file`.
- gotcha The `cupy_cupti.profiler.start()` and `.stop()` methods primarily mark profiling regions. For comprehensive profiling data, you often need to run your Python script under an external profiling tool like NVIDIA Nsight Systems.
- gotcha The NVIDIA CUPTI library, and by extension `cupti-python`, is primarily supported on Linux operating systems. Windows support is generally not available or highly experimental.
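To check for the driver and toolkit mismatch mentioned in the warnings, one option is to compare the versions that the CUDA runtime reports. The sketch below assumes CuPy is installed and uses its `cupy.cuda.runtime.driverGetVersion()` and `runtimeGetVersion()` wrappers; the helper names are hypothetical, and the "driver at least as new as runtime" rule is a simplification that ignores NVIDIA's minor-version and forward-compatibility options.

```python
def decode_cuda_version(encoded):
    """Split CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_supports_runtime(driver_version, runtime_version):
    """Simplified rule: the driver's CUDA version must not be older than the runtime's."""
    return driver_version >= runtime_version


def report_versions():
    # Imported lazily so the helpers above also work on machines without CuPy.
    import cupy as cp

    driver = cp.cuda.runtime.driverGetVersion()
    runtime = cp.cuda.runtime.runtimeGetVersion()
    print("driver supports CUDA", decode_cuda_version(driver))
    print("runtime is CUDA", decode_cuda_version(runtime))
    print("compatible (simplified check):", driver_supports_runtime(driver, runtime))
```

Call `report_versions()` on the target machine; a `False` result suggests consulting NVIDIA's compatibility matrix before profiling.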
Install
-
pip install cupti-python
-
pip install cupti-python==13.2.0
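After installing, you may want to confirm which version was actually resolved and whether its leading component matches your CUDA major version, since this page states that releases track major CUDA Toolkit versions. This is a small sketch using the standard-library `importlib.metadata`; the helper names are hypothetical, and matching on the first version component is an assumption drawn from that statement.

```python
from importlib import metadata


def installed_version(dist_name="cupti-python"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_cuda_major(version, cuda_major):
    """True when the version string's leading component equals cuda_major."""
    return version is not None and version.split(".")[0] == str(cuda_major)


if __name__ == "__main__":
    version = installed_version()
    print("cupti-python version:", version)
    print("matches CUDA 13:", matches_cuda_major(version, 13))
```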
Imports
- profiler
wrong: from cupti_python import profiler
correct: from cupy_cupti import profiler
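Because the distribution name and the import name differ, a quick way to see which module name is importable in your environment is to ask the import system without importing anything. A minimal sketch with `importlib.util.find_spec`; `module_available` is a hypothetical helper, and the two candidate names are simply the ones quoted on this page.

```python
import importlib.util


def module_available(name):
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    for candidate in ("cupy_cupti", "cupti_python"):
        print(candidate, "importable:", module_available(candidate))
```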
Quickstart
import os
import cupy_cupti.profiler as cupti_profiler
import cupy as cp
import sys

# This quickstart demonstrates starting and stopping CUPTI profiling.
# Actual profiling data collection (e.g., via callbacks or external tools)
# is beyond the scope of this basic example and typically requires tools like Nsight Systems.
if not cp.cuda.is_available():
    print("CUDA is not available. Cannot run CUPTI profiling example.")
    sys.exit(1)
else:
    print("CUPTI Profiling Quickstart (requires CuPy installed):")
    print("--------------------------------------------------")


# Define a simple CuPy operation to profile
def run_cuda_kernel():
    a = cp.random.rand(100, 100).astype(cp.float32)
    b = cp.random.rand(100, 100).astype(cp.float32)
    c = a @ b
    cp.cuda.Stream.null.synchronize()  # Ensure ops complete before profiler stops
    print(f"Executed a CuPy matrix multiplication. Result shape: {c.shape}")


try:
    print("Starting CUPTI profiler...")
    cupti_profiler.start()
    run_cuda_kernel()
    cupti_profiler.stop()
    print("CUPTI profiler stopped.")
    print("\nNote: For actual profile data, you would typically integrate with NVIDIA Nsight Systems ")
    print("or set up CUPTI callbacks using the lower-level API. This script only marks a profiling region.")
except Exception as e:
    print(f"An error occurred during profiling: {e}")
    print("Ensure CUPTI libraries are discoverable (e.g., via LD_LIBRARY_PATH) and CUDA is properly set up.")
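In the quickstart, an exception raised inside the profiled region would skip the `stop()` call. One way to guard against that is a small context manager that pairs the two calls. This is a sketch, not part of the library: `profiled_region` is a hypothetical helper that takes any `start` and `stop` callables, so it only assumes the `start()`/`stop()` pair described on this page.

```python
from contextlib import contextmanager


@contextmanager
def profiled_region(start, stop):
    """Call start(), run the body, and always call stop(), even on error."""
    start()
    try:
        yield
    finally:
        stop()


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without a GPU or CUPTI installed.
    with profiled_region(lambda: print("start"), lambda: print("stop")):
        print("work")
```

With the quickstart's imports in place, the region would read `with profiled_region(cupti_profiler.start, cupti_profiler.stop): run_cuda_kernel()`.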