NVIDIA CUDA CUPTI Runtime Libraries
The `nvidia-cuda-cupti-cu12` package provides the CUDA Profiling Tools Interface (CUPTI) runtime libraries for CUDA 12.x. CUPTI is a dynamic library that enables the creation of profiling and tracing tools for CUDA applications. This package supplies only the low-level C libraries; Python bindings are provided by the separate `cupti-python` package. The current version is 12.9.79, and the package is actively maintained by NVIDIA.
Warnings
- gotcha The `nvidia-cuda-cupti-cu12` package provides the underlying C/C++ libraries. For Python-level interaction and APIs, the `cupti-python` package must also be installed. Direct Python imports are typically from `cupti` (the `cupti-python` module), not `nvidia_cuda_cupti_cu12`.
- gotcha CUPTI Python relies on the `libcupti.so` C library. If `nvidia-cuda-cupti-cu12` is uninstalled or if `libcupti.so` cannot be found automatically, you may need to explicitly set the `LD_LIBRARY_PATH` environment variable to the directory containing `libcupti.so` (e.g., `$CUDA_TOOLKIT_INSTALL_PATH/extras/CUPTI/lib64`).
- breaking In CUDA Toolkit 12.0, the activity record `CUpti_ActivityKernel8` was deprecated and replaced by `CUpti_ActivityKernel9` to accommodate new fields for devices with compute capability 9.0 and higher. This impacts users interacting with the low-level CUPTI C API, and potentially `cupti-python` users working with older code that explicitly references these activity kinds.
- gotcha Older versions of `nvidia-cuda-cupti-cu12` (e.g., 12.4.127, 12.3.101) have been flagged with severe vulnerabilities. While the current version 12.9.79 should address these, always ensure you are running the latest stable version and keep your CUDA Toolkit and drivers updated.
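For the `LD_LIBRARY_PATH` gotcha above, a small sketch can make troubleshooting concrete. The helper below searches for `libcupti.so*` and prepends its directory to `LD_LIBRARY_PATH`. The wheel layout `site-packages/nvidia/cuda_cupti/lib` is an assumption about how pip installs `nvidia-cuda-cupti-cu12`; verify it on your system.

```python
import os
import sysconfig
from pathlib import Path


def find_libcupti(extra_dirs=()):
    """Return the first directory containing libcupti.so*, or None.

    Searches the pip wheel location (site-packages/nvidia/cuda_cupti/lib,
    an assumption about the wheel layout) plus any caller-supplied dirs,
    e.g. $CUDA_TOOLKIT_INSTALL_PATH/extras/CUPTI/lib64.
    """
    site_packages = Path(sysconfig.get_paths()["purelib"])
    candidates = [site_packages / "nvidia" / "cuda_cupti" / "lib",
                  *map(Path, extra_dirs)]
    for d in candidates:
        if d.is_dir() and any(d.glob("libcupti.so*")):
            return d
    return None


def export_ld_library_path(lib_dir):
    """Prepend lib_dir to LD_LIBRARY_PATH for the current process."""
    current = os.environ.get("LD_LIBRARY_PATH", "")
    os.environ["LD_LIBRARY_PATH"] = (
        f"{lib_dir}:{current}" if current else str(lib_dir)
    )
    return os.environ["LD_LIBRARY_PATH"]
```

Note that the dynamic loader reads `LD_LIBRARY_PATH` at process startup, so setting it from within Python only affects child processes; for the current process, export it in the shell before launching Python.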
Install
- `pip install nvidia-cuda-cupti-cu12`
- `pip install --extra-index-url https://pypi.ngc.nvidia.com nvidia-cuda-runtime-cu12`
- `pip install cupti-python`
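After installing, a quick sanity check can confirm the environment before running the quickstart. This sketch only verifies importability; the module names are taken from this document's Imports and Quickstart sections (the `nvidia-cuda-cupti-cu12` wheel ships shared libraries only and has no top-level importable module of its own).

```python
import importlib.util


def have_module(name: str) -> bool:
    """True if a top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


# `cupti` comes from cupti-python; numba and numpy are needed
# by the quickstart example below.
for mod in ("cupti", "numba", "numpy"):
    status = "found" if have_module(mod) else "MISSING"
    print(f"{mod}: {status}")
```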
Imports
- `cupti` (provided by the `cupti-python` package)

```python
from cupti import cupti
```
Quickstart
```python
import numpy as np
from numba import cuda
from cupti import cupti


@cuda.jit
def vector_add(A, B, C):
    idx = cuda.grid(1)
    if idx < A.size:
        C[idx] = A[idx] + B[idx]


def func_buffer_requested():
    buffer_size = 8 * 1024 * 1024  # 8 MB buffer
    max_num_records = 0
    return buffer_size, max_num_records


def func_buffer_completed(activities: list):
    for activity in activities:
        if activity.kind == cupti.ActivityKind.CONCURRENT_KERNEL:
            print(f"Kernel Name: {activity.name}")
            print(f"Kernel Duration (ns): {activity.end - activity.start}")


# Initialize data
vector_length = 1024 * 1024
A = np.random.rand(vector_length)
B = np.random.rand(vector_length)
C = np.zeros_like(A)

threads_per_block = 128
blocks_per_grid = (vector_length + (threads_per_block - 1)) // threads_per_block

# Register CUPTI buffer callbacks
cupti.activity_register_callbacks(func_buffer_requested, func_buffer_completed)

# Enable CUPTI activity collection for concurrent kernels
cupti.activity_enable(cupti.ActivityKind.CONCURRENT_KERNEL)

# Launch kernel
vector_add[blocks_per_grid, threads_per_block](A, B, C)
cuda.synchronize()

# Flush buffered records (delivered to func_buffer_completed) and disable collection
cupti.activity_flush()
cupti.activity_disable(cupti.ActivityKind.CONCURRENT_KERNEL)
```
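The buffer-completed callback above prints one line per kernel record. A common variant aggregates total GPU time per kernel name instead; the pure function below sketches that, assuming each activity record exposes the same `.kind`, `.name`, `.start`, and `.end` (nanoseconds) fields used in the quickstart.

```python
from collections import defaultdict


def summarize_kernel_time(activities, kernel_kind):
    """Sum (end - start) per kernel name for records of `kernel_kind`.

    `activities` is the list handed to the buffer-completed callback;
    each record is assumed to expose .kind, .name, .start, and .end
    (timestamps in ns), as in the quickstart above.
    """
    totals = defaultdict(int)
    for act in activities:
        if act.kind == kernel_kind:
            totals[act.name] += act.end - act.start
    return dict(totals)
```

To use it, call `summarize_kernel_time(activities, cupti.ActivityKind.CONCURRENT_KERNEL)` inside `func_buffer_completed` in place of the per-record prints. Keeping the aggregation as a pure function of the activity list also makes it easy to unit-test without a GPU.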