NVIDIA Tools Extension (NVTX) Python Binding

12.9.79 verified Tue May 12 auth: no python install: stale quickstart: stale

NVTX (NVIDIA Tools Extension SDK) is a C-based API with Python wrappers for annotating application code with events, ranges, and resources. These annotations give NVIDIA developer tools such as Nsight Systems and Nsight Compute the context they need to visualize and analyze CPU and GPU activity in Python applications. The `nvidia-nvtx-cu12` package ships the NVTX library for CUDA 12.x environments, and its releases track CUDA Toolkit versions. The `nvtx` Python module used in the examples on this page is published separately on PyPI as `nvtx`.

pip install nvidia-nvtx-cu12

Common errors

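error ModuleNotFoundError: No module named 'nvtx'

cause The `nvtx` Python module is not installed or not visible in the current Python environment. Installing `nvidia-nvtx-cu12` alone may not be enough: that wheel ships the NVTX library, while the importable `nvtx` module comes from the separate `nvtx` package on PyPI.

fix

Install the module with pip install nvtx alongside nvidia-nvtx-cu12, and confirm that pip belongs to the interpreter you are running. If you use a virtual environment, make sure it is activated.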
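When the import fails, it can help to check what the running interpreter actually sees. The following is a minimal diagnostic sketch using only the standard library; the helper name `nvtx_install_report` and the default list of distributions it checks are assumptions made for illustration, not part of any NVIDIA package.

```python
import importlib.metadata
import importlib.util


def nvtx_install_report(module="nvtx", dists=("nvtx", "nvidia-nvtx-cu12")):
    """Report whether `module` is importable and which of `dists` are installed.

    Hypothetical helper: returns a dict with an "importable" flag plus one
    entry per distribution name (its version string, or None if absent).
    """
    report = {"importable": importlib.util.find_spec(module) is not None}
    for dist in dists:
        try:
            report[dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    import sys

    # The interpreter path shows which environment pip needs to target.
    print("interpreter:", sys.executable)
    for key, value in nvtx_install_report().items():
        print(f"{key}: {value}")
```

If `importable` is False while a distribution version is listed, the package was probably installed into a different environment than the one running the script.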

error ImportError: DLL load failed: The specified module could not be found.

cause This error, common on Windows, occurs when Python cannot find a required dynamic link library (DLL) that `nvidia-nvtx-cu12` or its underlying CUDA dependencies rely on. This often points to an incorrect or incomplete CUDA Toolkit installation, or missing CUDA binary paths in the system's PATH environment variable.

fix

Verify your CUDA Toolkit installation and ensure the CUDA bin directory (e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin) is added to your system's PATH environment variable. On Linux, ensure LD_LIBRARY_PATH includes CUDA library paths.

error RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?

cause This error typically arises when a framework like PyTorch, which is attempting to use NVTX for profiling, detects that the underlying NVTX libraries or a CUDA-enabled build of the framework itself is not correctly configured or installed.

fix

Ensure you have a CUDA-enabled build of PyTorch (or your relevant framework) installed, matching your CUDA toolkit version. For PyTorch, install using pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 (adjust cu121 for your CUDA version).
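Before reinstalling, it may be worth confirming which kind of PyTorch build is present. This is a small sketch, assuming PyTorch is the framework in question; `torch_cuda_summary` is a hypothetical helper name, while `torch.version.cuda` and `torch.cuda.is_available()` are existing PyTorch attributes.

```python
def torch_cuda_summary():
    """Return a short description of the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    # torch.version.cuda is None when the build was not compiled with CUDA.
    if torch.version.cuda is None:
        return f"PyTorch {torch.__version__} was not built with CUDA"
    return (
        f"PyTorch {torch.__version__} built for CUDA {torch.version.cuda}; "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(torch_cuda_summary())
```

A build reported as not compiled with CUDA is the usual situation in which the error above appears.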

error ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

cause This common pip message indicates a version mismatch or conflict between `nvidia-nvtx-cu12` (or other `nvidia-*-cu12` packages) and other installed packages, often a specific version of PyTorch or TensorFlow, which require precise versions of NVIDIA's CUDA-related Python wrappers.

fix

When installing frameworks like PyTorch, use their recommended installation command, which often specifies the exact nvidia-*-cu12 dependencies. Alternatively, create a fresh virtual environment and install only the necessary packages, carefully checking their compatibility matrix (e.g., PyTorch's website) for matching CUDA versions.

Warnings

gotcha When using NVTX with Python's `multiprocessing` module on Linux, the default `fork` start method can interfere with Nsight Systems' ability to inject and collect NVTX traces reliably. It is recommended to explicitly set the start method to `spawn`.

fix Before creating any Pool objects or starting new processes, add: `import multiprocessing; multiprocessing.set_start_method("spawn", force=True)`
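The placement of that call matters, so here is a minimal sketch of one way to structure a script around it. The worker function `square`, the helper `run_pool`, and the no-op fallback used when `nvtx` is not importable are all assumptions for illustration.

```python
import contextlib
import multiprocessing

try:
    import nvtx

    annotate = nvtx.annotate
except ImportError:
    # Hypothetical stand-in so the sketch still runs where nvtx is absent.
    def annotate(*args, **kwargs):
        return contextlib.nullcontext()


def square(x):
    """Worker task wrapped in an NVTX range."""
    with annotate("square", color="green"):
        return x * x


def run_pool(values, processes=2):
    """Map `square` over `values` using a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Set the start method before any Pool or Process is created.
    multiprocessing.set_start_method("spawn", force=True)
    print(run_pool([1, 2, 3]))
```

With `spawn`, each worker re-imports the main module, which is why the pool is only created under the `if __name__ == "__main__":` guard.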

gotcha Nsight Systems trace features, including NVTX collection via process injection, may fail or cause instability in applications that use `seccomp` to restrict system calls. This can lead to process termination or hung applications.

fix Disable `seccomp` restrictions for the profiled application if possible, or use non-injection based profiling features within Nsight Systems.

breaking Changes in the underlying NVTX C API between major CUDA Toolkit versions (e.g., CUDA 11.x to 12.x) can lead to compilation issues or runtime incompatibilities for other libraries that directly interface with NVTX's C API. While `nvidia-nvtx-cu12` is built for CUDA 12, users integrating multiple components should ensure NVTX version consistency.

fix Ensure all components of your application are compiled and linked against a consistent NVTX and CUDA Toolkit version. Recompile dependent libraries if necessary.

gotcha The `nvtx` library offers functionality for automatic annotation of all function calls. However, enabling this feature introduces significant performance overhead (potentially slowing down execution by more than 10x) and should be used cautiously for targeted debugging, not general profiling.

fix Use automatic annotation judiciously. For general profiling, prefer manual annotation with `@nvtx.annotate` or `with nvtx.annotate` on critical code sections.

gotcha Creating NVTX domains can be a relatively expensive operation. For optimal performance and clearer visualization, it is recommended to create a limited number of domains (e.g., one per major library or subsystem) and use categories for finer-grained grouping of events within those domains.

fix Minimize the number of distinct `nvtx.Domain` objects created. Leverage `category` arguments for detailed event classification within a single domain.
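As a rough illustration of that layout, the sketch below uses one domain name for a whole (hypothetical) library and separates its phases by category. The names `my_library`, `load`, and `transform` are made up for the example, the domain is passed by name through the `domain` and `category` keyword arguments of `nvtx.annotate`, and a no-op fallback is assumed for environments where `nvtx` is not importable.

```python
import contextlib

try:
    import nvtx

    annotate = nvtx.annotate
except ImportError:
    # Hypothetical stand-in so the sketch still runs where nvtx is absent.
    def annotate(*args, **kwargs):
        return contextlib.nullcontext()

DOMAIN = "my_library"  # hypothetical: one domain for the whole library


def load(n):
    """I/O phase, grouped under the "io" category."""
    with annotate("load", domain=DOMAIN, category="io", color="blue"):
        return list(range(n))


def transform(rows):
    """Compute phase, grouped under the "compute" category."""
    with annotate("transform", domain=DOMAIN, category="compute", color="red"):
        return [r * 2 for r in rows]


if __name__ == "__main__":
    print(transform(load(4)))
```

Keeping the domain name in one constant makes it harder to create extra domains by accident through typos.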

breaking `import nvtx` fails when the `nvtx` Python module has not been installed in the current environment.

fix Install the `nvtx` Python package with `pip install nvtx`. The module does not need a GPU to import or run. For the CUDA-matched NVTX library, also install `nvidia-nvtx-cuXX`, replacing XX with your CUDA major version (e.g., `pip install nvidia-nvtx-cu12`).

breaking Some NVIDIA Python packages are hosted on the NVIDIA Python Package Index rather than directly on PyPI.org, and installing one of them without that index configured fails with a 'placeholder project' error. The compatibility table on this page records `nvidia-nvtx-cu12` installing from a wheel on glibc images, so treat this as a fallback for environments where that error actually appears.

fix If pip reports a placeholder project, configure the NVIDIA Python Package Index by installing `nvidia-pyindex`, then retry the package installation:

```
$ pip install nvidia-pyindex
$ pip install nvidia-nvtx-cu12
```

Install compatibility (stale; last tested: 2026-05-12)

| python | os / libc | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | build_error | - | - | - | - |
| 3.10 | slim (glibc) | | wheel | 1.6s | - | 19M |
| 3.11 | alpine (musl) | build_error | - | - | - | - |
| 3.11 | slim (glibc) | | wheel | 1.7s | - | 21M |
| 3.12 | alpine (musl) | build_error | - | - | - | - |
| 3.12 | slim (glibc) | | wheel | 1.4s | - | 12M |
| 3.13 | alpine (musl) | build_error | - | - | - | - |
| 3.13 | slim (glibc) | | wheel | 1.5s | - | 12M |
| 3.9 | alpine (musl) | build_error | - | - | - | - |
| 3.9 | slim (glibc) | | wheel | 1.9s | - | 18M |

Imports

nvtx
```
import nvtx
```

Quickstart (stale; last tested: 2026-04-24)

This example demonstrates how to use `nvtx.annotate` as a decorator for functions and as a context manager for code blocks, and `nvtx.mark` for instantaneous events. The annotated code itself does not directly produce a visible output, but generates profiling data that can be captured and visualized by NVIDIA Nsight Systems.

```
import time
import nvtx

# Decorator form: the whole function call becomes one NVTX range.
@nvtx.annotate("my_outer_function", color="blue")
def my_function_to_profile():
    time.sleep(0.05)  # Simulate some work
    # Context-manager form: only the enclosed block becomes a range.
    with nvtx.annotate("inner_loop_work", color="red"):
        for i in range(2):
            time.sleep(0.02)  # More work
            # Instantaneous marker on the timeline.
            nvtx.mark(f"Iteration {i} complete", color="green")

if __name__ == "__main__":
    print("Running annotated code...")
    my_function_to_profile()
    print("Code finished. To profile this, save as e.g., 'demo.py' and run:\nnsys profile python demo.py")
    print("Then open the generated report (.nsys-rep, or .qdrep in older Nsight Systems releases) in NVIDIA Nsight Systems for visualization.")
```