NVIDIA Tools Extension (NVTX) Python Binding
NVTX (NVIDIA Tools Extension SDK) is a C-based API with Python wrappers for annotating application code with events, ranges, and resources. These annotations provide contextual information for NVIDIA developer tools like Nsight Systems and Nsight Compute, enabling visual profiling and performance analysis of CPU and GPU activities in Python applications. The `nvidia-nvtx-cu12` package provides bindings specifically for CUDA 12.x environments. It is actively maintained with frequent updates, often tied to CUDA toolkit releases.
Warnings
- gotcha When using NVTX with Python's `multiprocessing` module on Linux, the default `fork` start method can prevent Nsight Systems from reliably injecting into child processes and collecting their NVTX traces. Set the start method to `spawn` explicitly, for example with `multiprocessing.set_start_method("spawn")` or `multiprocessing.get_context("spawn")`.
- gotcha Nsight Systems trace features, including NVTX collection via process injection, may fail or cause instability in applications that use `seccomp` to restrict system calls. This can lead to process termination or hung applications.
- breaking Changes in the underlying NVTX C API between major CUDA Toolkit versions (e.g., CUDA 11.x to 12.x) can lead to compilation issues or runtime incompatibilities for other libraries that directly interface with NVTX's C API. While `nvidia-nvtx-cu12` is built for CUDA 12, users integrating multiple components should ensure NVTX version consistency.
- gotcha The `nvtx` library can automatically annotate every function call. Enabling this feature adds significant overhead (execution can slow down by more than 10x), so reserve it for targeted debugging rather than general profiling.
- gotcha Creating NVTX domains can be a relatively expensive operation. For optimal performance and clearer visualization, it is recommended to create a limited number of domains (e.g., one per major library or subsystem) and use categories for finer-grained grouping of events within those domains.
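The domain advice above can be sketched as follows: one domain for the whole subsystem, with categories separating I/O from compute. The domain name `my_pipeline`, the category names, and the helper `annotated` are hypothetical and chosen only for illustration. The `ImportError` fallback is an added assumption so that the sketch still runs where `nvtx` is not installed; it is not part of the library.

```python
import contextlib
import time

try:
    import nvtx
except ImportError:
    nvtx = None  # assumption: degrade to no-op ranges when nvtx is unavailable

# Hypothetical name: a single domain shared by the whole subsystem.
DOMAIN = "my_pipeline"


def annotated(message, category):
    """Return a range in the shared domain, tagged with a category."""
    if nvtx is None:
        return contextlib.nullcontext()
    return nvtx.annotate(message, domain=DOMAIN, category=category)


def load_batch(n):
    # Grouped under the hypothetical "io" category.
    with annotated("load_batch", category="io"):
        time.sleep(0.01)  # Simulate reading data
        return list(range(n))


def transform(batch):
    # Grouped under the hypothetical "compute" category.
    with annotated("transform", category="compute"):
        return [x * x for x in batch]


def run(n=4):
    return transform(load_batch(n))


if __name__ == "__main__":
    print(run())
```

Passing the same domain name everywhere keeps all ranges for the subsystem together in the profiler view, while the category distinguishes kinds of work within it.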
Install
pip install nvidia-nvtx-cu12
Imports
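- nvtx (the `nvtx` module is published on PyPI as the separate `nvtx` package; if `import nvtx` fails after installing `nvidia-nvtx-cu12` alone, install `nvtx` as well)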
import nvtx
Quickstart
import time

import nvtx


# Used as a decorator, annotate() wraps each call to the function in a named range.
@nvtx.annotate("my_outer_function", color="blue")
def my_function_to_profile():
    time.sleep(0.05)  # Simulate some work
    # Used as a context manager, annotate() opens a nested range.
    with nvtx.annotate("inner_loop_work", color="red"):
        for i in range(2):
            time.sleep(0.02)  # More work
            # mark() records an instantaneous event on the timeline.
            nvtx.mark(f"Iteration {i} complete", color="green")


if __name__ == "__main__":
    print("Running annotated code...")
    my_function_to_profile()
    print("Code finished. To profile this, save as e.g., 'demo.py' and run:\nnsys profile python demo.py")
    print("Then open the generated .nsys-rep file (.qdrep in older Nsight Systems releases) in NVIDIA Nsight Systems for visualization.")
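When the annotated work runs in worker processes, the `multiprocessing` warning above applies. The following is a minimal sketch of a pool that uses the `spawn` start method, assuming a hypothetical worker named `work`; the `ImportError` fallback is an added assumption so the sketch also runs without `nvtx` installed. Save it as a script file before running, since `spawn` re-imports the main module in each child.

```python
import contextlib
import multiprocessing as mp
import time

try:
    import nvtx
except ImportError:
    nvtx = None  # assumption: fall back to a no-op range without nvtx


def work(i):
    """Hypothetical worker: one annotated range per task."""
    if nvtx is not None:
        rng = nvtx.annotate(f"work_{i}", color="red")
    else:
        rng = contextlib.nullcontext()
    with rng:
        time.sleep(0.02)  # Simulate some work
        return i * i


def run_workers(n=2):
    # Request "spawn" explicitly instead of relying on the platform default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n) as pool:
        return pool.map(work, range(n))


if __name__ == "__main__":
    # The guard is required with spawn: children re-import this module.
    print(run_workers())
```

Using `get_context("spawn")` keeps the choice local to this pool; calling `multiprocessing.set_start_method("spawn")` once inside the `__main__` guard is the process-wide alternative.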