{"id":4143,"library":"nvidia-nvtx-cu11","title":"NVIDIA Tools Extension (NVTX) for CUDA 11 (Python Bindings)","description":"NVTX (NVIDIA Tools Extension SDK) is a cross-platform C-based API with Python, C++, and Rust wrappers, used for annotating events, code ranges, and resources within applications. This allows developers to gain contextual information for performance analysis and visualization using NVIDIA developer tools such as Nsight Systems, Nsight Compute, and Nsight Graphics. The `nvidia-nvtx-cu11` package provides the NVTX Python bindings specifically compiled for CUDA 11 environments, enabling Python applications to leverage NVTX for profiling. It is currently at version 11.8.86. The underlying Python API is exposed via the `nvtx` module, which also has a separate PyPI distribution and a more frequent release cadence.","status":"active","version":"11.8.86","language":"en","source_language":"en","source_url":"https://github.com/NVIDIA/NVTX","tags":["nvidia","profiling","performance","gpu","cuda","nvtx","tools","developer tools"],"install":[{"cmd":"pip install nvidia-nvtx-cu11","lang":"bash","label":"Install with pip"},{"cmd":"conda install -c conda-forge nvtx","lang":"bash","label":"Install `nvtx` (general Python NVTX) with Conda"}],"dependencies":[{"reason":"Runtime dependency","package":"python","optional":false},{"reason":"Required for using payloads of types other than int or float","package":"numpy","optional":true}],"imports":[{"symbol":"nvtx","correct":"import nvtx"}],"quickstart":{"code":"import time\nimport nvtx\n\n@nvtx.annotate(color=\"blue\", message=\"my_function_range\")\ndef my_function():\n    for i in range(2):\n        with nvtx.annotate(\"loop_iteration\", color=\"red\", category=i):\n            time.sleep(0.05)  # Simulate some work\n\nif __name__ == \"__main__\":\n    my_function()\n    print(\"Execution finished. To profile, run with NVIDIA Nsight Systems: `nsys profile -t nvtx python <your_script_name>.py`\")","lang":"python","description":"Annotate Python functions and code blocks using decorators (`@nvtx.annotate(...)`) or context managers (`with nvtx.annotate(...):`). To visualize the annotations, run your script with NVIDIA Nsight Systems."},"warnings":[{"fix":"For C/C++ components, migrate to NVTX v3 and explicitly include `<nvtx3/nvToolsExt.h>`. For Python, ensure your `nvidia-nvtx-cu11` (or `nvtx`) package is up-to-date and compatible with your CUDA toolkit version.","message":"NVTX v2 (e.g., nvToolsExt.h) is deprecated in favor of NVTX v3. Using older NVTX versions or C/C++ code compiled with NVTX v2 headers alongside newer CUDA toolkits (e.g., CUDA 12.x and above) can lead to compilation failures or undefined runtime behavior, especially with custom CUDA extensions. The Python bindings for NVTX generally use NVTX v3.","severity":"breaking","affected_versions":"NVTX C/C++ API versions prior to v3, potentially impacting Python bindings linked against older C/C++ NVTX libraries when upgrading CUDA."},{"fix":"Explicitly set the `multiprocessing` start method to 'spawn' at the beginning of your script: `import multiprocessing; multiprocessing.set_start_method(\"spawn\")`.","message":"Profiling Python applications that use the `multiprocessing` module with NVIDIA Nsight Systems and NVTX enabled can cause the script to hang indefinitely or result in incomplete profiling data on Linux. This is often due to the default 'fork' start method not being compatible with Nsight Systems' injection mechanism.","severity":"gotcha","affected_versions":"All versions of `nvtx` when used with `multiprocessing` on Linux with default 'fork' start method."},{"fix":"Prefer explicit annotation using `@nvtx.annotate()` decorators or `with nvtx.annotate():` context managers for critical code sections. Only use automatic annotation if the benefit of broad introspection outweighs the performance penalty.","message":"While NVTX Python bindings offer automatic function annotation, enabling it introduces significant overhead (potentially more than 10x) to every function call. This can severely degrade application runtime performance.","severity":"gotcha","affected_versions":"All versions of `nvtx`."},{"fix":"Reuse existing domain objects by calling `nvtx.get_domain(name=\"my_domain\")`. For finer-grained grouping of annotations within a domain, use categories instead, as they are less expensive to create and manage.","message":"Creating NVTX domains is a relatively expensive operation. It is recommended to create domains sparingly, typically one per library or major application component, rather than for fine-grained event grouping.","severity":"gotcha","affected_versions":"All versions of `nvtx`."},{"fix":"Annotate CPU-side calls that launch GPU kernels, manage GPU memory, or synchronize GPU operations. For detailed profiling within GPU kernels, rely on specialized GPU profiling tools like Nsight Compute, which can leverage NVTX for host-side context.","message":"NVTX is designed for annotating host-side (CPU) events and ranges that interact with GPU operations. It is not supported for direct annotation within GPU code (e.g., CUDA `__device__` functions).","severity":"gotcha","affected_versions":"All versions of NVTX."}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}