NVIDIA TensorRT for CUDA 12
TensorRT is a high-performance deep learning inference optimizer and runtime from NVIDIA. The `tensorrt-cu12` package provides the Python bindings specifically compiled for CUDA Toolkit 12.x. As of its latest version `10.16.1.11`, it supports optimizing and deploying trained deep learning models for faster inference on NVIDIA GPUs. Releases are frequent, typically aligning with major TensorRT core library and CUDA toolkit updates.
Common errors
- ModuleNotFoundError: No module named 'tensorrt'
  - cause: The `tensorrt-cu12` package is not installed in the current Python environment, or the environment is not activated.
  - fix: Run `pip install tensorrt-cu12` to install the package.
- tensorrt.infer.NMSPlugin_TRT.NMSPlugin: no matching engine found for requested plugin
  - cause: A required TensorRT plugin (built-in or custom) is not available, is incompatible with the loaded engine/network, or its library could not be loaded.
  - fix: Ensure all necessary plugin libraries (.so or .dll files) are in a path accessible to TensorRT (e.g., `LD_LIBRARY_PATH` on Linux), and that custom plugins are built against the correct TensorRT version. For built-in plugins, verify your TensorRT installation is complete.
- libcudart.so.12.X: cannot open shared object file: No such file or directory
  - cause: The system cannot find the CUDA runtime library: either CUDA Toolkit 12.x is not installed or its library paths are not configured for your system.
  - fix: Install NVIDIA CUDA Toolkit 12.x and ensure its `lib` directory (e.g., `/usr/local/cuda-12.X/lib64`) is on your `LD_LIBRARY_PATH` (Linux) or `PATH` (Windows).
- TypeError: argument 'network' (unambiguous type name missing)
  - cause: An incorrect type or `None` was passed to a TensorRT function that expects a specific object, typically when initializing the network or builder, or during engine building.
  - fix: Double-check the arguments passed to TensorRT API calls, especially `builder.create_network()` and `builder.build_serialized_network()` (which replaced `build_engine()` in TensorRT 10). Ensure the network object is created with valid flags and fully populated before being handed to the builder.
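The import and shared-library errors above can be triaged with a quick environment check before touching any TensorRT API. This is a minimal sketch using only the standard library; the `libcudart.so.12` name is Linux-specific (on Windows, look for `cudart64_12.dll` on `PATH` instead).

```python
import ctypes
import importlib.util

def check_environment():
    """Return human-readable diagnostics for the two most common setup problems."""
    notes = []
    # Is the tensorrt Python package importable at all?
    if importlib.util.find_spec("tensorrt") is None:
        notes.append("tensorrt missing: run `pip install tensorrt-cu12`")
    else:
        notes.append("tensorrt package found")
    # Can the CUDA 12 runtime library be loaded? (Linux library name shown.)
    try:
        ctypes.CDLL("libcudart.so.12")
        notes.append("CUDA 12 runtime found")
    except OSError:
        notes.append("libcudart.so.12 not found: check the CUDA Toolkit "
                     "install and LD_LIBRARY_PATH")
    return notes

if __name__ == "__main__":
    for note in check_environment():
        print(note)
```

Running this before any `import tensorrt` separates packaging problems from CUDA configuration problems.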
Warnings
- breaking TensorRT 10.13.2 (and subsequent versions) dropped support for Python versions older than 3.10 and CUDA 11.x. Users on older environments must upgrade their Python interpreter or use an older `tensorrt-cu12` package version.
- gotcha The `tensorrt-cu12` package requires a matching system-wide NVIDIA CUDA Toolkit and cuDNN installation (version 12.x) to be present and correctly configured (e.g., via `LD_LIBRARY_PATH` on Linux). These are *not* installed by `pip`.
- deprecated TensorRT 10.14 deprecated `pycuda` usages in its samples and shifted towards `cuda-python`. While not a direct breaking change for the core API, it indicates a shift in recommended practices for low-level CUDA interaction.
- breaking Several standard plugins (e.g., `cropAndResizeDynamic`, `DecodeBbox3DPlugin`) have been migrated from `IPluginV2` to `IPluginV3`, with `IPluginV2` versions being deprecated and scheduled for removal. Custom plugins implementing `IPluginV2` might need updates.
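Following the sample code's move from `pycuda` to `cuda-python` noted above, a raw device allocation now looks roughly like this. This is a sketch assuming the cuda-python 12.x module layout (`from cuda import cudart`); newer cuda-python releases reorganize the bindings under `cuda.bindings`, so adjust the import if needed.

```python
try:
    from cuda import cudart  # module layout for cuda-python 12.x
except ImportError:  # cuda-python not installed
    cudart = None

def alloc_device_buffer(nbytes):
    """Allocate a raw device buffer and return its pointer."""
    if cudart is None:
        raise ImportError("cuda-python is required: pip install cuda-python")
    # cuda-python returns a (status, result) tuple instead of raising
    err, ptr = cudart.cudaMalloc(nbytes)
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"cudaMalloc failed: {err}")
    return ptr

def free_device_buffer(ptr):
    """Release a buffer allocated by alloc_device_buffer."""
    (err,) = cudart.cudaFree(ptr)
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"cudaFree failed: {err}")
```

The tuple-return convention (status first, result second) is the main difference from `pycuda`, which raised exceptions instead.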
Install
-
pip install tensorrt-cu12
Imports
- tensorrt
import tensorrt as trt
- Logger
from tensorrt import Logger
Quickstart
import tensorrt as trt

# A basic example: creating a TensorRT builder and network.
# Note: a real application would parse an ONNX model and then build an engine.

# Create a logger to track verbose output and errors
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

try:
    # Create a builder
    builder = trt.Builder(TRT_LOGGER)
    # builder.max_batch_size was removed in TensorRT 10 along with
    # implicit-batch mode, so query a capability flag instead.
    print(f"TensorRT Builder created. Fast FP16 support: {builder.platform_has_fast_fp16}")

    # Create an empty network definition. Explicit batch is the default (and
    # only) mode in TensorRT 10; the flag is kept here for older releases.
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    print("Network created with EXPLICIT_BATCH flag.")

    # Example: add an input layer (simplified; a real model has its own shapes)
    input_tensor = network.add_input(name="input_tensor", dtype=trt.float32, shape=(1, 3, 224, 224))
    print(f"Added input tensor with shape {input_tensor.shape}")

    # In a real scenario you would parse a model, e.g. with
    # trt.OnnxParser(network, TRT_LOGGER), then configure the builder
    # for engine creation and serialization.
except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure you have a compatible NVIDIA GPU and correct CUDA/cuDNN installations.")

# Clean up resources. In a short script this is optional (objects are freed
# when they go out of scope), but explicit deletion helps in long-running
# processes; only delete objects that were successfully created.
if 'network' in locals(): del network
if 'builder' in locals(): del builder
if 'TRT_LOGGER' in locals(): del TRT_LOGGER
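To go from the skeleton above to a deployable engine, the usual flow parses an ONNX file and serializes the built engine. A hedged sketch of that flow: the path passed in is a placeholder, the 1 GiB workspace limit is an arbitrary example value, and `build_serialized_network` is the TensorRT 10 API (`build_engine` was removed).

```python
try:
    import tensorrt as trt
except ImportError:  # tensorrt-cu12 not installed
    trt = None

def build_serialized_engine(onnx_path):
    """Parse an ONNX model and return the serialized engine as bytes."""
    if trt is None:
        raise ImportError("tensorrt is required: pip install tensorrt-cu12")
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)  # explicit batch is the default in TRT 10
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = "; ".join(str(parser.get_error(i))
                               for i in range(parser.num_errors))
            raise RuntimeError(f"ONNX parse failed: {errors}")

    config = builder.create_builder_config()
    # Cap the builder's scratch memory at 1 GiB; tune this for your GPU.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed; check the logger output")
    return bytes(serialized)
```

The returned bytes can be written to disk and later deserialized with `trt.Runtime(logger).deserialize_cuda_engine(data)` for inference.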