NVIDIA TensorRT
NVIDIA TensorRT is an SDK for high-performance deep learning inference, with C++ and Python APIs. It optimizes trained neural networks for deployment on NVIDIA GPUs, focusing on throughput, latency, and memory efficiency. The current version is 10.16.1.11. NVIDIA typically releases minor TensorRT updates monthly or bi-monthly, with major versions released annually.
Common errors
- `ImportError: libnvinfer.so.10: cannot open shared object file: No such file or directory`
  - Cause: The Python `tensorrt` package cannot find the core TensorRT shared libraries (`libnvinfer.so`) on your system. This usually means the TensorRT SDK is not installed, or its installation path is not in your system's `LD_LIBRARY_PATH`.
  - Fix: Ensure the TensorRT SDK is correctly installed and its `lib` directory (e.g., `/usr/src/tensorrt/lib` or `~/TensorRT-*/lib`) is added to your `LD_LIBRARY_PATH` environment variable. For example: `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/TensorRT/lib`.
- `[TensorRT] ERROR: Network must have at least one output.`
  - Cause: During engine building, no output tensor was explicitly marked using `network.mark_output()`.
  - Fix: After defining your network layers, identify the tensor(s) that should be the network's output(s) and call `network.mark_output(output_tensor)` for each.
- `ValueError: Invalid dtype, must be a numpy type.`
  - Cause: A NumPy array with an incompatible data type (e.g., `object` dtype) was passed to a TensorRT operation, or a NumPy array was initialized without an explicit type for use with `cuda-python`.
  - Fix: Ensure all NumPy arrays used as inputs or for memory allocation have explicit, compatible data types (e.g., `np.float32`, `np.int32`), for example via `.astype(np.float32)`.
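As a minimal illustration of the dtype fix (plain NumPy, no TensorRT required), coerce input data to an explicit, contiguous dtype before handing it to inference or memory-allocation code:

```python
import numpy as np

# Plain Python nested lists; np.asarray would otherwise infer float64 here.
raw = [[0.1, 0.2], [0.3, 0.4]]

# Coerce to an explicit, contiguous float32 buffer, the kind of array
# TensorRT inputs and cuda-python host buffers expect.
host_input = np.ascontiguousarray(np.asarray(raw, dtype=np.float32))
print(host_input.dtype)   # float32
print(host_input.shape)   # (2, 2)
```

`np.ascontiguousarray` also guards against non-contiguous views (e.g., transposed slices), which can silently corrupt device copies that assume C-order memory.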
Warnings
- breaking TensorRT 10.13.2 and later dropped support for CUDA 11.x, Ubuntu 20.04, and Python versions older than 3.10. Ensure your environment meets the minimum requirements.
- breaking Starting with TensorRT 10.14, samples are no longer bundled with the Python packages and are instead available exclusively in the NVIDIA/TensorRT GitHub repository. Additionally, usage of `pycuda` has been replaced by `cuda-python` for CUDA API interactions.
- deprecated Several `IPluginV2` plugins (e.g., `cropAndResizeDynamic`, `DecodeBbox3DPlugin`, `modulatedDeformConvPlugin`) have been deprecated and migrated to `IPluginV3` versions. While `IPluginV2` versions might still work, they are slated for removal in future releases.
- gotcha The `pip install tensorrt` command installs the Python bindings, but core TensorRT shared libraries (`libnvinfer.so`, `libnvinfer_plugin.so`, etc.) require a system-level installation of the TensorRT SDK, which must be compatible with your NVIDIA GPU driver, CUDA Toolkit, and cuDNN versions. Mismatched versions are a frequent source of errors.
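Given the version-mismatch gotcha above, a quick sanity check is to confirm that the dynamic linker can see `libnvinfer` and that the Python bindings import cleanly. This is a hedged sketch for Linux; the exact library paths on your system will differ:

```shell
# Is libnvinfer registered with the dynamic linker? (prints a fallback message if not)
ldconfig -p | grep libnvinfer || echo "libnvinfer not on the linker path"

# Can the Python bindings load the shared libraries? A failure here usually
# means a missing SDK install or a driver/CUDA/cuDNN version mismatch.
python3 -c "import tensorrt as trt; print(trt.__version__)" \
  || echo "tensorrt bindings failed to import"
```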
Install
- pip install tensorrt numpy cuda-python
Imports
- tensorrt
import tensorrt as trt
- Logger
from tensorrt import Logger
- cudart
from cuda import cudart
- Builder
trt.Builder
- NetworkDefinitionCreationFlag
trt.NetworkDefinitionCreationFlag
Quickstart
import tensorrt as trt
import numpy as np
from cuda import cudart  # cuda-python replaces pycuda as of TensorRT 10.14

# 1. Create Logger
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine():
    # 2. Create Builder
    builder = trt.Builder(TRT_LOGGER)
    # 3. Create NetworkDefinition (networks are always explicit-batch in TensorRT 10,
    # so no EXPLICIT_BATCH flag is needed)
    network = builder.create_network(0)
    # 4. Create BuilderConfig
    config = builder.create_builder_config()
    # max_workspace_size was removed in TensorRT 10; use the memory-pool API instead
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)  # 1 MiB
    # 5. Define input tensor (e.g., a simple 1x3x16x16 input)
    input_tensor = network.add_input(name="input_tensor", dtype=trt.float32, shape=(1, 3, 16, 16))
    # Add an identity layer (input -> output directly)
    output_tensor = network.add_identity(input_tensor).get_output(0)
    output_tensor.name = "output_tensor"
    # 6. Mark output
    network.mark_output(output_tensor)
    # Build a serialized engine (builder.build_engine was removed in TensorRT 10)
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("Failed to build TensorRT engine")
    return serialized_engine

def main():
    engine = None
    runtime = None
    context = None
    device_input = None
    device_output = None
    try:
        serialized_engine = build_engine()
        print("TensorRT engine built successfully!")
        # Create runtime, deserialize the engine, and create an execution context
        runtime = trt.Runtime(TRT_LOGGER)
        engine = runtime.deserialize_cuda_engine(serialized_engine)
        context = engine.create_execution_context()
        # Prepare input data
        host_input = np.random.rand(1, 3, 16, 16).astype(np.float32)
        host_output = np.empty_like(host_input)  # identity: output shape equals input shape
        # Allocate device memory (each cudart call returns an (error_code, result) tuple)
        _, device_input = cudart.cudaMalloc(host_input.nbytes)
        _, device_output = cudart.cudaMalloc(host_output.nbytes)
        # Copy input to device
        cudart.cudaMemcpy(device_input, host_input.ctypes.data, host_input.nbytes,
                          cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)
        # Execute inference
        # execute_v2 takes an iterable of device pointers, inputs first, then outputs
        bindings = [int(device_input), int(device_output)]
        context.execute_v2(bindings)
        # Copy output back to host
        cudart.cudaMemcpy(host_output.ctypes.data, device_output, host_output.nbytes,
                          cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost)
        print(f"Input shape: {host_input.shape}")
        print(f"Output shape: {host_output.shape}")
        print(f"Input (first 5 elements): {host_input.flatten()[:5]}")
        print(f"Output (first 5 elements): {host_output.flatten()[:5]}")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Clean up resources
        if device_input: cudart.cudaFree(device_input)
        if device_output: cudart.cudaFree(device_output)
        if context: del context
        if engine: del engine
        if runtime: del runtime

if __name__ == "__main__":
    main()