TensorRT CUDA 12 Bindings
TensorRT-cu12-bindings provides Python bindings for NVIDIA's TensorRT, a high-performance deep learning inference optimizer and runtime. This specific package targets CUDA 12.x environments. It is actively developed by NVIDIA, with frequent releases aligning with major TensorRT and CUDA versions, typically every few months.
Common errors
- No module named 'tensorrt'
  - cause: The `tensorrt-cu12-bindings` package is not installed, or the wrong Python environment is active.
  - fix: Install the package with `pip install tensorrt-cu12-bindings` in the environment you run Python from.
- [TRT] ERROR: [checkRuntime.cpp::checkRuntime::0] CUDA driver version is insufficient for CUDA runtime version
  - cause: The installed NVIDIA driver is too old for the CUDA 12.x runtime that `tensorrt-cu12-bindings` links against.
  - fix: Upgrade your NVIDIA GPU driver to a version that supports CUDA 12.x; NVIDIA's documentation lists the minimum driver version for each CUDA release.
- [TRT] ERROR: Network must have at least one output.
  - cause: A network definition was built without any tensor explicitly marked as an output via `network.mark_output()`.
  - fix: After defining the computational graph, call `network.mark_output(output_tensor)` for every tensor you want the engine to produce.
- [TRT] ERROR: [builder.cpp::buildEngine::] Error Code 4: Internal Error (Could not find any implementation for node...)
  - cause: The builder could not find a kernel implementation for a layer on the target hardware: the operation may be unsupported, the input shapes invalid, or the workspace memory limit too small.
  - fix: Review the network for unsupported operations, verify input dimensions, and consider raising the workspace limit (`config.max_workspace_size`, or `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, ...)` in TensorRT 8.4 and later). Check the TensorRT documentation for per-layer support.
- TypeError: argument of type 'NoneType' is not iterable (or similar NoneType errors after API calls)
  - cause: A TensorRT API call (e.g., `network.add_input`, `network.add_convolution_nd`) failed to create the intended object and returned `None`, and subsequent code operated on the result as if it were valid.
  - fix: Check the return value of every TensorRT call that creates a layer, input, or output. A `None` result usually indicates invalid parameters or an internal TensorRT error that prevented object creation.
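Several of the failures above stem from TensorRT's network-construction APIs returning `None` instead of raising. A small defensive wrapper turns silent failures into immediate errors; the `check` helper here is illustrative, not part of the TensorRT API:

```python
def check(obj, what):
    """Raise immediately if a TensorRT API call returned None.

    Many network-construction calls (add_input, add_convolution_nd, ...)
    signal failure by returning None rather than raising an exception.
    """
    if obj is None:
        raise RuntimeError(f"TensorRT call failed while creating: {what}")
    return obj

# Usage inside network construction (assumes `network` and `trt` exist):
# input_tensor = check(
#     network.add_input("input", trt.float32, (1, 3, 224, 224)), "input tensor")
```

Wrapping each construction call this way surfaces the failing step by name instead of producing an unrelated `NoneType` error several lines later.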
Warnings
- breaking The `tensorrt-cuXX-bindings` packages are tightly coupled with specific CUDA versions (e.g., `cu12` for CUDA 12.x). Using a mismatched CUDA driver or toolkit version on your system will lead to runtime failures or import errors.
- breaking TensorRT 10.13.2 dropped official support for CUDA 11.x. Additionally, official samples and demos now require Python 3.10 or newer.
- breaking Custom TensorRT plugins (e.g., those implementing `IPluginV2`) are being migrated to `IPluginV3`. Older plugin versions may be deprecated and removed in future releases.
- deprecated Official TensorRT Python samples and tools have transitioned from `pycuda` to `cuda-python` for low-level CUDA interactions.
- gotcha Starting with TensorRT 10.14, samples and demos are no longer included directly within the `tensorrt-cuXX-bindings` Python packages.
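Because the builder API changed across these releases (`config.max_workspace_size` was removed in TensorRT 10 in favor of `set_memory_pool_limit`), code that must run against both old and new bindings can feature-detect instead of parsing version strings. A minimal sketch; `set_workspace` is a hypothetical helper, not a TensorRT API:

```python
def set_workspace(config, trt_module, n_bytes):
    """Set the builder workspace limit on either old or new TensorRT bindings.

    TensorRT >= 8.4 exposes IBuilderConfig.set_memory_pool_limit; TensorRT 10
    removed the legacy max_workspace_size attribute entirely.
    """
    if hasattr(config, "set_memory_pool_limit"):
        config.set_memory_pool_limit(trt_module.MemoryPoolType.WORKSPACE, n_bytes)
    else:  # legacy bindings (TensorRT < 8.4)
        config.max_workspace_size = n_bytes
```

Feature detection via `hasattr` keeps the helper working even for point releases whose version strings are hard to compare.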
Install
pip install tensorrt-cu12-bindings
Imports
- tensorrt
import tensorrt as trt
Quickstart
import tensorrt as trt
# 1. Create a logger (TRT_LOGGER = trt.Logger(trt.Logger.INFO) for more verbose output)
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
# 2. Create builder, network, and configuration
builder = trt.Builder(TRT_LOGGER)
# Explicit batch is required for some features (e.g., dynamic shapes)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
# Configure builder options
# Workspace pool limit: the maximum GPU memory (in bytes) TensorRT may use for
# temporary buffers while building (was `max_workspace_size` before TensorRT 8.4)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)  # 1 MiB; increase for larger models
# 3. Define the network: a simple identity layer for demonstration
# Input shape (batch_size, channels, height, width)
input_shape = (1, 3, 224, 224)
input_tensor = network.add_input(name="input_tensor", dtype=trt.float32, shape=input_shape)
# Add an identity layer as a simple example operation
identity_layer = network.add_identity(input_tensor)
output_tensor = identity_layer.get_output(0)
# Name and mark the output tensor
output_tensor.name = "output_tensor"
network.mark_output(output_tensor)
# 4. Build the serialized engine (builder.build_engine was removed in TensorRT 10)
print(f"Building TensorRT engine with input shape {input_shape}...")
serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine:
    print("TensorRT engine built successfully!")
    # Example: write the serialized plan to disk
    # with open("my_identity_engine.trt", "wb") as f:
    #     f.write(serialized_engine)
    # print("Engine serialized to my_identity_engine.trt")
else:
    print("Failed to build TensorRT engine.")
# Cleanup
del network, builder, config, serialized_engine
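To actually run an engine, the serialized plan is deserialized with `trt.Runtime` and executed through an execution context. A sketch assuming a plan file exists on disk; `tensorrt` is imported inside the function so the module can be read on machines without a GPU, and `volume` is a small illustrative helper for sizing I/O buffers:

```python
def volume(shape):
    """Number of elements in a fully-specified tensor shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n

def load_engine(plan_path):
    """Deserialize a TensorRT engine from a serialized plan file.

    Requires tensorrt-cu12-bindings and a CUDA 12-capable driver at call time.
    """
    import tensorrt as trt
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(plan_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError(f"Failed to deserialize engine from {plan_path}")
    return engine

# Usage (assumes the quickstart wrote my_identity_engine.trt):
# engine = load_engine("my_identity_engine.trt")
# context = engine.create_execution_context()
# n_elems = volume((1, 3, 224, 224))  # size host/device buffers from this
```

Keeping the deserialization behind a function also makes the `None` failure mode (corrupt plan, or a plan built with an incompatible TensorRT version) explicit at load time.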