{"id":8709,"library":"tensorrt","title":"NVIDIA TensorRT","description":"NVIDIA TensorRT is a high-performance deep learning inference SDK with C++ and Python APIs. It optimizes trained neural networks for deployment on NVIDIA GPUs, focusing on throughput, latency, and memory efficiency. The current version is 10.16.1.11. NVIDIA releases minor TensorRT updates roughly monthly or bi-monthly, with major versions released annually.","status":"active","version":"10.16.1.11","language":"en","source_language":"en","source_url":"https://github.com/nvidia/tensorrt","tags":["deep-learning","inference","optimization","nvidia","gpu","cuda"],"install":[{"cmd":"pip install tensorrt numpy cuda-python","lang":"bash","label":"Install Python package (metapackage)"}],"dependencies":[{"reason":"Core TensorRT Python bindings, pulled by 'tensorrt' metapackage.","package":"nvidia-tensorrt"},{"reason":"Required for array manipulation and data handling.","package":"numpy"},{"reason":"Recommended for CUDA API interactions (e.g., memory management) instead of pycuda since TensorRT 10.14.","package":"cuda-python"}],"imports":[{"note":"Standard alias for brevity.","symbol":"tensorrt","correct":"import tensorrt as trt"},{"note":"Accessing the main logger class.","symbol":"Logger","correct":"from tensorrt import Logger"},{"note":"For CUDA Runtime API calls, using the 'cuda-python' library.","symbol":"cudart","correct":"from cuda import cudart"},{"note":"Used to create and configure TensorRT engines.","symbol":"Builder","correct":"trt.Builder"},{"note":"Enum for network creation flags. EXPLICIT_BATCH is deprecated in TensorRT 10, where all networks are explicit-batch.","symbol":"NetworkDefinitionCreationFlag","correct":"trt.NetworkDefinitionCreationFlag"}],"quickstart":{"code":"import tensorrt as trt\nimport numpy as np\nfrom cuda import cudart # Using cuda-python as per release notes\n\n# 1. Create Logger\nTRT_LOGGER = trt.Logger(trt.Logger.WARNING)\n\ndef build_engine():\n    # 2. 
Create Builder\n    builder = trt.Builder(TRT_LOGGER)\n\n    # 3. Create NetworkDefinition\n    # TensorRT 10 networks are always explicit-batch, so no creation flags are needed\n    network = builder.create_network(0)\n\n    # 4. Create BuilderConfig\n    config = builder.create_builder_config()\n    # max_workspace_size was removed in TensorRT 10; set a memory-pool limit instead\n    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20) # 1 MiB\n\n    # 5. Define input tensor (e.g., a simple 1x3x16x16 input)\n    input_tensor = network.add_input(name=\"input_tensor\", dtype=trt.float32, shape=(1, 3, 16, 16))\n\n    # Add an identity layer (input -> output directly)\n    output_tensor = network.add_identity(input_tensor).get_output(0)\n\n    # 6. Name and mark the output\n    output_tensor.name = \"output_tensor\"\n    network.mark_output(output_tensor)\n\n    # 7. Build and return a serialized engine (build_engine was removed in TensorRT 10)\n    serialized_engine = builder.build_serialized_network(network, config)\n    if serialized_engine is None:\n        raise RuntimeError(\"Failed to build TensorRT engine\")\n    return serialized_engine\n\ndef main():\n    engine = None\n    runtime = None\n    context = None\n    device_input = None\n    device_output = None\n    try:\n        serialized_engine = build_engine()\n        print(\"TensorRT engine built successfully!\")\n\n        # Create runtime, deserialize the engine, and create an execution context\n        runtime = trt.Runtime(TRT_LOGGER)\n        engine = runtime.deserialize_cuda_engine(serialized_engine)\n        context = engine.create_execution_context()\n\n        # Prepare input data\n        host_input = np.random.rand(1, 3, 16, 16).astype(np.float32)\n        host_output = np.empty_like(host_input) # Identity output shape matches the input\n\n        # Allocate device memory\n        _, device_input = cudart.cudaMalloc(host_input.nbytes)\n        _, device_output = cudart.cudaMalloc(host_output.nbytes)\n\n        # Copy input to device\n        cudart.cudaMemcpy(device_input, host_input.ctypes.data, host_input.nbytes, cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)\n\n        # Execute inference\n        # execute_v2 takes device pointers ordered as the engine's inputs then outputs\n        bindings = [int(device_input), int(device_output)]\n        context.execute_v2(bindings)\n\n        # Copy output back to host\n        cudart.cudaMemcpy(host_output.ctypes.data, device_output, host_output.nbytes, cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost)\n\n        print(f\"Input shape: {host_input.shape}\")\n        print(f\"Output shape: {host_output.shape}\")\n        print(f\"Input (first 5 elements): {host_input.flatten()[:5]}\")\n        print(f\"Output (first 5 elements): {host_output.flatten()[:5]}\")\n\n    except Exception as e:\n        print(f\"An error occurred: {e}\")\n    finally:\n        # Clean up resources\n        if device_input: cudart.cudaFree(device_input)\n        if device_output: cudart.cudaFree(device_output)\n        if context: del context\n        if engine: del engine\n        if runtime: del runtime\n\nif __name__ == \"__main__\":\n    main()\n","lang":"python","description":"This quickstart demonstrates how to build a simple TensorRT engine for an identity operation using the TensorRT 10 builder API (`build_serialized_network`, `set_memory_pool_limit`, `deserialize_cuda_engine`) and the `cuda-python` library for CUDA memory management. 
The process involves creating a logger, builder, network definition, configuration, defining input/output tensors, building the engine, and then performing a basic inference with device memory management."},"warnings":[{"fix":"Upgrade your CUDA toolkit to 12.x or later, your OS to Ubuntu 22.04 or later, and your Python environment to 3.10 or later.","message":"TensorRT 10.13.2 and later dropped support for CUDA 11.x, Ubuntu 20.04, and Python versions older than 3.10. Ensure your environment meets the minimum requirements.","severity":"breaking","affected_versions":">=10.13.2"},{"fix":"Refer to the official NVIDIA/TensorRT GitHub repository for samples. Update your code to use `cuda-python` (e.g., `from cuda import cudart`) instead of `pycuda` for CUDA memory management and operations.","message":"Starting with TensorRT 10.14, samples are no longer bundled with the Python packages and are instead available exclusively in the NVIDIA/TensorRT GitHub repository. Additionally, usage of `pycuda` has been replaced by `cuda-python` for CUDA API interactions.","severity":"breaking","affected_versions":">=10.14"},{"fix":"Review plugin usage and migrate to the corresponding `IPluginV3` versions where available to ensure future compatibility. Consult TensorRT release notes for specific plugin migrations.","message":"Several `IPluginV2` plugins (e.g., `cropAndResizeDynamic`, `DecodeBbox3DPlugin`, `modulatedDeformConvPlugin`) have been deprecated and migrated to `IPluginV3` versions. 
While `IPluginV2` versions might still work, they are slated for removal in future releases.","severity":"deprecated","affected_versions":">=10.12"},{"fix":"Follow the official NVIDIA TensorRT Installation Guide, ensuring you install the TensorRT SDK (via tarball, debian package, or Docker) with versions compatible with your system's CUDA, cuDNN, and GPU driver before running `pip install tensorrt`.","message":"The `pip install tensorrt` command installs the Python bindings, but core TensorRT shared libraries (`libnvinfer.so`, `libnvinfer_plugin.so`, etc.) require a system-level installation of the TensorRT SDK, which must be compatible with your NVIDIA GPU driver, CUDA Toolkit, and cuDNN versions. Mismatched versions are a frequent source of errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure the TensorRT SDK is correctly installed and its `lib` directory (e.g., `/usr/src/tensorrt/lib` or `~/TensorRT-*/lib`) is added to your `LD_LIBRARY_PATH` environment variable. For example: `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/TensorRT/lib`.","cause":"The Python `tensorrt` package cannot find the core TensorRT shared libraries (libnvinfer.so) on your system. 
This usually means the TensorRT SDK is not installed, or its installation path is not in your system's `LD_LIBRARY_PATH`.","error":"ImportError: libnvinfer.so.10: cannot open shared object file: No such file or directory"},{"fix":"After defining your network layers, identify the tensor(s) that should be the output(s) of the network and call `network.mark_output(output_tensor)` for each.","cause":"During engine building, no output tensor was explicitly marked using `network.mark_output()`.","error":"[TensorRT] ERROR: Network must have at least one output."},{"fix":"Ensure all NumPy arrays used as input or for memory allocation have explicit, compatible data types (e.g., `np.float32`, `np.int32`) using `.astype(np.float32)`.","cause":"This error can occur when passing a NumPy array with an incompatible data type (e.g., object dtype) to TensorRT operations or when initializing a NumPy array for use with `cuda-python`.","error":"ValueError: Invalid dtype, must be a numpy type."}]}