LiteRT
LiteRT is Google's high-performance, open-source inference framework for deploying machine learning and generative AI models on edge devices, including mobile, desktop, web, and IoT platforms. It evolved from TensorFlow Lite and adds improved performance, unified APIs, and broad hardware acceleration (CPU, GPU, NPU). It is production-ready and powers on-device GenAI experiences in several Google products. The current PyPI version is 2.1.4.
Warnings
- breaking LiteRT 2.x introduces the `CompiledModel API` as the recommended runtime interface for state-of-the-art hardware acceleration, diverging significantly from the older `Interpreter API` (inherited from TensorFlow Lite). C++ constructors are hidden, requiring `Create()` methods for object instantiation. Direct C header usage is removed. Access to `Tensor`, `Subgraph`, `Signature` from `litert::Model` has been removed, replaced by `SimpleTensor` and `SimpleSignature` accessed via `CompiledModel`.
- deprecated The `Interpreter API` (the original TensorFlow Lite runtime) remains functional for backward compatibility, but all future feature updates and performance enhancements are exclusive to LiteRT's `CompiledModel API`; the `Interpreter API` will not receive them.
- gotcha Version mismatches between LiteRT Python packages and other associated libraries (e.g., `litert_torch` or NPU SDKs) can lead to `ImportError` exceptions or runtime crashes, especially when using nightly builds or advanced features like Ahead-of-Time (AOT) compilation with NPU delegates.
- gotcha The `ai-edge-litert` Python package (the successor to the slim `tflite_runtime` package) is optimized for model inference and does not bundle the full TensorFlow or LiteRT toolchain. Features such as the LiteRT Converter and 'Select TF ops' support are not present in this smaller runtime package.
- gotcha Multi-threaded execution for LiteRT operators can improve performance but may also lead to increased resource consumption and higher performance variability in certain applications. Redundant data copies (e.g., when not using `ByteBuffers` with the Java API) can also significantly degrade performance.
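One way to guard against the version-mismatch gotcha above is to log installed package versions at startup. A minimal sketch using only the standard library; the distribution names below follow the Install section and the `litert_torch` warning above, and may differ in your environment:

```python
from importlib import metadata

# Report installed versions of LiteRT-related packages so mismatches
# (e.g., between the runtime and converter tooling) are easy to spot.
for pkg in ("ai-edge-litert", "litert-torch"):
    try:
        print(f"{pkg}: {metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Comparing this output against the versions pinned by your NPU SDK or AOT toolchain before shipping catches most `ImportError`-class failures early.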
Install
-
pip install ai-edge-litert
Imports
- Interpreter
from ai_edge_litert.interpreter import Interpreter
- convert
from litert_torch import convert
Quickstart
import os

import numpy as np
from ai_edge_litert.interpreter import Interpreter

# Ensure you have a .tflite model file, e.g., downloaded from Google AI Edge.
# Replace 'model.tflite' with your actual model path.
model_path = os.environ.get('LITERT_MODEL_PATH', 'model.tflite')

try:
    # Load the LiteRT model and allocate tensors.
    # Tip: pass num_threads=N here to enable multi-threaded CPU execution
    # (see Warnings above for the trade-offs).
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Get input and output tensor details.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Assuming a single input tensor for simplicity.
    input_shape = input_details[0]['shape']
    input_dtype = input_details[0]['dtype']

    # Create a dummy input tensor (replace with real data for your model).
    input_data = np.array(np.random.random_sample(input_shape), dtype=input_dtype)

    # Point the input tensor at the data to be inferred.
    interpreter.set_tensor(input_details[0]['index'], input_data)

    # Run inference.
    interpreter.invoke()

    # Read the output tensor (again assuming a single output).
    output_data = interpreter.get_tensor(output_details[0]['index'])

    print(f"Model loaded from: {model_path}")
    print(f"Input shape: {input_shape}, dtype: {input_dtype}")
    print(f"Output shape: {output_data.shape}, dtype: {output_data.dtype}")
    print(f"First 5 output values: {output_data.flatten()[:5]}")
except (FileNotFoundError, ValueError):
    print(f"Error: could not load a valid .tflite model from '{model_path}'.")
except Exception as e:
    print(f"An error occurred during model inference: {e}")
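For quantized models, the random-float input in the Quickstart will not match the expected integer dtype. `input_details[0]['quantization']` exposes a `(scale, zero_point)` pair for converting real-valued data; the relationship is `real_value = scale * (quantized_value - zero_point)`. A minimal sketch with hypothetical quantization parameters (a real model reports its own):

```python
import numpy as np

# Hypothetical quantization parameters, as would be reported by
# input_details[0]['quantization'] for a uint8-quantized model.
scale, zero_point = 1.0 / 255.0, 0

float_data = np.random.random_sample((1, 224, 224, 3)).astype(np.float32)

# Quantize float data into the uint8 range the model expects.
quantized = np.clip(np.round(float_data / scale) + zero_point, 0, 255).astype(np.uint8)

# Dequantize outputs the same way, using the OUTPUT tensor's own parameters.
dequantized = scale * (quantized.astype(np.float32) - zero_point)
```

Note that input and output tensors generally carry different `(scale, zero_point)` pairs, so always read each from its own tensor details rather than reusing one set.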