LiteRT
2.1.4 | verified Wed Apr 15 | auth: no | python
LiteRT is Google's high-performance, open-source inference framework for deploying Machine Learning and Generative AI models on edge devices, including mobile, desktop, web, and IoT platforms. It evolved from TensorFlow Lite, offering enhanced performance, unified APIs, and broad hardware acceleration (CPU, GPU, NPU). It is production-ready, powering on-device GenAI experiences in various Google products. The current PyPI version is 2.1.4.
pip install ai-edge-litert
Common errors
error ModuleNotFoundError: No module named 'ai_edge_litert'
cause The `ai_edge_litert` module is not installed in the active Python environment, or the installation is incomplete.
fix Install it with `pip install ai-edge-litert`, using the same interpreter that runs your code (for example, `python -m pip install ai-edge-litert`).
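A common reason the install appears to succeed while the import still fails is that `pip` and the running interpreter belong to different environments. The following is a minimal sketch for checking this; the helper name `is_module_available` is hypothetical and uses only the standard library.

```python
import importlib.util
import sys


def is_module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be located by this interpreter."""
    return importlib.util.find_spec(name) is not None


# The executable path shows which environment is actually running the code.
print(f"Python executable: {sys.executable}")
print(f"ai_edge_litert importable: {is_module_available('ai_edge_litert')}")
```

If the second line prints `False`, installing with the printed executable (`<executable> -m pip install ai-edge-litert`) targets the right environment.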
error ImportError: cannot import name 'LiteRT' from 'ai_edge_litert'
cause The `ai_edge_litert` package does not export a class or function named `LiteRT`.
fix Import a name the package actually provides, such as `from ai_edge_litert.interpreter import Interpreter`, and consult the package documentation for the rest of the public API.
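error ImportError: generic_type: type 'InterpreterWrapper' is already registered!
cause The `InterpreterWrapper` binding type was registered more than once in the same process, possibly because two packages that each bundle the interpreter bindings were imported together or because installed library versions are incompatible.
fix Ensure all related libraries are compatible, avoid importing more than one package that provides the interpreter in the same process, and consider updating or reinstalling them.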
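To see which candidate packages are installed side by side, a small inventory can help. This is a sketch under assumptions: the distribution names in `CANDIDATES` are guesses at packages that may ship interpreter bindings, and `installed_versions` is a hypothetical helper; adjust both to your environment.

```python
from importlib import metadata

# Assumed distribution names; edit this tuple to match your environment.
CANDIDATES = ("ai-edge-litert", "ai-edge-litert-nightly", "tflite-runtime", "tensorflow")


def installed_versions(names):
    """Map each installed distribution in `names` to its version; skip absent ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


found = installed_versions(CANDIDATES)
for name, version in found.items():
    print(f"{name}=={version}")
if len(found) > 1:
    print("Several candidate packages are installed; check that only one is imported per process.")
```

The output only reports what is installed, not what is imported, so treat it as a starting point for narrowing down the conflict.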
error free(): invalid pointer
cause Native code attempted to free an invalid pointer and the process aborted, potentially due to a memory-management issue within the library.
fix Check for updates or patches that address this issue, and ensure all dependencies are correctly installed.
error Subgraph 0 partially compiled: 523 / 563 ops offloaded to 2 partitions.
cause Only 523 of the 563 operations in the subgraph were offloaded to the NPU during model compilation; the remaining 40 were not.
fix Review the model's operations and ensure they are supported by the target NPU; consult the documentation for supported operations.
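When this message shows up in build logs, it can be useful to extract the counts and track how many operations were left out. The sketch below parses the message format exactly as quoted above; the regular expression and the helper `parse_offload` are illustrative and assume the wording does not change between versions.

```python
import re

# Matches the message as quoted in this document; other wordings return None.
PATTERN = re.compile(
    r"Subgraph (?P<subgraph>\d+) partially compiled: "
    r"(?P<offloaded>\d+) / (?P<total>\d+) ops offloaded to (?P<partitions>\d+) partitions"
)


def parse_offload(line):
    """Return offload counts from a 'partially compiled' log line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = {key: int(value) for key, value in match.groupdict().items()}
    info["remaining"] = info["total"] - info["offloaded"]
    info["ratio"] = info["offloaded"] / info["total"] if info["total"] else 0.0
    return info


line = "Subgraph 0 partially compiled: 523 / 563 ops offloaded to 2 partitions."
info = parse_offload(line)
print(f"{info['remaining']} ops not offloaded ({info['ratio']:.1%} offloaded)")
# prints: 40 ops not offloaded (92.9% offloaded)
```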
Warnings
breaking LiteRT 2.x introduces the `CompiledModel API` as the recommended runtime interface for state-of-the-art hardware acceleration, diverging significantly from the older `Interpreter API` (inherited from TensorFlow Lite). C++ constructors are hidden, requiring `Create()` methods for object instantiation. Direct C header usage is removed. Access to `Tensor`, `Subgraph`, `Signature` from `litert::Model` has been removed, replaced by `SimpleTensor` and `SimpleSignature` accessed via `CompiledModel`.
fix Migrate C++ code to use `Create()` methods and the `CompiledModel API`. For Python, the `Interpreter` class (`ai_edge_litert.interpreter.Interpreter`) still works for basic inference, but consider whether `CompiledModel` features are needed for advanced acceleration. Review the official LiteRT documentation for migration guides.
deprecated The `Interpreter API` (the original TensorFlow Lite runtime) remains functional for backward compatibility, but all future feature updates and performance enhancements will be exclusive to LiteRT's `CompiledModel API`.
fix For new projects or when seeking the best performance and latest features, prioritize using the `CompiledModel API`. Existing projects using the `Interpreter API` should plan for a migration to leverage future improvements.
gotcha Version mismatches between LiteRT Python packages and other associated libraries (e.g., `litert_torch` or NPU SDKs) can lead to `ImportError` exceptions or runtime crashes, especially when using nightly builds or advanced features like Ahead-of-Time (AOT) compilation with NPU delegates.
fix Ensure all related LiteRT packages and SDKs are from the same release channel and ideally the same build date, particularly for nightly versions. Consult the specific version requirements for NPU delegates.
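The channel and build-date check described above can be automated once the version strings are known. The following is a rough sketch that assumes nightly builds carry a PEP 440 style `.devYYYYMMDD` suffix, which may not hold for every package; the package names and versions in `sample` are hypothetical and only illustrate the shape of the input.

```python
import re

# Assumption: nightly versions embed their build date as ".devYYYYMMDD".
DEV_DATE = re.compile(r"\.dev(\d{8})")


def channel_report(versions):
    """Classify each package as ('nightly', build_date) or ('stable', None)."""
    report = {}
    for name, version in versions.items():
        match = DEV_DATE.search(version)
        report[name] = ("nightly", match.group(1)) if match else ("stable", None)
    return report


def is_consistent(versions):
    """True when all packages share one channel and, for nightlies, one build date."""
    return len(set(channel_report(versions).values())) <= 1


# Hypothetical version strings for illustration only.
sample = {"ai-edge-litert-nightly": "2.2.0.dev20250101", "litert-torch": "0.8.0"}
print(channel_report(sample))
print("consistent:", is_consistent(sample))
```

This only compares channels and build dates. It does not verify that two stable releases are compatible with each other, so the version requirements published for NPU delegates still need to be checked separately.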
gotcha The `ai-edge-litert` Python package (the successor to `tflite_runtime`) is optimized for model inference and does not include all TensorFlow or LiteRT functionalities. Features like the LiteRT Converter or support for 'Select TF ops' are not present in this smaller runtime package.
fix If you need model conversion capabilities or models that rely on 'Select TF ops', you must install the full `tensorflow` PyPI package instead of or in addition to `ai-edge-litert`.
gotcha Multi-threaded execution for LiteRT operators can improve performance but may also lead to increased resource consumption and higher performance variability in certain applications. Redundant data copies (e.g., when not using `ByteBuffers` with the Java API) can also significantly degrade performance.
fix Carefully benchmark your application with varying thread counts to find the optimal balance for your specific device and use case. Design your data pipeline to minimize redundant copies, particularly when passing inputs to and reading outputs from the model.
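One way to run the thread-count comparison is a small timing harness that reports both the median latency and its spread, since variability is part of the trade-off. This is a minimal sketch: `benchmark` and `make_dummy` are hypothetical helpers, and the dummy workload merely stands in for a model so the example runs without a `.tflite` file.

```python
import statistics
import time


def benchmark(make_invoke, thread_counts, warmup=3, runs=20):
    """Time invoke() per thread count; return {threads: (median_ms, stdev_ms)}."""
    results = {}
    for num_threads in thread_counts:
        invoke = make_invoke(num_threads)
        for _ in range(warmup):  # warm-up runs are not recorded
            invoke()
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            invoke()
            samples.append((time.perf_counter() - start) * 1000.0)
        results[num_threads] = (statistics.median(samples), statistics.stdev(samples))
    return results


def make_dummy(num_threads):
    # Stand-in workload; it ignores num_threads and only exercises the harness.
    return lambda: sum(range(10_000))


for threads, (median_ms, stdev_ms) in benchmark(make_dummy, [1, 2, 4]).items():
    print(f"threads={threads}: median={median_ms:.3f} ms, stdev={stdev_ms:.3f} ms")
```

With a real model, `make_invoke` would construct `Interpreter(model_path=..., num_threads=num_threads)`, call `allocate_tensors()`, set the input once, and return `interpreter.invoke`. Run the comparison on the target device, because results from a development machine rarely transfer.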
Imports
- Interpreter
from ai_edge_litert.interpreter import Interpreter
- convert
wrong: from ai_edge_litert.aot import aot_compile
correct: from litert_torch import convert
Quickstart
import os

import numpy as np
from ai_edge_litert.interpreter import Interpreter

# Ensure you have a .tflite model file, e.g., downloaded from Google AI Edge.
# This example assumes 'model.tflite' exists in the current directory unless
# the LITERT_MODEL_PATH environment variable points to another file.
model_path = os.environ.get('LITERT_MODEL_PATH', 'model.tflite')

try:
    # Fail early with a clear error if the model file is missing.
    if not os.path.isfile(model_path):
        raise FileNotFoundError(model_path)

    # Load the model and allocate tensors.
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Get input and output tensor details.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Assuming a single input tensor for simplicity.
    input_shape = input_details[0]['shape']
    input_dtype = input_details[0]['dtype']

    # Create a dummy input tensor (replace with actual data for your model).
    input_data = np.array(np.random.random_sample(input_shape), dtype=input_dtype)

    # Copy the input data into the model's input tensor.
    interpreter.set_tensor(input_details[0]['index'], input_data)

    # Run inference.
    interpreter.invoke()

    # Get the output tensor, assuming a single output tensor for simplicity.
    output_data = interpreter.get_tensor(output_details[0]['index'])

    print(f"Model loaded from: {model_path}")
    print(f"Input shape: {input_shape}, Dtype: {input_dtype}")
    print(f"Output data shape: {output_data.shape}, Dtype: {output_data.dtype}")
    print(f"First 5 output values: {output_data.flatten()[:5]}")
except FileNotFoundError:
    print(f"Error: Model file not found at '{model_path}'. Please provide a valid .tflite model path.")
except Exception as e:
    print(f"An error occurred during model inference: {e}")