{"id":6305,"library":"ai-edge-litert","title":"LiteRT","description":"LiteRT is Google's high-performance, open-source inference framework for deploying Machine Learning and Generative AI models on edge devices, including mobile, desktop, web, and IoT platforms. It evolved from TensorFlow Lite, offering enhanced performance, unified APIs, and broad hardware acceleration (CPU, GPU, NPU). It is production-ready, powering on-device GenAI experiences in various Google products. The current PyPI version is 2.1.4.","status":"active","version":"2.1.4","language":"en","source_language":"en","source_url":"https://github.com/google-ai-edge/LiteRT","tags":["machine-learning","on-device-ai","edge-ai","llm","generative-ai","inference","tensorflow-lite","mobile"],"install":[{"cmd":"pip install ai-edge-litert","lang":"bash","label":"Install LiteRT Python runtime"}],"dependencies":[{"reason":"Required for converting PyTorch models to LiteRT format using litert_torch.","package":"torch","optional":true},{"reason":"The full TensorFlow package is required for certain advanced APIs, such as the LiteRT Converter, and for models that depend on 'Select TF ops'; neither is included in the smaller ai-edge-litert runtime package.","package":"tensorflow","optional":true},{"reason":"Common dependency for array manipulation when working with ML models.","package":"numpy"}],"imports":[{"note":"For Python inference on edge devices, the 'ai-edge-litert' PyPI package installs the 'ai_edge_litert' module, a slim replacement for the legacy 'tflite_runtime' package. Its Interpreter class runs .tflite models efficiently and is the recommended import for runtime inference.","wrong":"from tflite_runtime.interpreter import Interpreter","symbol":"Interpreter","correct":"from ai_edge_litert.interpreter import Interpreter"},{"note":"To convert PyTorch models to LiteRT format, use the `litert_torch.convert` function. Older references to `ai_edge_litert.aot` for Ahead-of-Time compilation may still exist, but `litert_torch` is the updated path for PyTorch conversion.","wrong":"from ai_edge_litert.aot import aot_compile","symbol":"convert","correct":"from litert_torch import convert"}],"quickstart":{"code":"import os\n\nimport numpy as np\nfrom ai_edge_litert.interpreter import Interpreter\n\n# Ensure you have a .tflite model file, e.g., downloaded from Google AI Edge.\n# Replace 'model.tflite' with the actual path to your model.\nmodel_path = os.environ.get('LITERT_MODEL_PATH', 'model.tflite')\n\ntry:\n    # Load the LiteRT model and allocate tensors.\n    interpreter = Interpreter(model_path=model_path)\n    interpreter.allocate_tensors()\n\n    # Get input and output tensor details.\n    input_details = interpreter.get_input_details()\n    output_details = interpreter.get_output_details()\n\n    # Assuming a single input tensor for simplicity.\n    input_shape = input_details[0]['shape']\n    input_dtype = input_details[0]['dtype']\n\n    # Create a dummy input tensor (replace with real data for your model).\n    input_data = np.array(np.random.random_sample(input_shape), dtype=input_dtype)\n\n    # Point the input tensor at the data to be inferred.\n    interpreter.set_tensor(input_details[0]['index'], input_data)\n\n    # Run inference.\n    interpreter.invoke()\n\n    # Read the output tensor (assuming a single output for simplicity).\n    output_data = interpreter.get_tensor(output_details[0]['index'])\n\n    print(f\"Model loaded from: {model_path}\")\n    print(f\"Input shape: {input_shape}, Dtype: {input_dtype}\")\n    print(f\"Output data shape: {output_data.shape}, Dtype: {output_data.dtype}\")\n    print(f\"First 5 output values: {output_data.flatten()[:5]}\")\n\nexcept (FileNotFoundError, ValueError):\n    # The Interpreter raises ValueError when the model file cannot be opened.\n    print(f\"Error: Could not load model from '{model_path}'. Please provide a valid .tflite model path.\")\nexcept Exception as e:\n    print(f\"An error occurred during model inference: {e}\")\n","lang":"python","description":"This quickstart demonstrates how to load and run a LiteRT (.tflite) model using the Python runtime. It initializes the interpreter, prepares a dummy input tensor, performs inference, and retrieves the output. Replace `model.tflite` with the actual path to your LiteRT model."},"warnings":[{"fix":"Migrate C++ code to use `Create()` methods and the `CompiledModel API`. For Python, `ai_edge_litert.interpreter.Interpreter` still works for basic inference, but consider whether `CompiledModel` features are needed for advanced acceleration. Review the official LiteRT documentation for migration guides.","message":"LiteRT 2.x introduces the `CompiledModel API` as the recommended runtime interface for state-of-the-art hardware acceleration, diverging significantly from the older `Interpreter API` (inherited from TensorFlow Lite). C++ constructors are hidden, requiring `Create()` methods for object instantiation. Direct C header usage is removed. Access to `Tensor`, `Subgraph`, and `Signature` from `litert::Model` has been removed, replaced by `SimpleTensor` and `SimpleSignature` accessed via `CompiledModel`.","severity":"breaking","affected_versions":"2.x and later"},{"fix":"For new projects, or when seeking the best performance and latest features, prioritize the `CompiledModel API`. Existing projects using the `Interpreter API` should plan a migration to leverage future improvements.","message":"While the `Interpreter API` (the original TensorFlow Lite runtime) remains functional for backward compatibility, all future feature updates and performance enhancements will be exclusive to LiteRT's `CompiledModel API`. The `Interpreter API` will not receive these advancements.","severity":"deprecated","affected_versions":"2.x and later"},{"fix":"Ensure all related LiteRT packages and SDKs come from the same release channel and ideally the same build date, particularly for nightly versions. Consult the specific version requirements for NPU delegates.","message":"Version mismatches between LiteRT Python packages and other associated libraries (e.g., `litert_torch` or NPU SDKs) can lead to `ImportError` exceptions or runtime crashes, especially when using nightly builds or advanced features like Ahead-of-Time (AOT) compilation with NPU delegates.","severity":"gotcha","affected_versions":"All versions, particularly with nightly builds or complex toolchains"},{"fix":"If you need model conversion capabilities or models that rely on 'Select TF ops', you must install the full `tensorflow` PyPI package instead of, or in addition to, `ai-edge-litert`.","message":"The `ai-edge-litert` Python package is optimized for model inference and does not include all TensorFlow or LiteRT functionality. Features like the LiteRT Converter and support for 'Select TF ops' are not present in this smaller runtime package.","severity":"gotcha","affected_versions":"All versions of the `ai-edge-litert` PyPI package"},{"fix":"Carefully benchmark your application with varying thread counts to find the optimal balance for your specific device and use case. Design your data pipeline to minimize redundant copies, particularly when passing inputs to and reading outputs from the model.","message":"Multi-threaded execution of LiteRT operators can improve performance but may also increase resource consumption and performance variability in certain applications. Redundant data copies (e.g., when not using `ByteBuffers` with the Java API) can also significantly degrade performance.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}