ONNX Runtime Web

1.24.3 · active · verified Sun Apr 19

ONNX Runtime Web (ORT Web) is a JavaScript library for executing ONNX (Open Neural Network Exchange) machine learning models directly in web browsers and Node.js environments. The current stable version is 1.24.3; the project maintains a regular release cadence, with patch releases frequently published between major quarterly releases to deliver timely bug fixes, security enhancements, and performance updates. Its key differentiator is client-side inference: models run on the user's device, reducing round-trips to a server and enhancing user privacy. Execution is backed by WebAssembly (WASM) for efficient CPU inference, with GPU acceleration available through WebGL (now in maintenance mode) and the more modern WebGPU backend (experimental, introduced in v1.17). ORT Web supports a broad range of ONNX operators and includes optimizations for performance and memory usage, making it suitable for deploying models for image classification, object detection, and generative AI directly in web applications.

Common errors

Warnings

Install
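The package is published on npm; a typical install (yarn and pnpm work the same way):

```shell
# Install ONNX Runtime Web from npm into the current project
npm install onnxruntime-web
```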

Imports
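With a bundler, the library is imported as an ESM module; when loading a prebuilt bundle from a CDN `<script>` tag instead, the same API is exposed on a global `ort` object. A minimal sketch (the CDN path is a common convention, not the only option):

```javascript
// Bundler / ESM usage — exposes InferenceSession, Tensor, env, etc.
import * as ort from 'onnxruntime-web';

// Alternative: load the prebuilt bundle in plain HTML; the API then lives
// on the global `ort` object:
// <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
```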

Quickstart

This quickstart demonstrates how to initialize an ONNX Runtime Web session, configure execution providers, load a pre-trained ONNX model (MNIST example), create an input Tensor, run inference, and access the output results, including fallback strategies for execution providers.

import * as ort from 'onnxruntime-web';

const runModel = async () => {
  // Configure ONNX Runtime Web environment (optional, for performance).
  // WebAssembly SIMD is auto-detected in recent releases; the explicit
  // flag below is deprecated and can be omitted.
  // ort.env.wasm.simd = true;
  // Set the number of WebAssembly threads (0 = auto-detect, 1 disables multithreading).
  // Multithreading in browsers requires cross-origin isolation (COOP/COEP headers).
  // Note: hardwareConcurrency / 2 can be fractional, so round it to an integer.
  ort.env.wasm.numThreads = Math.min(Math.ceil(navigator.hardwareConcurrency / 2), 4) || 1; // Example setting

  // Choose execution providers in priority order. WebGPU is experimental but
  // typically offers the best performance; ORT falls back to WASM (CPU) when
  // WebGPU is unavailable.
  const executionProviders = ['webgpu', 'wasm'];

  // Load a sample ONNX model (here, an MNIST model from a public URL).
  // In a real app, place your model.onnx in the 'public' folder or fetch from a CDN.
  const modelPath = 'https://onnxruntime.ai/onnx-models/mnist-8.onnx'; // Example model

  let session;
  try {
    session = await ort.InferenceSession.create(modelPath, { executionProviders });
    console.log('Inference session created successfully.');
  } catch (e) {
    console.error(`Failed to create inference session: ${e.message}. Trying WASM only.`);
    // Fallback if initial providers fail
    session = await ort.InferenceSession.create(modelPath, { executionProviders: ['wasm'] });
    console.log('Inference session created with WASM fallback.');
  }

  // Create a dummy input tensor for an MNIST model (28x28 grayscale image).
  // Model expects input 'Input3' of shape [1, 1, 28, 28] and type 'float32'.
  const inputData = new Float32Array(1 * 1 * 28 * 28).fill(0.5); // Example: fill with 0.5
  const inputTensor = new ort.Tensor('float32', inputData, [1, 1, 28, 28]);

  // Prepare input feeds, matching model's expected input names.
  const feeds = { 'Input3': inputTensor };

  // Run inference.
  try {
    const results = await session.run(feeds);

    // Get output (model has one output 'Plus21_Output').
    const outputTensor = results['Plus21_Output'];
    console.log('Inference results:', outputTensor.data);
    console.log('Output tensor type:', outputTensor.type);
    console.log('Output tensor dimensions:', outputTensor.dims);
  } catch (e) {
    console.error(`Failed to run inference: ${e.message}`);
  }
};

runModel();
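The MNIST model above emits 10 raw scores (one per digit); turning them into a prediction is a simple argmax over the output tensor's data. A minimal sketch in plain JavaScript, with no library dependencies (the sample logits are made up for illustration):

```javascript
// Return the index of the largest value in a (typed) array of scores.
function argmax(data) {
  let best = 0;
  for (let i = 1; i < data.length; i++) {
    if (data[i] > data[best]) best = i;
  }
  return best;
}

// Example: fake logits for the 10 MNIST classes; class 7 has the highest score.
const logits = new Float32Array([0.1, -1.2, 0.3, 2.0, 0.5, -0.4, 1.1, 3.7, 0.2, 0.9]);
console.log(argmax(logits)); // → 7
```

In the quickstart, `argmax(outputTensor.data)` would give the predicted digit directly, since `outputTensor.data` is a `Float32Array` of length 10.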
