ONNX Runtime Web
ONNX Runtime Web (ORT Web) is a JavaScript library for executing ONNX (Open Neural Network Exchange) machine learning models directly in web browsers and Node.js environments. The current stable version is 1.24.3; the project maintains a regular release cadence, with patch releases frequently published between major quarterly releases for bug fixes, security patches, and performance updates. Its key differentiator is client-side execution, which reduces client-server communication and enhances user privacy. It uses WebAssembly (WASM) for efficient CPU execution and provides GPU acceleration through WebGL (in maintenance mode) and the more modern WebGPU (experimental, launched in v1.17). ORT Web supports a broad range of ONNX operators and includes performance and memory optimizations, making it suitable for deploying AI models for image classification, object detection, and generative AI directly in web applications.
Common errors
- `Uncaught (in promise) Error: no available backend found. ERR: [wasm] Error: Aborted(CompileError: WebAssembly.instantiate(): expected magic word 00 61 73 6d, found 3c 21 44 4f @+0)`
  Cause: The WebAssembly (`.wasm`) artifact is not being served correctly by the web server, or the URL configured for its path is incorrect; the browser received HTML or another file type instead of the WASM binary.
  Fix: Verify that your web server serves `.wasm` files with the `application/wasm` MIME type and that `ort.env.wasm.wasmPaths` points to the directory containing the WASM files. Ensure the WASM files are in your `public` folder or another location accessible to the browser.
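  A minimal setup sketch for this fix; the `/ort-wasm/` path is an assumption (copy the artifacts from `node_modules/onnxruntime-web/dist/` to that folder in your public assets):

  ```javascript
  // Sketch: point ORT Web at the directory holding its .wasm artifacts.
  // '/ort-wasm/' is an illustrative path -- copy the files from
  // node_modules/onnxruntime-web/dist/ into that folder before deploying.
  import * as ort from 'onnxruntime-web';

  ort.env.wasm.wasmPaths = '/ort-wasm/'; // note the trailing slash
  ```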
- `Uncaught ReferenceError: InferenceSession is not defined`
  Cause: Using `InferenceSession` (or `Tensor`, `env`, etc.) without a proper ES module import, or relying on a CommonJS `require` call that the build system does not handle.
  Fix: Use ES module syntax: `import * as ort from 'onnxruntime-web';` and then `ort.InferenceSession.create(...)`, or `import { InferenceSession } from 'onnxruntime-web';`. Ensure your build tool (Webpack, Rollup, Vite) is configured for ES modules.
- `[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Input 'input_name' is not found in the model`
  Cause: The key used for the input tensor in the `feeds` object does not match any input name defined in the ONNX model.
  Fix: Inspect the ONNX model to determine the exact input tensor names; tools like Netron can visualize the model graph and its input/output names, and `session.inputNames` lists them at runtime. Ensure the keys in your `feeds` object (`await session.run(feeds)`) precisely match these names.
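  As a sketch, the feed keys can be checked against the model's reported input names before running (the `'Input3'` literal is illustrative; in a real app the expected names come from `session.inputNames`):

  ```javascript
  // Sketch: report which model inputs are missing from a feeds object
  // before calling session.run(). The literals below are illustrative;
  // a real app would pass session.inputNames as the first argument.
  function missingInputs(inputNames, feeds) {
    return inputNames.filter((name) => !(name in feeds));
  }

  const inputNames = ['Input3'];          // e.g. session.inputNames
  const feeds = { input: 'placeholder' }; // wrong key: model expects 'Input3'
  console.log(missingInputs(inputNames, feeds)); // logs [ 'Input3' ]
  ```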
- `[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: input_name for the following indices index: X Got: Y Expected: Z`
  Cause: The shape (dimensions) or data type of an input `Tensor` passed to `session.run()` does not match what the ONNX model expects.
  Fix: Review the model's input signature (with Netron or a similar tool) for the expected dimensions and data type, and ensure the `Tensor` you create (e.g., `new ort.Tensor(type, data, dims)`) matches it exactly. Pay close attention to the batch size (often the first dimension).
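  One way to catch this early is to verify that the data length matches the product of the dimensions before constructing the tensor; a small sketch (the MNIST-style dims are illustrative):

  ```javascript
  // Sketch: check that the data length matches the product of the dims,
  // so shape mismatches fail with a clear message before session.run().
  function elementCount(dims) {
    return dims.reduce((acc, d) => acc * d, 1);
  }

  const dims = [1, 1, 28, 28];
  const data = new Float32Array(elementCount(dims)); // 784 elements
  if (data.length !== elementCount(dims)) {
    throw new Error(`expected ${elementCount(dims)} elements, got ${data.length}`);
  }
  // new ort.Tensor('float32', data, dims); // dims and data now agree
  ```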
- `Failed to create inference session: Error: previous call to 'initWasm()' failed.`
  Cause: A new session is being initialized after a previous initialization attempt failed or was not properly cleaned up, or multiple WASM execution environments are being initialized incorrectly in the same context.
  Fix: Call `ort.InferenceSession.create()` once per model, or share a single session instance; if you use multiple models, manage their sessions carefully. This error can also stem from underlying WASM loading issues (e.g., incorrect `wasmPaths` or MIME types) that caused the initial `initWasm()` call to fail.
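  A simple way to guarantee one initialization is to memoize the creation promise. A sketch with a stand-in factory (in a real app the factory would be `(url) => ort.InferenceSession.create(url)`):

  ```javascript
  // Sketch: cache the first creation promise so the factory runs at most
  // once, even if several callers race. The factory here is a stand-in;
  // substitute ort.InferenceSession.create in a real app.
  function once(factory) {
    let promise = null;
    return (...args) => (promise ??= factory(...args));
  }

  const getSession = once((url) => Promise.resolve({ modelUrl: url }));
  // Every call to getSession(...) now returns the same cached promise.
  ```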
Warnings
- breaking Multiple security vulnerabilities (heap out-of-bounds read/write, integer truncation) were fixed in patch releases like v1.24.3 and v1.24.2. Older versions are vulnerable to these issues, which could lead to arbitrary code execution or data leakage if processing maliciously crafted ONNX models.
- breaking As of v1.24.1, ONNX Runtime (the core project, affecting underlying builds) no longer publishes x86_64 binaries for macOS/iOS, and the minimum macOS version supported is raised to 14.0. This may impact developers targeting older macOS/iOS versions or building custom WASM artifacts for these platforms.
- gotcha Multi-threading for WebAssembly (WASM) in browsers requires `SharedArrayBuffer` and `Atomics`, which in turn necessitates setting specific Cross-Origin Isolation headers (`Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`) on your web server. Without these headers, ONNX Runtime Web will silently fall back to single-threaded WASM, potentially impacting performance.
- gotcha The `onnxruntime-web` package's WASM files (`.wasm`, `.mjs`) must be served with correct MIME types by your web server (e.g., `application/wasm` for `.wasm`, `application/javascript` for `.mjs`). Incorrect MIME types can lead to `expected magic word 00 61 73 6d, found 3c 21 44 4f @+0` errors, indicating the browser is not correctly interpreting the WebAssembly binary.
- deprecated The WebGL execution provider is in maintenance mode; WebGPU is recommended instead for better performance and broader operator coverage in modern browsers. WebGPU launched in v1.17 and offers significant advantages for complex ML workloads.
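For illustration, the cross-origin isolation headers mentioned in the multi-threading gotcha above can be set in a minimal Node dev server like this (a sketch only; static-file serving is elided and the port is arbitrary):

```javascript
// Sketch: a minimal dev server that enables cross-origin isolation so
// SharedArrayBuffer (and thus multi-threaded WASM) is available.
const http = require('http');

const server = http.createServer((req, res) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
  // ...serve your static assets (HTML, JS, .wasm, model files) here...
  res.end('ok');
});

server.listen(8080);
```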
Install
- `npm install onnxruntime-web`
- `yarn add onnxruntime-web`
- `pnpm add onnxruntime-web`
Imports
- ort
  `const ort = require('onnxruntime-web');` or `import * as ort from 'onnxruntime-web';`
- InferenceSession
  `import { InferenceSession } from 'onnxruntime-web/webgpu';` or `import { InferenceSession, Tensor } from 'onnxruntime-web';`
- Tensor
  `import { Tensor } from 'onnxruntime-web/webgpu';` or `import { Tensor } from 'onnxruntime-web';`
- ort.env
  `import { env } from 'onnxruntime-web'; env.wasm.numThreads = 4;` or `import * as ort from 'onnxruntime-web'; ort.env.wasm.numThreads = 4;`
Quickstart
```javascript
import * as ort from 'onnxruntime-web';

const runModel = async () => {
  // Configure the ONNX Runtime Web environment (optional, for performance).
  // Enable WebAssembly SIMD if supported.
  ort.env.wasm.simd = true;

  // Set the number of WebAssembly threads (0 = auto-detect, 1 = single-threaded).
  // Multithreading in browsers requires cross-origin isolation headers.
  ort.env.wasm.numThreads = Math.min(Math.floor(navigator.hardwareConcurrency / 2), 4) || 1;

  // Choose execution providers. WebGPU is experimental but offers the best
  // performance; fall back to WASM (CPU) if WebGPU is not available.
  const executionProviders = ['webgpu', 'wasm'];

  // Load a sample ONNX model (here, an MNIST model from a public URL).
  // In a real app, place your model.onnx in the 'public' folder or fetch it from a CDN.
  const modelPath = 'https://onnxruntime.ai/onnx-models/mnist-8.onnx'; // Example model

  let session;
  try {
    session = await ort.InferenceSession.create(modelPath, { executionProviders });
    console.log('Inference session created successfully.');
  } catch (e) {
    console.error(`Failed to create inference session: ${e.message}. Trying WASM only.`);
    // Fall back to WASM if the initial providers fail.
    session = await ort.InferenceSession.create(modelPath, { executionProviders: ['wasm'] });
    console.log('Inference session created with WASM fallback.');
  }

  // Create a dummy input tensor for an MNIST model (28x28 grayscale image).
  // The model expects input 'Input3' of shape [1, 1, 28, 28] and type 'float32'.
  const inputData = new Float32Array(1 * 1 * 28 * 28).fill(0.5); // Example: fill with 0.5
  const inputTensor = new ort.Tensor('float32', inputData, [1, 1, 28, 28]);

  // Prepare the input feeds, matching the model's expected input names.
  const feeds = { Input3: inputTensor };

  // Run inference.
  try {
    const results = await session.run(feeds);
    // Read the output (this model has a single output, 'Plus21_Output').
    const outputTensor = results['Plus21_Output'];
    console.log('Inference results:', outputTensor.data);
    console.log('Output tensor type:', outputTensor.type);
    console.log('Output tensor dimensions:', outputTensor.dims);
  } catch (e) {
    console.error(`Failed to run inference: ${e.message}`);
  }
};

runModel();
```