Tesseract.js Core: WebAssembly OCR Engine
tesseract.js-core is the foundational WebAssembly (WASM) module that powers the higher-level tesseract.js OCR library. It is the original Tesseract C++ engine compiled to JavaScript and WASM with Emscripten, which enables Optical Character Recognition directly in browser and Node.js environments. The current stable version is `7.0.0` as of December 2025. The package exposes the low-level API for interacting with the Tesseract engine and ships optimized builds: 'Relaxed SIMD' for performance on supported hardware, and 'LSTM-only' for reduced size when only the modern LSTM OCR engine is needed. New major versions typically incorporate updates from the upstream Tesseract C++ project and Emscripten. Key differentiators are its pure JavaScript/WASM implementation, which enables client-side OCR, and direct access to Tesseract's core functionality, which matters for custom integrations or highly performance-sensitive applications that bypass the abstractions of `tesseract.js`.
Common errors
- TypeError: TesseractCore is not a function
  cause Attempting to use the imported module directly without calling its factory function. Emscripten modules typically export a factory function that must be awaited to get the actual module instance.
  fix Ensure you call the imported factory function and `await` its result: `const TesseractCore = await TesseractCoreInit();`
- Error: Cannot find module 'tesseract.js-core' or 'tesseract.js-core/tesseract-core'
  cause Incorrect import path for the specific build or general package. `tesseract.js-core` provides multiple build variants (e.g., `tesseract-core`, `tesseract-core-simd`, `tesseract-core-lstm`), each with its own entry point.
  fix Verify the exact path for the desired build variant (e.g., `import TesseractCoreInit from 'tesseract.js-core/tesseract-core';`). Check the `node_modules/tesseract.js-core` directory for available `.js` files.
- WebAssembly.compile(): Wasm code is not valid: Invalid opcode
  cause The WebAssembly module contains instructions (like Relaxed SIMD) not supported by the current browser or Node.js runtime environment.
  fix If using a `-simd` or `-relaxed-simd` build, try switching to a less optimized build (e.g., `tesseract-core` or `tesseract-core-lstm`) that uses a more widely compatible WebAssembly feature set. Ensure your environment is up-to-date.
- Failed to load /tesseract-core.wasm: 404 Not Found
  cause In browser environments, the `.wasm` file (and potentially `.traineddata` files) is not served from the path the web server is expected to expose.
  fix Configure your web server to serve `.wasm` (and `.traineddata`) files from the root or the path where the `tesseract-core.js` wrapper expects them. Often, Emscripten modules expect the `.wasm` file to be co-located with the `.js` glue code or served from a configurable `locateFile` path.
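The `Invalid opcode` and 404 fixes can be combined in one small loader: try the build variants in order of preference and tell Emscripten where the `.wasm` file lives. The sketch below is illustrative only. `makeLocateFile` and `loadFirstWorking` are hypothetical helper names, `/assets/wasm` is a placeholder URL, and it is an assumption that this package's factory accepts the usual Emscripten options object with a `locateFile` callback.

```javascript
// Build a locateFile callback that resolves .wasm files against a base URL.
// Emscripten calls locateFile(fileName, defaultPrefix) for each asset it loads.
function makeLocateFile(wasmBaseUrl) {
  const base = wasmBaseUrl.endsWith('/') ? wasmBaseUrl : wasmBaseUrl + '/';
  return (fileName, defaultPrefix) =>
    fileName.endsWith('.wasm') ? base + fileName : defaultPrefix + fileName;
}

// Try each candidate in order and return the first module that instantiates.
// Each candidate is { name, load }, where load() resolves to a factory function.
async function loadFirstWorking(candidates, moduleOptions = {}) {
  const failures = [];
  for (const { name, load } of candidates) {
    try {
      const factory = await load();
      const instance = await factory(moduleOptions);
      return { name, instance };
    } catch (err) {
      // Remember why this variant failed, then fall through to the next one.
      failures.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`No build could be loaded (${failures.join('; ')})`);
}

// Possible usage (paths taken from the import examples in this document):
// const { name, instance } = await loadFirstWorking([
//   { name: 'simd', load: async () => (await import('tesseract.js-core/tesseract-core-simd')).default },
//   { name: 'baseline', load: async () => (await import('tesseract.js-core/tesseract-core')).default },
// ], { locateFile: makeLocateFile('/assets/wasm') });
```

Ordering the candidates from most to least optimized keeps the fast path for capable runtimes while older ones still get a working engine.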
Warnings
- breaking Major versions of `tesseract.js-core` must match the major versions of `tesseract.js` they are used with. For example, `tesseract.js-core v7` should only be used with `tesseract.js v7`. Mismatched versions will likely lead to runtime errors or undefined behavior.
- breaking The WASM module name changed from `TesseractCoreWASM` to `TesseractCore` in `v4.0.4`. Direct integrations relying on the specific global name or export might be affected.
- gotcha The `v7.0.0` release introduced a new 'Relaxed SIMD' build, which utilizes WebAssembly Relaxed SIMD dot-product instructions for significant performance improvements (~1.6x faster on supported hardware). However, this feature is not universally supported across all browsers and runtimes.
- gotcha Starting with `v5.0.0`, 'LSTM-only' builds were introduced. These builds are approximately 0.75MB smaller but exclusively support the Tesseract LSTM model and do not include support for the older Tesseract Legacy engine.
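The version-matching and build-selection warnings above can be checked in code before any WASM is loaded. This is a minimal sketch, not part of the package: `majorOf`, `assertMatchingMajor` and `pickBuild` are hypothetical helpers, the capability flags are assumed to come from your own feature detection, and the returned tier labels are descriptive only and do not name real entry points.

```javascript
// Extract the major version from a string such as "7.0.0" or "v7.1.2".
function majorOf(version) {
  const match = /^v?(\d+)\./.exec(String(version).trim());
  if (!match) throw new Error(`Unrecognized version: ${version}`);
  return Number(match[1]);
}

// Throw early if tesseract.js-core and tesseract.js disagree on the major version.
function assertMatchingMajor(coreVersion, wrapperVersion) {
  const core = majorOf(coreVersion);
  const wrapper = majorOf(wrapperVersion);
  if (core !== wrapper) {
    throw new Error(
      `tesseract.js-core v${core} should not be used with tesseract.js v${wrapper}`
    );
  }
}

// Pick a build tier from capability flags the caller has already detected.
// lstmOnly is true unless the Legacy engine is explicitly required.
function pickBuild({ relaxedSimd = false, simd = false, needLegacyEngine = false } = {}) {
  const tier = relaxedSimd ? 'relaxed-simd' : simd ? 'simd' : 'baseline';
  return { tier, lstmOnly: !needLegacyEngine };
}
```

For example, `pickBuild({ simd: true })` selects the SIMD tier with an LSTM-only build, while passing `needLegacyEngine: true` rules out the smaller LSTM-only variants.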
Install
- npm install tesseract.js-core
- yarn add tesseract.js-core
- pnpm add tesseract.js-core
Imports
- TesseractCore
  import { TesseractCore } from 'tesseract.js-core'; // Incorrect: named import for factory function
  import TesseractCoreInit from 'tesseract.js-core/tesseract-core'; // Correct
- TesseractCoreSIMD
  const TesseractCoreSIMD = require('tesseract.js-core/tesseract-core-simd'); // Incorrect: this is the factory function; it must be called and awaited
  import TesseractCoreSIMDInit from 'tesseract.js-core/tesseract-core-simd'; // Correct
- TesseractCoreLSTM
  import TesseractCoreLSTM from 'tesseract.js-core/tesseract-core-lstm.wasm'; // Incorrect: import the JS wrapper, not the raw WASM
  import TesseractCoreLSTMInit from 'tesseract.js-core/tesseract-core-lstm'; // Correct
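Because the same entry point may arrive as a bare function (via `require`) or as a namespace object with a `default` export (via dynamic `import()`), a small normalizing step can avoid the "is not a function" mistake. The helpers below are a hedged sketch: `resolveFactory` and `instantiate` are hypothetical names, and the shapes they accept are assumptions about how a bundler or runtime hands the module over.

```javascript
// Return the factory function from whatever the loader handed back:
// the function itself, or a namespace object whose default export is the function.
function resolveFactory(loaded) {
  if (typeof loaded === 'function') return loaded;
  if (loaded && typeof loaded.default === 'function') return loaded.default;
  throw new TypeError('Module does not expose a factory function');
}

// Resolve the factory, call it, and return the promise for the module instance.
async function instantiate(loaded, moduleOptions = {}) {
  const factory = resolveFactory(loaded);
  return factory(moduleOptions);
}
```

A call such as `await instantiate(await import('tesseract.js-core/tesseract-core'))` would then work regardless of which of the two shapes the loader produced.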
Quickstart
import { readFileSync } from 'fs';
import TesseractCoreInit from 'tesseract.js-core/tesseract-core';

// pixelPath points to raw, already decoded RGBA pixels (4 bytes per pixel),
// not an encoded PNG or JPG: a Node.js Buffer carries no width or height.
const runOcr = async (pixelPath, width, height) => {
  console.log('Loading TesseractCore module...');
  // The factory function returns a Promise that resolves to the Emscripten module
  const TesseractCore = await TesseractCoreInit();
  console.log('TesseractCore module loaded. Initializing...');

  // This is a minimal example showing low-level API interaction.
  // In a real application, consider using tesseract.js for a higher-level API.
  // Upstream Tesseract names this class TessBaseAPI; check which name your build exports.
  const core = new TesseractCore.TesseractApi();
  try {
    // Initialize with language data. Emscripten builds usually run on a virtual
    // file system, so this directory may first have to be made visible to the module.
    core.Init(process.env.LANG_PATH || '/usr/share/tessdata', 'eng');

    const pixels = readFileSync(pixelPath);
    // In a browser, the same bytes are available from a canvas as ImageData.data
    core.SetImage(pixels, width, height, 4, width * 4);
    core.Recognize();
    const text = core.GetUTF8Text();
    console.log('Recognized Text:', text);
  } finally {
    // Release the engine even if initialization or recognition throws
    core.End();
    TesseractCore.destroy(core); // Clean up Emscripten instance
  }
};

// Example usage: assumes 'image.rgba' holds 640x480 raw RGBA pixels.
// Replace the path and dimensions with those of your own image.
// process.env.LANG_PATH should point to the directory containing .traineddata files.
runOcr('image.rgba', 640, 480).catch(console.error);
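The `SetImage` call in the Quickstart passes a width, a height, the bytes per pixel and the bytes per line, so the buffer must hold exactly height times bytes-per-line bytes of decoded pixels. A small check can catch the common mistake of passing encoded file bytes instead. This is a sketch under the assumption of a tightly packed image with no row padding; `rawImageLayout` is a hypothetical helper, not part of the package.

```javascript
// Compute the layout arguments for a tightly packed raw image and verify
// that the buffer really holds width * height pixels.
function rawImageLayout(pixels, width, height, bytesPerPixel = 4) {
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    throw new RangeError('width and height must be positive integers');
  }
  const bytesPerLine = width * bytesPerPixel; // no padding between rows
  const expectedLength = bytesPerLine * height;
  if (pixels.length !== expectedLength) {
    throw new RangeError(`Expected ${expectedLength} bytes, got ${pixels.length}`);
  }
  return { width, height, bytesPerPixel, bytesPerLine };
}
```

An encoded PNG or JPG will almost never match the expected length, so the check fails fast with a clear message instead of letting the engine read garbage.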