{"id":12755,"library":"tesseract.js-core","title":"Tesseract.js Core: WebAssembly OCR Engine","description":"tesseract.js-core is the foundational WebAssembly (WASM) module that powers the higher-level tesseract.js OCR library. It compiles the original Tesseract C++ engine to JavaScript and WASM using Emscripten, enabling Optical Character Recognition directly in browser and Node.js environments. The current stable version is `7.0.0` as of December 2025. This package provides the low-level API for interacting with the Tesseract engine, offering optimized builds like 'Relaxed SIMD' for performance on supported hardware and 'LSTM-only' builds for reduced size when only the modern LSTM OCR engine is needed. It typically releases new major versions to incorporate updates from the upstream Tesseract C++ project and Emscripten. Key differentiators include its pure JavaScript/WASM implementation, enabling client-side OCR, and direct access to Tesseract's core functionality, which is crucial for custom integrations or highly performance-sensitive applications that bypass the abstractions of `tesseract.js`.","status":"active","version":"6.1.2","language":"javascript","source_language":"en","source_url":"https://github.com/naptha/tesseract.js-core","tags":["javascript","ocr","tesseract","emscripten","port","c++","api","recognize"],"install":[{"cmd":"npm install tesseract.js-core","lang":"bash","label":"npm"},{"cmd":"yarn add tesseract.js-core","lang":"bash","label":"yarn"},{"cmd":"pnpm add tesseract.js-core","lang":"bash","label":"pnpm"}],"dependencies":[],"imports":[{"note":"The package exports a default factory function (e.g., `TesseractCoreInit`) from specific build paths, not the package root. 
You need to call this function to get the WASM module instance.","wrong":"import { TesseractCore } from 'tesseract.js-core'; // Incorrect named import for factory function","symbol":"TesseractCore","correct":"import TesseractCoreInit from 'tesseract.js-core/tesseract-core';"},{"note":"For builds optimized with WebAssembly SIMD instructions, import from the `-simd` path. Remember to await the factory function call to get the module.","wrong":"const TesseractCoreSIMD = require('tesseract.js-core/tesseract-core-simd'); // Await is needed for the factory function","symbol":"TesseractCoreSIMD","correct":"import TesseractCoreSIMDInit from 'tesseract.js-core/tesseract-core-simd';"},{"note":"For smaller builds supporting only the LSTM recognition engine, use the `-lstm` path. Always import the `.js` wrapper file, which handles loading the corresponding `.wasm`.","wrong":"import TesseractCoreLSTM from 'tesseract.js-core/tesseract-core-lstm.wasm'; // Import the JS wrapper, not the raw WASM","symbol":"TesseractCoreLSTM","correct":"import TesseractCoreLSTMInit from 'tesseract.js-core/tesseract-core-lstm';"}],"quickstart":{"code":"import { readFileSync } from 'fs';\nimport TesseractCoreInit from 'tesseract.js-core/tesseract-core';\n\nconst runOcr = async (imagePath) => {\n  console.log('Loading TesseractCore module...');\n  // The factory function returns a Promise that resolves to the Emscripten module\n  const TesseractCore = await TesseractCoreInit();\n  console.log('TesseractCore module loaded. 
Initializing...');\n\n  // This is a minimal example showing low-level API interaction.\n  // In a real application, consider using tesseract.js for a higher-level API.\n  const core = new TesseractCore.TesseractApi();\n  core.Init(process.env.LANG_PATH || '/usr/share/tessdata', 'eng'); // Initialize with language data\n\n  const imageBuffer = readFileSync(imagePath);\n  // A Node.js Buffer holds raw file bytes and has no width/height properties.\n  // The 4-bytes-per-pixel arguments below imply raw RGBA pixels, so this sketch assumes\n  // the file is already decoded and reads its dimensions from IMAGE_WIDTH/IMAGE_HEIGHT\n  // (hypothetical environment variables used only for illustration).\n  // In a browser, you might take the pixels and dimensions from a Canvas/ImageData object.\n  const width = Number(process.env.IMAGE_WIDTH);\n  const height = Number(process.env.IMAGE_HEIGHT);\n  core.SetImage(imageBuffer, width, height, 4, width * 4);\n\n  core.Recognize();\n  const text = core.GetUTF8Text();\n  console.log('Recognized Text:', text);\n\n  core.End();\n  TesseractCore.destroy(core); // Clean up Emscripten instance\n};\n\n// Example usage: assumes 'image.rgba' (raw RGBA pixels) exists in the current working directory.\n// For a real scenario, replace 'image.rgba' with your actual image path.\n// process.env.LANG_PATH should point to the directory containing .traineddata files.\nrunOcr('image.rgba').catch(console.error);","lang":"typescript","description":"Demonstrates how to load the `tesseract.js-core` WebAssembly module, initialize the Tesseract API, pass raw pixel data to the engine, perform OCR, and retrieve the recognized text using low-level methods. This example targets Node.js."},"warnings":[{"fix":"Always install `tesseract.js-core` and `tesseract.js` with matching major version numbers. Refer to the `tesseract.js` documentation for compatible core versions.","message":"Major versions of `tesseract.js-core` must match the major versions of `tesseract.js` they are used with. For example, `tesseract.js-core v7` should only be used with `tesseract.js v7`.
Mismatched versions will likely lead to runtime errors or undefined behavior.","severity":"breaking","affected_versions":">=4.0.0"},{"fix":"Update references to the core module to use `TesseractCore` instead of `TesseractCoreWASM`.","message":"The WASM module name changed from `TesseractCoreWASM` to `TesseractCore` in `v4.0.4`. Direct integrations relying on the specific global name or export might be affected.","severity":"breaking","affected_versions":">=4.0.4"},{"fix":"If using the 'Relaxed SIMD' build and encountering `WebAssembly.compile()` errors, ensure the target environment supports `wf-wasm-simd-relaxed`. Fall back to the standard SIMD build (`tesseract-core-simd`) or the non-SIMD build (`tesseract-core`) if compatibility issues arise.","message":"The `v7.0.0` release introduced a new 'Relaxed SIMD' build, which utilizes WebAssembly Relaxed SIMD dot-product instructions for significant performance improvements (~1.6x faster on supported hardware). However, this feature is not universally supported across all browsers and runtimes.","severity":"gotcha","affected_versions":">=7.0.0"},{"fix":"If you require the Tesseract Legacy engine, avoid using the `*-lstm` specific builds. Most modern use cases are fine with LSTM-only builds, as LSTM is the default and generally more accurate engine.","message":"Starting with `v5.0.0`, 'LSTM-only' builds were introduced. These builds are approximately 0.75MB smaller but exclusively support the Tesseract LSTM model and do not include support for the older Tesseract Legacy engine.","severity":"gotcha","affected_versions":">=5.0.0"}],"env_vars":null,"last_verified":"2026-04-19T00:00:00.000Z","next_check":"2026-07-18T00:00:00.000Z","problems":[{"fix":"Ensure you call the imported factory function and `await` its result: `const TesseractCore = await TesseractCoreInit();`","cause":"Attempting to use the imported module directly without calling its factory function. 
Emscripten modules typically export a factory function that must be awaited to get the actual module instance.","error":"TypeError: TesseractCore is not a function"},{"fix":"Verify the exact path for the desired build variant (e.g., `import TesseractCoreInit from 'tesseract.js-core/tesseract-core';`). Check the `node_modules/tesseract.js-core` directory for available `.js` files.","cause":"Incorrect import path for the specific build or general package. `tesseract.js-core` provides multiple build variants (e.g., `tesseract-core`, `tesseract-core-simd`, `tesseract-core-lstm`) each with its own entry point.","error":"Error: Cannot find module 'tesseract.js-core' or 'tesseract.js-core/tesseract-core'"},{"fix":"If using a `-simd` or `-relaxed-simd` build, try switching to a less optimized build (e.g., `tesseract-core` or `tesseract-core-lstm`) that uses a more widely compatible WebAssembly feature set. Ensure your environment is up-to-date.","cause":"The WebAssembly module contains instructions (like Relaxed SIMD) not supported by the current browser or Node.js runtime environment.","error":"WebAssembly.compile(): Wasm code is not valid: Invalid opcode"},{"fix":"Configure your web server to serve `.wasm` (and `.traineddata`) files from the root or the path where the `tesseract-core.js` wrapper expects them. Often, Emscripten modules expect the `.wasm` file to be co-located with the `.js` glue code or served from a configurable `locateFile` path.","cause":"In browser environments, the `.wasm` file (and potentially `.traineddata` files) are not correctly served from the expected path by the web server.","error":"Failed to load /tesseract-core.wasm: 404 Not Found"}],"ecosystem":"npm"}