{"id":12133,"library":"tesseract.js","title":"Tesseract.js - Pure JavaScript OCR","description":"Tesseract.js is a JavaScript library that provides Optical Character Recognition (OCR) in both browser and Node.js environments. It wraps a WebAssembly port of the Tesseract OCR engine and can extract text from images in nearly any language. The current stable version is `7.0.0`, which improves recognition speed by 15-35% through optimized WebAssembly and hardware capabilities. The project generally follows a regular release cadence, with major versions often introducing performance enhancements and minor versions addressing bugs or adding small features. A key differentiator is its ability to run Tesseract purely in JavaScript, without requiring native system dependencies. However, the project states that it does not support PDF files directly and does not modify the core Tesseract recognition model to improve accuracy.","status":"active","version":"7.0.0","language":"javascript","source_language":"en","source_url":"https://github.com/naptha/tesseract.js","tags":["javascript","typescript"],"install":[{"cmd":"npm install tesseract.js","lang":"bash","label":"npm"},{"cmd":"yarn add tesseract.js","lang":"bash","label":"yarn"},{"cmd":"pnpm add tesseract.js","lang":"bash","label":"pnpm"}],"dependencies":[],"imports":[{"note":"Use named import for ES Modules in Node.js (v16+) and modern browsers. 
In a file that Node.js treats as an ES module, `require()` is not defined and throws a `ReferenceError`.","wrong":"const { createWorker } = require('tesseract.js');","symbol":"createWorker","correct":"import { createWorker } from 'tesseract.js';"},{"note":"When Tesseract.js is included via a `<script>` tag from a CDN, the `Tesseract` object becomes globally available.","symbol":"Tesseract (global)","correct":"Tesseract.createWorker('eng');"},{"note":"For TypeScript users, import types explicitly using `import type` for clarity and better tree-shaking.","symbol":"RecognizeResult (type)","correct":"import type { RecognizeResult } from 'tesseract.js';"}],"quickstart":{"code":"import { createWorker } from 'tesseract.js';\n\n(async () => {\n  const worker = await createWorker('eng', 1, {\n    logger: m => console.log(m) // Optional: Log progress to console\n  });\n\n  // Example image URL\n  const imageUrl = 'https://tesseract.projectnaptha.com/img/eng_bw.png';\n\n  console.log('Recognizing text from:', imageUrl);\n  const ret = await worker.recognize(imageUrl);\n\n  console.log('Detected text:\\n', ret.data.text);\n\n  // Since v6, formats other than `text` are off by default; request them via\n  // the third argument, e.g. worker.recognize(imageUrl, {}, { blocks: true }).\n\n  await worker.terminate();\n  console.log('Worker terminated.');\n})();","lang":"typescript","description":"This quickstart demonstrates how to create a Tesseract.js worker, load the English language model, recognize text from a remote image, log the output, and properly terminate the worker."},"warnings":[{"fix":"Upgrade your Node.js environment to v16 or newer to use Tesseract.js v7.","message":"Tesseract.js v7.0.0 dropped support for Node.js v14.","severity":"breaking","affected_versions":">=7.0.0"},{"fix":"Request additional output formats (e.g., `blocks`, `hocr`) through the third (`output`) argument of `worker.recognize`, for example `worker.recognize(image, {}, { blocks: true })`.","message":"Starting with v6.0.0, all Tesseract output formats other than 
`text` are disabled by default to reduce runtime and memory usage.","severity":"breaking","affected_versions":">=6.0.0"},{"fix":"For parallel processing of multiple images, use `createScheduler` to manage jobs across multiple workers, or ensure each `worker.recognize` call completes before initiating another on the same worker.","message":"Running multiple `worker.recognize` calls concurrently on the same worker is not recommended and can lead to unexpected behavior or resource exhaustion, even though a bug related to this was fixed in v5.0.5.","severity":"gotcha","affected_versions":">=5.0.5"},{"fix":"For PDF processing, pre-convert PDF pages into image formats (e.g., PNG, JPEG) before passing them to Tesseract.js. For advanced model tuning or features beyond core Tesseract, consider other OCR solutions or pre-process images for better Tesseract results.","message":"Tesseract.js does not provide direct support for PDF files; it operates on images. Additionally, the project focuses on bringing the Tesseract engine to JavaScript and does not modify the core Tesseract recognition model to improve accuracy.","severity":"gotcha","affected_versions":"*"},{"fix":"Upgrade to Tesseract.js v7 to benefit from the performance enhancements and optimize OCR processing times.","message":"Tesseract.js v7 introduces a new `relaxedsimd` build that significantly improves recognition speed (15-35%) by leveraging the latest WebAssembly and hardware capabilities, especially on newer Intel processors.","severity":"gotcha","affected_versions":"<7.0.0"}],"env_vars":null,"last_verified":"2026-04-19T00:00:00.000Z","next_check":"2026-07-18T00:00:00.000Z","problems":[{"fix":"Upgrade your Node.js environment to v16 or newer. 
If you must use Node.js v14, use an older Tesseract.js version (e.g., v5) or provide a global `fetch` polyfill.","cause":"Attempting to run Tesseract.js v7 on Node.js v14 or older environments that lack a native `fetch` implementation.","error":"TypeError: fetch is not a function"},{"fix":"Verify the language code is correct (e.g., 'eng' for English). Ensure there is network access to the Tesseract.js CDN. If loading local data, confirm the `langPath` option is correctly set when calling `createWorker` and that the language files exist at that location.","cause":"The specified language data for the Tesseract worker could not be loaded, possibly due to an incorrect language code, network issues preventing download from the CDN, or an incorrect `langPath` configuration for local files.","error":"Error: 'eng' language data not found"},{"fix":"Use `await` for each `worker.recognize` call or, for parallel image processing, use `Tesseract.createScheduler()` to manage jobs across multiple workers.","cause":"This warning typically occurs when multiple `worker.recognize` calls are initiated on the same worker without awaiting previous calls, leading to too many event listeners being attached.","error":"(node:...) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 recognize listeners added to [Worker]. Use emitter.setMaxListeners() to increase limit"},{"fix":"Switch to ES module `import` syntax: `import { createWorker } from 'tesseract.js';`.","cause":"CommonJS `require()` is being called in a file that Node.js treats as an ES module (e.g., `\"type\": \"module\"` in `package.json` or a `.mjs` extension), where `require` is not defined.","error":"ReferenceError: require is not defined in ES module scope"}],"ecosystem":"npm"}