{"id":7782,"library":"tesserocr","title":"tesserocr","description":"tesserocr is a simple, Pillow-friendly Python wrapper around the Tesseract-OCR API, built using Cython. It is currently at version 2.10.0 and maintains an active release schedule with several minor and patch updates throughout the year, primarily focusing on Tesseract/Leptonica version upgrades and Python compatibility.","status":"active","version":"2.10.0","language":"en","source_language":"en","source_url":"https://github.com/sirfz/tesserocr","tags":["ocr","tesseract","image processing","computer vision","cython"],"install":[{"cmd":"pip install tesserocr","lang":"bash","label":"Install Python wrapper"}],"dependencies":[{"reason":"Required for image processing (e.g., loading and passing images to Tesseract).","package":"Pillow","optional":false}],"imports":[{"symbol":"PyTessBaseAPI","correct":"from tesserocr import PyTessBaseAPI"},{"symbol":"image_to_text","correct":"from tesserocr import image_to_text"},{"symbol":"get_tesseract_version","correct":"from tesserocr import get_tesseract_version"},{"note":"Used for specifying result iterator levels (e.g., character, word, line).","symbol":"RIL","correct":"from tesserocr import RIL"}],"quickstart":{"code":"import tesserocr\nfrom PIL import Image\nfrom io import BytesIO\nimport base64\n\n# A simple base64 encoded image containing \"Hello World\" for a runnable example\n# In a real scenario, you'd load an image from file: Image.open(\"path/to/image.png\")\nimg_data = 
\"iVBORw0KGgoAAAANSUhEUgAAAKMAAABRCAYAAADtL/VCAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAEnQAABJ0AdFcPaeAAAAABTBMVEX///8AAAD///+xO/FBAAACHklEQVR4Xu2ZzytEQRjHn+3+oBv9B5gC927O4QeYM3Y3B38B3L2ZzR+gE2fPnsR/mD0Y+G9zZmbO+J0P921m3zMzt9O9eA+i+3zP/M78e3/qXWwP+qL1l1gYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGBgYGv80v+gN3m82rF44BAAAAAElFTkSuQmCC\"\nimage_bytes = base64.b64decode(img_data)\nimage = Image.open(BytesIO(image_bytes))\n\n# Initialize Tesseract API\n# You might need to specify lang='eng' or tessdata_dir='path/to/tessdata'\n# if Tesseract is not configured globally or language files are not found.\napi = tesserocr.PyTessBaseAPI(lang='eng')\napi.SetImage(image)\ntext = api.GetUTF8Text()\nconfidence = api.MeanTextConf()\napi.End()\n\nprint(f\"Detected Text: {text.strip()}\")\nprint(f\"Confidence: {confidence}\")\n\n# Shorter convenience function for quick OCR\ntext_short = tesserocr.image_to_text(image, lang='eng')\nprint(f\"Detected Text (short function): {text_short.strip()}\")\n\n# Example of getting Tesseract version\nprint(f\"Tesseract Version: {tesserocr.get_tesseract_version()}\")","lang":"python","description":"This quickstart demonstrates basic text recognition from an image using `tesserocr.PyTessBaseAPI` for more control and `tesserocr.image_to_text` for simplicity. It includes a base64-encoded image to make the example self-contained and runnable."},"warnings":[{"fix":"Install Tesseract-OCR on your system. 
For Debian/Ubuntu: `sudo apt-get install tesseract-ocr tesseract-ocr-eng`. For macOS: `brew install tesseract`. For Windows, use official installers or Chocolatey.","message":"Tesseract-OCR is a required system-level dependency and must be installed separately on your operating system (e.g., via `apt-get`, `brew`, `choco`). `tesserocr` is a Python wrapper, not a complete Tesseract distribution.","severity":"breaking","affected_versions":"All versions"},{"fix":"Ensure `.traineddata` files (e.g., `eng.traineddata`) are in a directory Tesseract can find, or explicitly pass `path='/path/to/tessdata'` and `lang='eng'` to `PyTessBaseAPI`.","message":"Tesseract language data files (`.traineddata`) must be accessible to Tesseract. By default, Tesseract looks in its standard data path, but if you have custom paths or multiple installations, you might need to pass the tessdata directory as the `path` argument during `PyTessBaseAPI` initialization.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If building `tesserocr` from source, ensure your Cython version is `<3.0.0`. Rely on pre-built wheels when possible, as they handle this dependency automatically.","message":"Earlier versions (pre-2.6.1) had more relaxed Cython version requirements. Starting with v2.6.1, an upper bound `<3.0.0` was added to Cython to avoid breaking changes introduced in Cython 3.0.","severity":"deprecated","affected_versions":"<2.6.1 (for potential issues with Cython 3.x), >=2.6.1 (for enforcement)"},{"fix":"Prefer using `pip install tesserocr`, which will attempt to download a pre-built wheel. If you encounter build errors, check the GitHub releases for supported Python/OS combinations or consult the build instructions for your specific environment.","message":"While `tesserocr` generally supports multiple Python versions, building from source can be complex due to native Tesseract and Leptonica dependencies. 
Pre-built wheels are available for common Python versions and platforms, but may not cover all niche environments.","severity":"gotcha","affected_versions":"All versions, especially when using less common Python versions, architectures, or operating systems."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"`tesserocr` links against the Tesseract library and does not call the `tesseract` executable, so this error most likely comes from `pytesseract` elsewhere in your code. Either install the Tesseract command-line tool, for example `sudo apt-get install tesseract-ocr` (Linux) or `brew install tesseract` (macOS), or switch the failing call to the `tesserocr` API.","cause":"This exception is raised by `pytesseract`, not `tesserocr`: the `tesseract` executable is not installed, or its directory is not on the system `PATH`.","error":"TesseractNotFoundError: Failed to find Tesseract command"},{"fix":"Ensure that the Tesseract language data files are installed and correctly placed. On Debian/Ubuntu, this might be `sudo apt-get install tesseract-ocr-eng`. Alternatively, pass the tessdata directory as the `path` argument when initializing `PyTessBaseAPI` (e.g., `PyTessBaseAPI(path='/path/to/tessdata/', lang='eng')`).","cause":"Tesseract cannot find the required language data files (`.traineddata`). This often happens if the files are missing, corrupted, or located in a non-standard directory.","error":"Error opening data file /usr/local/share/tessdata/eng.traineddata"},{"fix":"Install the package using pip: `pip install tesserocr`.","cause":"The `tesserocr` Python package has not been installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'tesserocr'"}]}