{"library":"pytesseract","title":"Pytesseract","description":"Pytesseract is a Python wrapper for Google's Tesseract-OCR Engine, providing an optical character recognition (OCR) tool for Python. It enables users to recognize and extract text embedded in images, supporting various image types through the Pillow library. The project is actively maintained with regular updates to support newer Python versions and improve functionality. Its current version is 0.3.13.","status":"active","version":"0.3.13","language":"en","source_language":"en","source_url":"https://github.com/madmaze/pytesseract","tags":["OCR","image processing","Tesseract","text extraction"],"install":[{"cmd":"pip install pytesseract Pillow","lang":"bash","label":"Install Python package"},{"cmd":"sudo apt install tesseract-ocr\n# For specific languages: sudo apt install tesseract-ocr-eng","lang":"bash","label":"Install Tesseract-OCR (Linux - Debian/Ubuntu)"},{"cmd":"brew install tesseract","lang":"bash","label":"Install Tesseract-OCR (macOS)"},{"cmd":"Follow installer at https://github.com/UB-Mannheim/tesseract/wiki","lang":"text","label":"Install Tesseract-OCR (Windows)"}],"dependencies":[{"reason":"Required for image manipulation and handling within Python. pytesseract depends on Pillow for opening and processing image files before passing them to the Tesseract engine.","package":"Pillow"},{"reason":"This is the core OCR engine that pytesseract wraps. 
It must be installed separately on the system and its executable must be accessible via PATH or explicitly specified in Python.","package":"Tesseract-OCR Engine","optional":false}],"imports":[{"symbol":"pytesseract","correct":"import pytesseract"},{"symbol":"Image","correct":"from PIL import Image"}],"quickstart":{"code":"from PIL import Image, ImageDraw, ImageFont\nimport pytesseract\n\n# NOTE: Ensure Tesseract-OCR is installed on your system and its executable is in your PATH.\n# If not, you might need to specify the path to tesseract.exe:\n# pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'\n\n# Create a dummy image for demonstration\nimg_width, img_height = 400, 100\nimg = Image.new('RGB', (img_width, img_height), color='white')\ndraw = ImageDraw.Draw(img)\n\ntry:\n    # Try to use a common system font\n    font = ImageFont.truetype(\"arial.ttf\", 24)\nexcept OSError:\n    # Fallback if Arial is not found (e.g., on some Linux systems without it)\n    font = ImageFont.load_default()\n\ntext = \"Hello, Pytesseract OCR!\"\ndraw.text((50, 30), text, fill='black', font=font)\n\n# Perform OCR on the image\nextracted_text = pytesseract.image_to_string(img)\n\nprint(f\"Extracted Text: {extracted_text.strip()}\")\n\n# Example of getting Tesseract version\ntesseract_version = pytesseract.get_tesseract_version()\nprint(f\"Tesseract Version: {tesseract_version}\")","lang":"python","description":"This quickstart demonstrates how to use `pytesseract` to extract text from an image. It first creates a simple in-memory image with text using Pillow, then uses `pytesseract.image_to_string()` to perform OCR. It also shows how to check the installed Tesseract version. Remember that the Tesseract-OCR engine itself must be installed on your system for `pytesseract` to function."},"warnings":[{"fix":"Upgrade the Python environment to 3.8 or newer.
The current `requires_python` is `>=3.8`.","message":"Python 2 and Python 3.5 support was dropped in `v0.3.7`. Python 3.6 support was dropped in `v0.3.9` as it reached End of Life. Users on older Python versions must upgrade to Python 3.7 or newer (preferably 3.8 or newer).","severity":"breaking","affected_versions":"<=0.3.6 (for Python 2/3.5), <=0.3.8 (for Python 3.6)"},{"fix":"Install the Tesseract-OCR engine for your operating system and ensure its executable is in your system's PATH. On Windows, you might need to manually add `tesseract.exe`'s directory to PATH or explicitly set `pytesseract.pytesseract.tesseract_cmd` in your Python script.","message":"Pytesseract is a wrapper; the Tesseract-OCR engine must be installed separately on your operating system (e.g., via apt, brew, or Windows installer). Failing to install the Tesseract engine is the most common reason for errors like `TesseractNotFoundError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Apply image preprocessing techniques (e.g., binarization, denoising, deskewing, resizing) using libraries like Pillow or OpenCV. Experiment with Tesseract configuration options (e.g., `--psm` for page segmentation mode, `--oem` for OCR engine mode) and specify language (`lang`).","message":"OCR accuracy is highly dependent on image quality, resolution, contrast, and text style. Pytesseract may struggle with low-quality or noisy images, complex layouts, or handwritten text, often returning gibberish or incorrect results.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Before calling any `pytesseract` function, add a line like `pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'` to your script.
Use raw strings (`r'...'`) for Windows paths to avoid issues with backslashes.","message":"Explicitly setting the `tesseract_cmd` path can be necessary, especially on Windows or if the Tesseract executable is not automatically found in your system's PATH environment variable. Forgetting this can lead to `TesseractNotFoundError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"No direct fix needed unless explicit caching behavior is desired. Review code that calls `get_tesseract_version` if performance regressions are observed.","message":"The caching of `get_tesseract_version` was made optional and disabled by default in `v0.3.11`. If you relied on this caching behavior, you might notice a performance difference or need to re-enable it manually.","severity":"deprecated","affected_versions":">=0.3.11"}],"env_vars":null,"last_verified":"2026-04-05T00:00:00.000Z","next_check":"2026-07-04T00:00:00.000Z"}