Surya OCR: Document Layout and Text Recognition

0.17.1 · active · verified Tue Apr 14

Surya OCR is a Python library for optical character recognition (OCR), document layout analysis, reading order detection, and table recognition in more than 90 languages. It is built on deep learning models and is aimed at documents with complex structure. The current version is 0.17.1, and the project is under active development with frequent releases.

Quickstart

This quickstart loads the Surya models and runs OCR on a generated test image. Constructing the predictors (`FoundationPredictor`, `RecognitionPredictor`, `DetectionPredictor`) downloads the required model weights on first use. Passing the detection predictor to the recognition predictor runs text detection followed by recognition and returns, for each input image, the recognized lines with their bounding boxes. Layout analysis is handled by a separate predictor and is not run here. Pillow (`pillow`) is required for image handling.

from PIL import Image, ImageDraw, ImageFont
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor
from surya.recognition import RecognitionPredictor

# Create a dummy image for demonstration
def create_dummy_image():
    img = Image.new("RGB", (800, 600), color="white")
    draw = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype("arial.ttf", 40)
    except OSError:
        # Arial is not installed everywhere; fall back to Pillow's built-in font
        font = ImageFont.load_default()
    draw.text((50, 50), "Hello, Surya OCR!", fill=(0, 0, 0), font=font)
    draw.text((50, 150), "This is a test document.", fill=(0, 0, 0), font=font)
    return img

def main():
    print("Loading Surya OCR models...")
    # Model weights are downloaded on first run
    foundation_predictor = FoundationPredictor()
    recognition_predictor = RecognitionPredictor(foundation_predictor)
    detection_predictor = DetectionPredictor()
    print("Models loaded. Creating dummy image...")
    image = create_dummy_image()

    print("Running OCR...")
    # Run text detection followed by recognition
    # For real use, replace [image] with a list of PIL.Image objects
    predictions = recognition_predictor([image], det_predictor=detection_predictor)

    print("OCR Results:")
    # One result per input image, each holding the recognized text lines
    for page in predictions:
        for line in page.text_lines:
            print(f"  Line: '{line.text}', Bbox: {line.bbox}")

if __name__ == "__main__":
    main()
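
The quickstart prints each recognized line next to its bounding box. A common next step is to turn those lines into plain text in reading order. The following is a minimal sketch of one way to do that, not part of Surya itself: `Line` is a hypothetical stand-in for a recognized line, `lines_to_text` and `row_tolerance` are illustrative names, and the `[x1, y1, x2, y2]` bounding-box layout is an assumption you should check against the objects your installed version returns.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Line:
    """Hypothetical stand-in for one recognized line (not a Surya class)."""
    text: str
    bbox: Sequence[float]  # assumed layout: [x1, y1, x2, y2]


def lines_to_text(lines: List[Line], row_tolerance: float = 10.0) -> str:
    """Join lines top-to-bottom, then left-to-right within a row.

    A line joins the current row when its top edge is within
    row_tolerance pixels of the top edge of the row's first line.
    """
    rows: List[List[Line]] = []
    for line in sorted(lines, key=lambda item: item.bbox[1]):
        if rows and abs(line.bbox[1] - rows[-1][0].bbox[1]) <= row_tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return "\n".join(
        " ".join(item.text for item in sorted(row, key=lambda item: item.bbox[0]))
        for row in rows
    )


# Sample data shaped like the quickstart output, deliberately out of order
sample = [
    Line("This is a test document.", [50, 150, 520, 195]),
    Line("OCR!", [300, 52, 400, 95]),
    Line("Hello, Surya", [50, 50, 290, 95]),
]
print(lines_to_text(sample))
```

With real results, the same function could be fed `Line(line.text, line.bbox)` for each entry in `page.text_lines`. A fixed pixel tolerance is a simplification; multi-column pages generally need the library's own layout and reading order output instead.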
