Surya OCR: Document Layout and Text Recognition
Surya OCR is a Python library offering state-of-the-art optical character recognition (OCR), document layout analysis, reading order detection, and table recognition for over 90 languages. It's built on deep learning models, providing high accuracy for complex document structures. The current version is 0.17.1, and it undergoes active development with frequent releases.
Warnings
- breaking: Version 0.17.0 introduced a new architecture for the layout model. High-level APIs may remain compatible, but internal behavior, performance characteristics, and the exact structure or interpretation of layout-specific outputs may have changed. If you relied on specific behavior of the previous layout model, verify your results after upgrading.
- gotcha: Surya OCR runs deep learning models and needs significant computational resources to perform well. CPU-only inference can be very slow, especially for large documents or batch processing. GPU acceleration via `onnxruntime-gpu` and a compatible CUDA setup is highly recommended.
- gotcha: The required models are downloaded on the first invocation of `SuryaOCR.create_model()` (or similar model loading functions). This initial download requires an internet connection and can take several minutes, depending on network speed and model size.
- gotcha: surya-ocr requires Python >= 3.10 and < 4.0. An incompatible Python version will cause installation failures or runtime errors.
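The first warning recommends verifying results after the layout model change. One possible way to do that, sketched here, is to save the bounding boxes produced by the older version and check whether the newer output still contains a closely overlapping box for each of them. The helper names `iou` and `unmatched_boxes`, the 0.8 threshold, and the `(x1, y1, x2, y2)` box format are assumptions made for illustration and are not part of Surya.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unmatched_boxes(
    old: Sequence[Box], new: Sequence[Box], threshold: float = 0.8
) -> List[Box]:
    """Return old boxes that no new box overlaps at or above the threshold."""
    return [o for o in old if all(iou(o, n) < threshold for n in new)]


if __name__ == "__main__":
    # Hypothetical boxes saved from two runs over the same page.
    old_boxes = [(0, 0, 100, 50), (0, 60, 100, 110)]
    new_boxes = [(0, 0, 100, 50), (0, 200, 100, 250)]
    print(unmatched_boxes(old_boxes, new_boxes))
```

Any box reported by `unmatched_boxes` marks a region worth inspecting by hand. The threshold is a judgment call and probably needs tuning per document type.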
Install
- pip install surya-ocr
- pip install surya-ocr[gpu]
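Because the package requires Python >= 3.10 and < 4.0, it can help to check the interpreter before installing. The following is a minimal sketch using only the standard library. The function name `is_supported_python` is a hypothetical helper, not something the package provides.

```python
import sys
from typing import Tuple


def is_supported_python(version: Tuple[int, int]) -> bool:
    """True if (major, minor) satisfies >= 3.10 and < 4.0."""
    return (3, 10) <= version < (4, 0)


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    if is_supported_python(current):
        print(f"Python {current[0]}.{current[1]} is in the supported range")
    else:
        print(f"Python {current[0]}.{current[1]} is outside the supported range (>= 3.10, < 4.0)")
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.10".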
Imports
- SuryaOCR
from surya.model.surya import SuryaOCR
- run_detection, run_recognition, run_layout
from surya import run_detection, run_recognition, run_layout
- Image
from PIL import Image as PILImage
Quickstart
import asyncio
from surya.model.surya import SuryaOCR
from PIL import Image as PILImage, ImageDraw, ImageFont

# Create a dummy image for demonstration
def create_dummy_image():
    img = PILImage.new('RGB', (800, 600), color='white')
    d = ImageDraw.Draw(img)
    try:
        fnt = ImageFont.truetype("arial.ttf", 40)
    except IOError:
        # Fall back to the built-in font if Arial is not installed
        fnt = ImageFont.load_default()
    d.text((50, 50), "Hello, Surya OCR!", fill=(0, 0, 0), font=fnt)
    d.text((50, 150), "This is a test document.", fill=(0, 0, 0), font=fnt)
    return img

async def main():
    print("Loading Surya OCR models...")
    # This will download models on first run
    model = SuryaOCR.create_model()
    print("Models loaded. Creating dummy image...")
    image = create_dummy_image()
    print("Running OCR...")
    # Run OCR (detection, recognition, and layout)
    # For real use, replace [image] with a list of PIL.Image objects
    results = await model.ocr([image], languages=["en"])
    print("OCR Results:")
    for page in results:
        for line in page.text_lines:
            print(f"  Line: '{line.text}', Bbox: {line.bbox}")
        # Optional: Print words
        # for word in page.words:
        #     print(f"  Word: '{word.text}', Bbox: {word.bbox}")

if __name__ == "__main__":
    asyncio.run(main())
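The quickstart prints each line with its bounding box. A common next step is to turn those lines into plain text. The sketch below shows one simple way to do that, using a hypothetical stand-in class `Line` in place of the real result objects so it runs without downloading any models. It assumes each line exposes `text` and `bbox` as in the quickstart, and that `bbox` is `(x1, y1, x2, y2)`, which is an assumption rather than something stated above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Line:
    """Hypothetical stand-in for a recognized line (text plus bounding box)."""
    text: str
    bbox: Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def lines_to_text(lines: Iterable[Line]) -> str:
    """Join line texts ordered top to bottom, then left to right."""
    ordered = sorted(lines, key=lambda ln: (ln.bbox[1], ln.bbox[0]))
    return "\n".join(ln.text for ln in ordered)


if __name__ == "__main__":
    # Deliberately out of order to show the sort taking effect.
    page = [
        Line("This is a test document.", (50, 150, 500, 190)),
        Line("Hello, Surya OCR!", (50, 50, 400, 90)),
    ]
    print(lines_to_text(page))
```

Sorting by the top-left corner is only adequate for single-column pages. Multi-column layouts need proper reading order detection, which the library description lists as a separate capability.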