{"id":6260,"library":"surya-ocr","title":"Surya OCR: Document Layout and Text Recognition","description":"Surya OCR is a Python library for optical character recognition (OCR), document layout analysis, reading order detection, and table recognition in over 90 languages. It is built on deep learning models and targets documents with complex structure. The current version is 0.17.1, and the project is under active development with frequent releases.","status":"active","version":"0.17.1","language":"en","source_language":"en","source_url":"https://github.com/VikParuchuri/surya","tags":["OCR","document processing","deep learning","layout analysis","table recognition","reading order"],"install":[{"cmd":"pip install surya-ocr","lang":"bash","label":"Base installation"},{"cmd":"pip install torch","lang":"bash","label":"PyTorch, installed separately only if you need a specific CPU or CUDA build (see pytorch.org for the exact command for your platform)"}],"dependencies":[{"reason":"Core deep learning framework for models; also provides GPU acceleration.","package":"torch","optional":false},{"reason":"Used for transformer-based models.","package":"transformers","optional":false},{"reason":"Image processing library.","package":"Pillow","optional":false}],"imports":[{"note":"OCR runs through predictor classes; there is no `SuryaOCR` class.","wrong":"from surya.model.surya import SuryaOCR","symbol":"RecognitionPredictor","correct":"from surya.recognition import RecognitionPredictor"},{"note":"Shared foundation model that `RecognitionPredictor` is constructed with.","symbol":"FoundationPredictor","correct":"from surya.foundation import FoundationPredictor"},{"note":"Text line detection. The top-level `run_detection`, `run_recognition`, and `run_layout` functions do not exist.","wrong":"from surya import run_detection, run_recognition, run_layout","symbol":"DetectionPredictor","correct":"from surya.detection import DetectionPredictor"},{"note":"Layout analysis.","symbol":"LayoutPredictor","correct":"from surya.layout import LayoutPredictor"},{"note":"Surya expects Pillow Image objects.","symbol":"Image","correct":"from PIL import Image"}],"quickstart":{"code":"from PIL import Image, ImageDraw, ImageFont\nfrom surya.detection import DetectionPredictor\nfrom surya.foundation import FoundationPredictor\nfrom surya.recognition import RecognitionPredictor\n\n\ndef create_dummy_image():\n    \"\"\"Render two lines of black text on a white page for demonstration.\"\"\"\n    img = Image.new(\"RGB\", (800, 600), color=\"white\")\n    draw = ImageDraw.Draw(img)\n    try:\n        font = ImageFont.truetype(\"arial.ttf\", 40)\n    except OSError:\n        # Arial is not available on every system; fall back to the built-in font\n        font = ImageFont.load_default()\n    draw.text((50, 50), \"Hello, Surya OCR!\", fill=(0, 0, 0), font=font)\n    draw.text((50, 150), \"This is a test document.\", fill=(0, 0, 0), font=font)\n    return img\n\n\ndef main():\n    print(\"Loading Surya models...\")\n    # Model weights are downloaded on first use\n    foundation_predictor = FoundationPredictor()\n    recognition_predictor = RecognitionPredictor(foundation_predictor)\n    detection_predictor = DetectionPredictor()\n\n    image = create_dummy_image()\n\n    print(\"Running OCR...\")\n    # Detect text lines, then recognize the text in each line.\n    # For real use, replace [image] with a list of PIL.Image objects.\n    results = recognition_predictor([image], det_predictor=detection_predictor)\n\n    print(\"OCR results:\")\n    for page in results:\n        for line in page.text_lines:\n            print(f\"  Line: '{line.text}', Bbox: {line.bbox}\")\n\n\nif __name__ == \"__main__\":\n    main()\n","lang":"python","description":"This quickstart follows the predictor-based API described in the project README: it loads the detection and recognition predictors and runs OCR on a generated test image. Model weights are downloaded automatically on first use. Each result holds the recognized text lines with their bounding boxes. Layout analysis is a separate step that uses `LayoutPredictor`. Pillow is installed as a dependency of `surya-ocr`."},"warnings":[{"fix":"Thoroughly test existing code against the new version. Consult release notes and documentation for any API or output changes related to layout analysis. 
Re-evaluate any custom post-processing logic that consumes layout output.","message":"Version 0.17.0 introduced a new architecture for the layout model. High-level APIs may remain compatible, but performance characteristics and the exact structure or interpretation of layout outputs could have changed. If you relied on specific behavior of the previous layout model, verify your results.","severity":"breaking","affected_versions":">=0.17.0"},{"fix":"Install a CUDA-enabled PyTorch build that matches your driver and confirm that `torch.cuda.is_available()` returns `True`. Surya selects the device automatically; the `TORCH_DEVICE` environment variable overrides the choice. Monitor GPU usage to verify that the GPU is actually being used.","message":"Surya's models are deep learning models and need substantial compute. CPU-only inference can be very slow, especially for large documents or batch processing. Running on a CUDA-capable GPU through PyTorch is strongly recommended.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure an active internet connection during the first run. For production or air-gapped systems, consider pre-downloading and packaging the models if the library supports it (check the project documentation), or run an initial setup script in a connected environment.","message":"Model weights are downloaded automatically the first time a model is loaded. This initial download requires an internet connection and can take several minutes depending on network speed and model size.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure your Python environment is within the specified range. Use tools like `pyenv` or `conda` to manage multiple Python versions if needed.","message":"`surya-ocr` requires Python >= 3.10 and < 4.0. 
Using an incompatible Python version will lead to installation failures or runtime errors.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[]}