Document Text Recognition (docTR)

1.0.1 · active · verified Thu Apr 16

docTR (Document Text Recognition) is an open-source Python library that uses deep learning for high-performance Optical Character Recognition (OCR) on documents. It provides state-of-the-art text detection and recognition for scanned documents, images, and PDFs. It is actively maintained by Mindee and supports multi-language recognition, handwriting, and GPU acceleration. The current version is 1.0.1.

Quickstart

This quickstart loads an image, initializes a pre-trained OCR model, and prints the recognized text line by line. It uses `DocumentFile` to read the input and `ocr_predictor` to run the end-to-end detection and recognition pipeline. If `sample.png` does not exist, the script first generates a small test image with Pillow.

import os
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# For demonstration, create a dummy image file if it doesn't exist
# In a real scenario, you'd have an actual image or PDF path
dummy_image_path = 'sample.png'
if not os.path.exists(dummy_image_path):
    try:
        from PIL import Image, ImageDraw, ImageFont
        # Create a simple white image and draw some text on it
        img = Image.new('RGB', (200, 100), color=(255, 255, 255))
        d = ImageDraw.Draw(img)
        try:
            # Try a common font; fall back to Pillow's built-in font if it is unavailable
            font = ImageFont.truetype("arial.ttf", 20)
        except OSError:
            font = ImageFont.load_default()
        d.text((10,10), "Hello docTR!", fill=(0,0,0), font=font)
        img.save(dummy_image_path)
        print(f"Created dummy image: {dummy_image_path}")
    except ImportError:
        print("Pillow not installed, cannot create dummy image. Please provide a real image file.")
        print("Skipping quickstart example as no image is available.")
        dummy_image_path = None

if dummy_image_path and os.path.exists(dummy_image_path):
    # Load your document (image or PDF)
    # For a PDF: doc = DocumentFile.from_pdf("path/to/your/document.pdf")
    # For multiple images: doc = DocumentFile.from_images(["path/to/img1.jpg", "path/to/img2.png"])
    doc = DocumentFile.from_images(dummy_image_path)

    # Load a pre-trained OCR model
    # Since v1.0.0, PyTorch is the default and only backend.
    model = ocr_predictor(pretrained=True)

    # Analyze the document
    result = model(doc)

    # Print the extracted text content
    # The result object contains detailed information about words, lines, blocks, and pages.
    print("\n--- OCR Result ---")
    for page in result.pages:
        for block in page.blocks:
            for line in block.lines:
                print(" ".join([word.value for word in line.words]))

    # You can also export the full structured output as JSON
    # print(result.export())
else:
    print("Quickstart skipped due to missing image.")
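
The quickstart mentions that the result can be exported as structured output. As a minimal sketch of how that output might be consumed, the helper below flattens an export-style dictionary into (page, word, confidence) tuples and optionally drops low-confidence words. The nesting it assumes (`pages` → `blocks` → `lines` → `words`, with `value` and `confidence` keys on each word) reflects what `result.export()` is expected to return. The function name `words_with_confidence`, the `min_confidence` parameter, and the sample dictionary are hypothetical and exist only for illustration, so the sketch runs without docTR installed.

```python
from typing import Any, Dict, List, Tuple


def words_with_confidence(
    export: Dict[str, Any], min_confidence: float = 0.0
) -> List[Tuple[int, str, float]]:
    """Flatten an export-style dict into (page_idx, word, confidence) tuples.

    Assumes the nesting pages -> blocks -> lines -> words; missing keys are
    treated as empty so partially filled dicts do not raise.
    """
    rows: List[Tuple[int, str, float]] = []
    for page in export.get("pages", []):
        page_idx = page.get("page_idx", 0)
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    conf = float(word.get("confidence", 0.0))
                    if conf >= min_confidence:
                        rows.append((page_idx, word["value"], conf))
    return rows


# Hand-written sample that mimics the assumed shape; this is NOT real OCR output.
sample_export = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Hello", "confidence": 0.98},
                                {"value": "docTR!", "confidence": 0.42},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}

# Keep only words the (hypothetical) model was reasonably sure about.
for page_idx, value, conf in words_with_confidence(sample_export, min_confidence=0.5):
    print(f"page {page_idx}: {value} ({conf:.2f})")
```

With a real document, the same helper could be called as `words_with_confidence(result.export(), min_confidence=0.5)`, provided the exported dictionary matches the shape assumed here. Filtering on confidence is one simple way to discard doubtful words before further processing. The threshold of 0.5 is arbitrary and should be tuned per use case.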
