PaddleOCR

3.4.0 · active · verified Sat Apr 11

PaddleOCR is a multilingual OCR and document parsing toolkit built on the PaddlePaddle deep learning framework. It provides text detection, text recognition, and structured document understanding, and can transform images and PDFs into structured data (JSON/Markdown) for AI and LLM-based applications. The library is currently at version 3.4.0 and releases frequently, with minor versions appearing every few months that add new models and features.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to perform basic OCR on an image using the `PaddleOCR` class. It initializes the OCR engine (downloading models if not present), processes a dummy image, and prints the detected text along with its confidence score. For document parsing with VLM, use `PaddleOCRVL`.

from paddleocr import PaddleOCR
import os
import cv2
import numpy as np

# Create a dummy image for demonstration
img_path = 'temp_ocr_test_image.png'
img = np.zeros((100, 300, 3), dtype=np.uint8)
cv2.putText(img, 'Hello PaddleOCR!', (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
cv2.imwrite(img_path, img)

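# Initialize PaddleOCR with default language (Chinese & English) or specify 'en' for English
# Models will be downloaded automatically on first use
# Note: in PaddleOCR 3.x, `show_log` is no longer accepted and
# `use_textline_orientation` replaces the deprecated `use_angle_cls`
ocr = PaddleOCR(use_textline_orientation=True, lang='en')

# Perform OCR on the image (3.x uses `predict`; `ocr(..., cls=True)` is the 2.x call)
result = ocr.predict(img_path)

# Print detected text and confidence scores
# Each result is dict-like, with parallel `rec_texts` and `rec_scores` lists
for res in result:
    for text, score in zip(res['rec_texts'], res['rec_scores']):
        print(f"Text: {text}, Confidence: {score:.2f}")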

# Clean up dummy image
os.remove(img_path)
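
The introduction mentions turning OCR output into structured data such as JSON. A minimal sketch of that step follows, assuming the recognized strings and their confidence scores have already been collected into two parallel lists. The helper name `to_json_records` and the `min_confidence` threshold are hypothetical illustrations, not part of the PaddleOCR API, and the sample values are made up.

```python
import json


def to_json_records(texts, scores, min_confidence=0.5):
    """Hypothetical helper (not a PaddleOCR API).

    Pairs each recognized text with its score, drops entries below
    `min_confidence`, and returns the remainder as a JSON string.
    """
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    records = [
        {"text": text, "confidence": round(float(score), 4)}
        for text, score in zip(texts, scores)
        if score >= min_confidence
    ]
    # ensure_ascii=False keeps non-Latin scripts readable in the output
    return json.dumps(records, ensure_ascii=False, indent=2)


# Made-up values standing in for real OCR output
sample_json = to_json_records(["Hello PaddleOCR!", "noise"], [0.98, 0.21])
print(sample_json)
```

Filtering on a confidence threshold before serializing is one simple way to keep spurious detections out of downstream LLM prompts; the right cutoff depends on the images and models in use.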
