PaddleOCR
PaddleOCR is a multilingual OCR and document parsing toolkit built on the PaddlePaddle deep learning framework. It provides text detection, text recognition, and structured document understanding, and can convert images and PDFs into structured data (JSON/Markdown) for AI and LLM-based applications. The library is currently at version 3.4.0 and has a frequent release cadence: minor versions ship every few months and add new models and features.
Warnings
- breaking PaddleOCR 3.x introduced significant interface changes compared to 2.x versions. Code written for 2.x will likely break with 3.x.
- gotcha Incompatible `PaddlePaddle` framework versions can lead to runtime errors, especially with GPU usage (CUDA/cuDNN issues). `paddleocr` requires PaddlePaddle 3.0 or above.
- gotcha When processing PDF files, `AttributeError` related to `pymupdf` (e.g., 'Document' object has no attribute 'metadata') can occur due to version conflicts.
- gotcha GPU installations can frequently encounter `RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library` or similar. This often means PaddleOCR cannot find required CUDA/cuDNN libraries.
- gotcha Default models may not achieve optimal accuracy for specific text types (e.g., numeric-only) or challenging image conditions (low contrast, noise).
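The version requirement in the warnings above can be checked before any OCR code runs. The following is a minimal sketch using only the standard library; the helper names (`major_version`, `installed_version`, `check_paddle_stack`) are hypothetical and not part of PaddleOCR, and the distribution names `paddlepaddle` and `paddlepaddle-gpu` are taken from the install commands in this document.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '3.0.0'."""
    head = version_string.split(".")[0]
    digits = "".join(ch for ch in head if ch.isdigit())
    return int(digits) if digits else 0


def installed_version(*candidates):
    """Return the version of the first installed distribution, or None."""
    for name in candidates:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_paddle_stack():
    """Collect human-readable problems with the installed packages."""
    problems = []
    paddle = installed_version("paddlepaddle", "paddlepaddle-gpu")
    ocr = installed_version("paddleocr")
    if paddle is None:
        problems.append("PaddlePaddle is not installed")
    elif major_version(paddle) < 3:
        problems.append(f"PaddlePaddle {paddle} found; 3.0 or above is required")
    if ocr is None:
        problems.append("paddleocr is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_paddle_stack():
        print(problem)
```

This only inspects package metadata. It does not verify that CUDA/cuDNN libraries can be loaded, so GPU problems may still surface at runtime.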
Install
- CPU
pip install paddlepaddle
pip install paddleocr
- GPU, with all optional PaddleOCR features
pip install paddlepaddle-gpu # Or specific CUDA version, see PaddlePaddle docs
pip install "paddleocr[all]"
Imports
- PaddleOCR
from paddleocr import PaddleOCR
- PaddleOCRVL
from paddleocr import PaddleOCRVL
Quickstart
from paddleocr import PaddleOCR
import os
import cv2
import numpy as np
# Create a dummy image for demonstration
img_path = 'temp_ocr_test_image.png'
img = np.zeros((100, 300, 3), dtype=np.uint8)
cv2.putText(img, 'Hello PaddleOCR!', (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
cv2.imwrite(img_path, img)
# Initialize PaddleOCR for English (omit lang to use the default Chinese & English models)
# Models will be downloaded automatically on first use
# This uses the 3.x interface; the 2.x-style show_log and cls arguments are not part of it
ocr = PaddleOCR(
    lang='en',
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=True,
)
# Perform OCR on the image
result = ocr.predict(img_path)
# Print detected text and confidence scores
for res in result:
    for text, score in zip(res['rec_texts'], res['rec_scores']):
        print(f"Text: {text}, Confidence: {score:.2f}")
# Clean up dummy image
os.remove(img_path)
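The quickstart prints every recognized line regardless of its confidence. Because default models can struggle with some text types and image conditions, it may help to drop low-confidence lines before passing text downstream. The following is a minimal sketch under stated assumptions: `filter_by_confidence` is a hypothetical helper, not a PaddleOCR function, the 0.8 threshold is arbitrary, and the sample texts and scores are made up for illustration. How you obtain the parallel lists of texts and scores depends on the PaddleOCR version in use.

```python
def filter_by_confidence(texts, scores, threshold=0.8):
    """Keep (text, score) pairs whose score is at least `threshold`."""
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    return [(t, s) for t, s in zip(texts, scores) if s >= threshold]


# Hypothetical recognition output, used only to exercise the helper
sample_texts = ["Hello PaddleOCR!", "l1l|", "Invoice 2024"]
sample_scores = [0.98, 0.41, 0.87]

kept = filter_by_confidence(sample_texts, sample_scores, threshold=0.8)
for text, score in kept:
    print(f"{text} ({score:.2f})")
```

A fixed threshold is a blunt tool. For numeric-only or otherwise specialized text, tuning the threshold on a small labeled sample is likely to work better than a single default.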