RapidOCR

3.8.1 · active · verified Sat Apr 11

RapidOCR is an open-source, multi-platform, multi-language Optical Character Recognition (OCR) toolkit designed for fast, offline deployment. It supports several inference engines, including ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT, and PyTorch, and gains speed and broad compatibility by converting PaddleOCR models to ONNX format. The current version is 3.8.1, and new releases appear frequently.

Warnings

Install

Imports

Quickstart

This quickstart initializes the RapidOCR engine, which automatically handles model downloads on the first execution. It then performs OCR on a sample image from a URL and prints the extracted text. For visualization, ensure OpenCV is installed (e.g., `pip install opencv-python`).

from rapidocr import RapidOCR

# Initialize the OCR engine. This will automatically download models on first run.
# Ensure 'onnxruntime' or another backend is installed (e.g., pip install rapidocr onnxruntime)
engine = RapidOCR()

# Example image from a public URL
img_url = "https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/master/resources/test_files/ch_en_num.jpg"

# Process the image
result = engine(img_url)

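# Print the extracted text results.
# In rapidocr 3.x the call returns a result object rather than a list of tuples;
# recognized strings and confidences are exposed as `txts` and `scores`
# (both may be None when no text is detected, hence the `or ()` fallback).
for text, score in zip(result.txts or (), result.scores or ()):
    print(f"Text: {text} (confidence: {score:.2f})")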

# You can also visualize the results (requires OpenCV)
# result.vis("vis_result.jpg")
# print("Visualization saved to vis_result.jpg")
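
A common follow-up to the quickstart is discarding low-confidence lines before using the text. The sketch below is illustrative only: it assumes the recognized strings and their confidence scores have already been collected into two parallel sequences, and the helper name `filter_by_confidence`, the 0.8 threshold, and the sample data are hypothetical and not part of RapidOCR.

```python
from typing import Iterable, List, Tuple


def filter_by_confidence(
    texts: Iterable[str],
    scores: Iterable[float],
    min_score: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs whose score is at least min_score."""
    return [(t, s) for t, s in zip(texts, scores) if s >= min_score]


# Hypothetical sample data standing in for real OCR output.
sample_texts = ["Invoice", "T0tal", "42.00"]
sample_scores = [0.98, 0.55, 0.91]

for text, score in filter_by_confidence(sample_texts, sample_scores):
    print(f"{text}\t{score:.2f}")
```

The threshold is a judgment call: a higher value drops more misreads but may also discard correct text from blurry or low-contrast regions.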
