LayoutParser

0.3.4 · active · verified Thu Apr 16

LayoutParser is a unified toolkit for deep-learning-based document image analysis. It provides tools for document layout detection, OCR, and visualization. The current version is 0.3.4. The project is actively developed, with regular patch releases and larger minor/major releases that add new models and backend support.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart downloads a sample image from a URL, detects its layout with `AutoLayoutModel` (recommended for v0.3.0+), and prints the type and coordinates of each detected block. To visualize the result, install `matplotlib` and uncomment the lines at the end.

import io

import layoutparser as lp
import requests
from PIL import Image

# Download a sample image
image_url = "https://layout-parser.github.io/assets/images/publaynet.png"
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # Stop early if the download failed

# Load the image with PIL; detect() accepts a PIL image (or a NumPy array) directly
pil_image = Image.open(io.BytesIO(response.content)).convert("RGB")

# Load a pre-trained layout model (using AutoLayoutModel since v0.3.0+).
# The first argument is the config path; model weights are resolved from it.
# This config needs the Detectron2 backend, which is installed separately from layoutparser.
model = lp.AutoLayoutModel("lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config")

# Detect the layout
layout = model.detect(pil_image)

# Print detected blocks and their types
print(f"Detected {len(layout)} blocks:")
for block in layout:
    print(f"  - Type: {block.type}, Box: {block.coordinates}")

# (Optional) Visualize the layout
# You might need matplotlib for this to display the image
# import matplotlib.pyplot as plt
# fig = lp.draw_box(pil_image, layout, box_width=3)
# plt.imshow(fig)
# plt.show()
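
After detection, a common next step is to keep only one block type and put the results in a rough reading order. The sketch below illustrates that idea on plain Python data so it runs without a model or backend. The `Block` dataclass, the `select_blocks` helper, and the sample values are hypothetical stand-ins for the `block.type` and `block.coordinates` values printed above, and the `(x_1, y_1, x_2, y_2)` tuple order is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for one detected block (type plus box corners)."""
    type: str
    coordinates: Tuple[float, float, float, float]  # assumed (x_1, y_1, x_2, y_2)


def select_blocks(blocks: List[Block], wanted_type: str) -> List[Block]:
    """Keep blocks of one type, ordered top to bottom, then left to right."""
    matching = [b for b in blocks if b.type == wanted_type]
    return sorted(matching, key=lambda b: (b.coordinates[1], b.coordinates[0]))


# Made-up sample values, not real model output.
sample = [
    Block("Text", (50.0, 400.0, 500.0, 600.0)),
    Block("Title", (50.0, 20.0, 500.0, 60.0)),
    Block("Text", (50.0, 100.0, 500.0, 380.0)),
    Block("Figure", (520.0, 100.0, 900.0, 380.0)),
]

text_blocks = select_blocks(sample, "Text")
for block in text_blocks:
    print(block.type, block.coordinates)
```

Sorting by the top edge is only a reasonable default for single-column pages; multi-column documents usually need the blocks grouped by column first.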
