EfficientDet for PyTorch
effdet is a PyTorch implementation of the EfficientDet object detection model. It aims to faithfully reproduce the original TensorFlow models while offering the flexibility of PyTorch. The library is currently at version 0.4.1. Releases are steady and often track updates to its `timm` dependency and to PyTorch, with a focus on performance and accuracy.
Warnings
- breaking The bounding box output format for 'Predict' and 'Train' benches changed from `XYWH` (x, y, width, height) to `XYXY` (x1, y1, x2, y2). Users upgrading from older versions or following outdated tutorials must adjust their parsing logic.
- breaking `effdet` is tightly coupled to specific versions of `timm` (PyTorch Image Models). Upgrading `effdet` often requires moving `timm` to a matching version range (e.g., `>=0.3` or `>=0.9`) because of API changes in `timm`'s helper functions and model backbones.
- gotcha Input image dimensions must be divisible by 128. EfficientDet's BiFPN (Bi-directional Feature Pyramid Network) processes features at levels P3 to P7, and the coarsest level, P7, downsamples the input by a factor of 2^7 = 128.
- deprecated The default focal loss implementation changed. The older (legacy) version uses less memory but has more numerical stability issues. It can still be enabled explicitly during training (the `--legacy-focal` argument of the repository's training script).
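Two of the warnings above lend themselves to small helpers. The sketch below is illustrative only: `round_up_to_multiple` and `xyxy_to_xywh` are hypothetical names and are not part of `effdet`. The first rounds an image side up to the next multiple of 128, which tells you how far to pad or resize. The second converts an `XYXY` box back to `XYWH` for downstream code that still expects the old layout.

```python
def round_up_to_multiple(size: int, multiple: int = 128) -> int:
    """Return the smallest multiple of `multiple` that is >= size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return ((size + multiple - 1) // multiple) * multiple


def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) corners to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


if __name__ == "__main__":
    # A 600-pixel side is not divisible by 128; print the padded target size.
    print(round_up_to_multiple(600))
    # Convert one corner-format box to the older x/y/width/height layout.
    print(xyxy_to_xywh((10.0, 20.0, 110.0, 70.0)))
```

Rounding up and padding, rather than rounding down and cropping, keeps every pixel of the original image. Which is appropriate depends on your data pipeline.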
Install
pip install effdet
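Because `effdet` and `timm` versions need to match, it can help to confirm what is installed after running the command above. This is a minimal sketch using only the standard library. `installed_version` is a hypothetical helper name, and the package names passed to it are assumed to be the PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the versions of effdet and the packages it is coupled to.
    for name in ("effdet", "timm", "torch"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```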
Imports
- create_model
from effdet import create_model
Quickstart
import torch
from effdet import create_model
from effdet.data import resolve_input_config
from torchvision import transforms
from PIL import Image
import os
# Create a dummy image for a runnable example
dummy_image_path = "dummy_image.png"
if not os.path.exists(dummy_image_path):
    img = Image.new('RGB', (640, 640), color='red')
    img.save(dummy_image_path)
# 1. Load a pre-trained EfficientDet model
# Use 'tf_efficientdet_d0' for a small, fast model.
# bench_task='predict' is crucial for inference mode.
model_name = 'tf_efficientdet_d0'
model = create_model(model_name, pretrained=True, bench_task='predict')
model.eval()
# 2. Prepare the image for inference
img = Image.open(dummy_image_path).convert('RGB')
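# Resolve input configuration (input size, mean, std) from the bench's model config.
# resolve_input_config takes an args dict first; an empty dict falls back to the model config.
input_config = resolve_input_config({}, model_config=model.config)
# Define image transformation pipeline
# input_size is (channels, height, width); Resize expects (height, width)
transform = transforms.Compose([
    transforms.Resize(input_config['input_size'][-2:]),
    transforms.ToTensor(),
    transforms.Normalize(mean=input_config['mean'], std=input_config['std'])
])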
# Apply transformations and add a batch dimension
input_tensor = transform(img).unsqueeze(0)
# 3. Perform inference
with torch.no_grad():
    output = model(input_tensor)
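# Each row of output[0] is typically [x1, y1, x2, y2, score, class].
# Rows are padded with zeros up to a fixed count, so filter by score
# instead of checking for an empty tensor. The 0.2 threshold is an arbitrary example.
detections = output[0]
detections = detections[detections[:, 4] > 0.2]
# Print top 5 detected objects (if any)
if detections.shape[0] > 0:
    print(f"Detected objects (top 5, if available):\n{detections[:5]}")
else:
    print("No objects detected.")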
# Clean up dummy image
os.remove(dummy_image_path)
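The rows returned by the quickstart usually need filtering and rescaling before use. The following is a minimal sketch, assuming rows in `[x1, y1, x2, y2, score, class]` order and boxes expressed in the pixel coordinates of the resized model input. `filter_and_rescale`, the 0.3 threshold, and the sample rows are all hypothetical. The function works on plain Python lists, such as those produced by `output[0].tolist()`, so it runs without a model.

```python
def filter_and_rescale(rows, score_threshold, scale_x, scale_y):
    """Keep rows scoring above the threshold and map boxes to original pixels.

    rows: iterable of [x1, y1, x2, y2, score, class]
    scale_x, scale_y: original image size divided by model input size, per axis
    """
    kept = []
    for x1, y1, x2, y2, score, cls in rows:
        if score <= score_threshold:
            continue  # also drops zero-padded rows
        kept.append({
            "box": (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y),
            "score": score,
            "label": int(cls),
        })
    return kept


# Made-up rows standing in for `output[0].tolist()`
rows = [
    [64.0, 32.0, 256.0, 128.0, 0.9, 1.0],
    [0.0, 0.0, 16.0, 16.0, 0.05, 3.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
detections = filter_and_rescale(rows, 0.3, scale_x=2.0, scale_y=1.5)
print(detections)
```

Separate scale factors per axis are used because a plain resize to the model's input size does not preserve the aspect ratio of the original image.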