Grounding DINO Python Wrapper

0.4.0 · active · verified Thu Apr 16

groundingdino-py is a Python wrapper for the Grounding DINO open-set object detection model that simplifies its installation and use. It provides an interface for loading models, processing images, and running text-prompted object detection. The current version is 0.4.0. New versions are released periodically to address installation issues and add features.
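
The quickstart uses a comma-separated prompt. The upstream Grounding DINO README instead recommends separating category names with ` . `, and the upstream `preprocess_caption` helper lowercases the caption and ends it with a period. The sketch below builds a prompt in that style from a list of labels. `build_prompt` is a hypothetical helper, not part of groundingdino-py, and the sketch assumes the wrapper forwards the prompt to the upstream model unchanged.

```python
from typing import Iterable


def build_prompt(labels: Iterable[str]) -> str:
    """Join category names into an upstream-style caption such as 'dog . leash .'.

    Hypothetical helper: lowercases each label, drops empty ones, separates
    them with ' . ' and ends the caption with ' .'.
    """
    cleaned = [label.strip().lower() for label in labels if label.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    return " . ".join(cleaned) + " ."


print(build_prompt(["Dog", "leash"]))  # dog . leash .
```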

Install

pip install groundingdino-py

Quickstart

This example initializes the GroundingDINO model, downloads a sample image if it is not already present, and runs open-set object detection with a text prompt. The model weights are downloaded automatically the first time the model is instantiated. The `predict_image` method returns the bounding box coordinates, the confidence scores, and the phrase matched to each box.
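
The quickstart labels the returned boxes as `xyxy`. The upstream GroundingDINO `predict` function, by contrast, returns boxes as normalized `(cx, cy, w, h)` values in the range [0, 1], and whether the wrapper converts them is not stated here, so it is worth checking the output. If you do get normalized center-format boxes, a minimal pure-Python sketch for converting them to pixel `(x0, y0, x1, y1)` corners is shown below. `cxcywh_to_xyxy` is a hypothetical helper name, and `width`/`height` are assumed to be the image size in pixels.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(boxes: Sequence[Sequence[float]], width: int, height: int) -> List[Box]:
    """Convert normalized (cx, cy, w, h) boxes to pixel (x0, y0, x1, y1) corners."""
    out: List[Box] = []
    for cx, cy, w, h in boxes:
        # Shift from the center by half the size, then scale to pixels
        x0 = (cx - w / 2) * width
        y0 = (cy - h / 2) * height
        x1 = (cx + w / 2) * width
        y1 = (cy + h / 2) * height
        out.append((x0, y0, x1, y1))
    return out


# A box centered in a 640x480 image that covers half of each dimension
print(cxcywh_to_xyxy([[0.5, 0.5, 0.5, 0.5]], 640, 480))
# [(160.0, 120.0, 480.0, 360.0)]
```

For a PyTorch tensor, pass `boxes.tolist()`; alternatively, `torchvision.ops.box_convert(boxes, "cxcywh", "xyxy")` changes the format on tensors, though it does not scale to pixels.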

import os
import urllib.request

from groundingdino import GroundingDINO

# Instantiate the model; weights are downloaded automatically on first run
model = GroundingDINO()

# Download a sample image if it is not already present (or use your own local path)
image_url = "https://raw.githubusercontent.com/giswqs/groundingdino-py/main/images/dog.jpg"
image_path = "dog.jpg"
if not os.path.exists(image_path):
    print(f"Downloading {image_path}...")
    urllib.request.urlretrieve(image_url, image_path)  # standard-library download

# Define the text prompt
text_prompt = "a dog, a leash"

# Predict objects in the image
# Returns bounding boxes, confidence scores, and detected phrases
boxes, logits, phrases = model.predict_image(image_path, text_prompt)

print(f"Image: {image_path}")
print(f"Text prompt: '{text_prompt}'")
print(f"Detected boxes (xyxy format): {boxes}")
print(f"Confidence scores: {logits}")
print(f"Detected phrases: {phrases}")

# You can also customize confidence thresholds during prediction:
# boxes, logits, phrases = model.predict_image(image_path, text_prompt, box_threshold=0.3, text_threshold=0.25)
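
In the upstream GroundingDINO `predict` function, the two thresholds play different roles: `box_threshold` keeps a candidate box only if its highest token score exceeds the threshold, and `text_threshold` selects which prompt tokens (those scoring above it) make up the returned phrase. The sketch below reproduces that logic in plain Python on made-up scores. It assumes the wrapper's `box_threshold`/`text_threshold` keyword arguments behave the same way, which is not confirmed here. `filter_detections` is hypothetical, and real token scores come from the model rather than a hand-written list.

```python
from typing import List, Sequence, Tuple


def filter_detections(
    token_scores: Sequence[Sequence[float]],
    tokens: Sequence[str],
    box_threshold: float,
    text_threshold: float,
) -> List[Tuple[int, float, str]]:
    """Return (box_index, confidence, phrase) for each box that passes box_threshold.

    token_scores[i][j] is the score linking candidate box i to prompt token j.
    """
    kept = []
    for i, scores in enumerate(token_scores):
        confidence = max(scores)
        if confidence <= box_threshold:
            continue  # no token matched strongly enough, so the box is dropped
        # The phrase is built from every token whose score exceeds text_threshold
        phrase = " ".join(tok for tok, s in zip(tokens, scores) if s > text_threshold)
        kept.append((i, confidence, phrase))
    return kept


# Made-up scores for three candidate boxes against the prompt "dog . leash ."
tokens = ["dog", ".", "leash", "."]
token_scores = [
    [0.82, 0.05, 0.10, 0.04],  # strong match to "dog"
    [0.20, 0.03, 0.28, 0.02],  # best score 0.28 is below box_threshold
    [0.15, 0.02, 0.61, 0.05],  # strong match to "leash"
]
print(filter_detections(token_scores, tokens, box_threshold=0.3, text_threshold=0.25))
# [(0, 0.82, 'dog'), (2, 0.61, 'leash')]
```

Raising `box_threshold` drops weaker boxes entirely, while raising `text_threshold` only shortens the phrases attached to the boxes that remain.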
