Grounding DINO Python Wrapper
groundingdino-py is a Python wrapper for the Grounding DINO open-set object detection model that simplifies its installation and usage. It provides a user-friendly interface for loading the model, processing images, and running text-prompted object detection. At the time of writing, the current version is 0.4.0; new releases appear periodically to fix installation issues and add features.
Common errors
- ModuleNotFoundError: No module named 'GroundingDINO'
  Cause: Using import paths from the original GroundingDINO repository (IDEA-Research/GroundingDINO) instead of those of the `groundingdino-py` wrapper.
  Fix: Change the import statement to `from groundingdino import GroundingDINO` (note the lowercase module name).
- RuntimeError: CUDA error: out of memory
  Cause: The input image is too large, other GPU-intensive processes are competing for memory, or the GPU has too little VRAM for the model.
  Fix: Reduce the input image size, free GPU memory by closing other applications, or use a GPU with more VRAM. With `groundingdino-py[cuda]`, installing `xformers` may also reduce memory use.
- AttributeError: 'GroundingDINO' object has no attribute 'predict'
  Cause: Calling a name such as `predict` that exists in the original GroundingDINO project but not on the wrapper's `GroundingDINO` class, or using an older or incompatible version of the wrapper.
  Fix: Check the `groundingdino-py` documentation for the method available in your version; in the version described here it is `model.predict_image()`.
- `torch.cuda.is_available()` returns False despite a GPU being present
  Cause: `groundingdino-py` was installed without the `[cuda]` extra, or the installed `torch` build is not compatible with your CUDA toolkit or GPU driver.
  Fix: Reinstall with `pip install --force-reinstall groundingdino-py[cuda]`. Make sure your GPU driver and CUDA toolkit are up to date and compatible with the installed PyTorch version, and check `torch.version.cuda` (it is `None` for CPU-only PyTorch builds).
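The checks suggested for the CUDA-related errors above can be bundled into one diagnostic. The sketch below uses only documented PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`) and degrades gracefully when PyTorch is missing; the helper name `cuda_report` is hypothetical and not part of groundingdino-py.

```python
import importlib.util


def cuda_report() -> dict:
    """Summarize the PyTorch/CUDA setup relevant to groundingdino-py[cuda].

    Hypothetical helper: returns a dict instead of raising, so it also
    runs on machines where PyTorch is not installed.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the check itself never fails

    report["torch_version"] = str(torch.__version__)
    # None for CPU-only builds of PyTorch
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Name of the first visible GPU
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If the report shows `torch_cuda_build: None`, the installed PyTorch wheel is CPU-only, so the problem lies with the PyTorch install rather than the GPU driver.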
Warnings
- breaking The `groundingdino-py` wrapper uses different import paths and method names from the original GroundingDINO repository (IDEA-Research/GroundingDINO), so code copied directly from the original project will likely fail.
- gotcha For GPU acceleration, you must install the library with the `[cuda]` extra (e.g., `pip install groundingdino-py[cuda]`). This requires a compatible PyTorch installation and CUDA toolkit on your system.
- gotcha The Grounding DINO model weights (approx. 2GB) are downloaded automatically on the first instantiation of the `GroundingDINO()` class. This requires an active internet connection and can take some time.
- gotcha Grounding DINO models are computationally intensive. Running inference on a CPU will be significantly slower than on a dedicated GPU.
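When memory or CPU time is the bottleneck, one mitigation mentioned above is to shrink large images before inference. A minimal sketch of the size arithmetic, assuming you cap the longer side while keeping the aspect ratio; `downscaled_size` is a hypothetical helper and `max_side=1333` is an illustrative value, not a documented setting of the wrapper. The actual resize would be done with an image library such as Pillow.

```python
def downscaled_size(width: int, height: int, max_side: int = 1333) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_side.

    The aspect ratio is preserved and images that are already small enough
    are returned unchanged. max_side=1333 is illustrative only.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # round() keeps the result close to the exact ratio; max(1, ...) avoids a zero side
    return max(1, round(width * scale)), max(1, round(height * scale))


print(downscaled_size(4000, 3000))  # (1333, 1000)
print(downscaled_size(640, 480))    # (640, 480)
```

With Pillow, the result can be passed directly to `img.resize(downscaled_size(*img.size))`, because `Image.size` is `(width, height)`.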
Install
pip install groundingdino-py
pip install "groundingdino-py[cuda]"    # GPU support; the quotes stop shells such as zsh from expanding the brackets
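To confirm which release pip actually installed (the overview above mentions 0.4.0), you can read the package metadata with the standard library. A minimal sketch; `installed_version` is a hypothetical helper, and the distribution name `groundingdino-py` is taken from the install commands above. `pip show groundingdino-py` gives the same information from the shell.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "groundingdino-py") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "groundingdino-py is not installed")
```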
Imports
- GroundingDINO (class)
  Incorrect (raises ModuleNotFoundError, see Common errors): from GroundingDINO import groundingdino_model
  Correct: from groundingdino import GroundingDINO
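To tell the two import styles apart on a given machine without crashing a script, you can attempt them through `importlib`. A minimal sketch; it only checks that a module can be imported and exposes the expected attribute, and the helper name `check_import` is hypothetical.

```python
import importlib
from typing import Optional


def check_import(module_name: str, attr: str) -> Optional[str]:
    """Try `from module_name import attr`; return None on success, else an error message."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return f"{type(exc).__name__}: {exc}"
    if not hasattr(module, attr):
        return f"module {module_name!r} has no attribute {attr!r}"
    return None


if __name__ == "__main__":
    for mod, name in [("groundingdino", "GroundingDINO"), ("GroundingDINO", "groundingdino_model")]:
        problem = check_import(mod, name)
        print(f"from {mod} import {name}: {'OK' if problem is None else problem}")
```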
Quickstart
import os
from groundingdino import GroundingDINO
# Instantiate the model
model = GroundingDINO() # Weights are downloaded on first run (approx. 2GB)
# Download a sample image (or use your own local path)
image_url = "https://raw.githubusercontent.com/giswqs/groundingdino-py/main/images/dog.jpg"
image_path = "dog.jpg"
if not os.path.exists(image_path):
    print(f"Downloading {image_path}...")
    model.download_file(image_url, image_path)  # Helper method from the wrapper
# Define the text prompt
text_prompt = "a dog, a leash"
# Predict objects in the image
# Returns bounding boxes, confidence scores, and detected phrases
boxes, logits, phrases = model.predict_image(image_path, text_prompt)
print(f"Image: {image_path}")
print(f"Text prompt: '{text_prompt}'")
print(f"Detected boxes (xyxy format): {boxes}")
print(f"Confidence scores: {logits}")
print(f"Detected phrases: {phrases}")
# You can also customize confidence thresholds during prediction:
# boxes, logits, phrases = model.predict_image(image_path, text_prompt, box_threshold=0.3, text_threshold=0.25)
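The three return values line up index by index, so a common next step is to pair them and keep only confident detections. The sketch below assumes `boxes` is a sequence of `[x1, y1, x2, y2]` boxes and `logits` a sequence of scores; if the wrapper returns PyTorch tensors, convert them first with `.tolist()`. The `Detection` type, the `collect_detections` helper, and the `min_score` cutoff of 0.35 are illustrative and not part of groundingdino-py.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    phrase: str
    score: float
    box: List[float]  # [x1, y1, x2, y2]

    @property
    def area(self) -> float:
        x1, y1, x2, y2 = self.box
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def collect_detections(
    boxes: Sequence[Sequence[float]],
    logits: Sequence[float],
    phrases: Sequence[str],
    min_score: float = 0.35,
) -> List[Detection]:
    """Zip the parallel outputs into Detection objects, drop low scores, sort best first."""
    if not (len(boxes) == len(logits) == len(phrases)):
        raise ValueError("boxes, logits and phrases must have the same length")
    detections = [
        Detection(phrase, float(score), [float(v) for v in box])
        for box, score, phrase in zip(boxes, logits, phrases)
        if float(score) >= min_score
    ]
    return sorted(detections, key=lambda d: d.score, reverse=True)


# Example with plain lists standing in for the model's output
dets = collect_detections(
    boxes=[[10, 20, 110, 220], [50, 60, 70, 90]],
    logits=[0.82, 0.21],
    phrases=["a dog", "a leash"],
)
for d in dets:
    print(f"{d.phrase}: {d.score:.2f} box={d.box} area={d.area:.0f}")
```

Filtering after the call keeps the raw predictions available, so you can try several cutoffs without re-running inference; the `box_threshold` and `text_threshold` arguments shown above filter inside the call instead.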