CLIP Interrogator
CLIP Interrogator is a Python library that generates text prompts from images by combining a BLIP caption model with CLIP-based ranking of descriptive terms. It is particularly useful for producing prompts for text-to-image models such as Stable Diffusion. The current version is 0.6.0; releases are infrequent but ongoing, with major updates focused on model support and performance.
Warnings
- breaking Support for `.pkl` cache files was dropped in favor of `safetensors` format. Users upgrading from versions prior to 0.5.4 may need to clear their old cache or re-download models.
- gotcha CLIP Interrogator, especially with larger caption models like `blip-large` or `blip2-flan-t5-xl`, requires significant VRAM (10GB+). If you experience out-of-memory errors, switch to smaller models or use `apply_low_vram_defaults()`.
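For constrained GPUs, the low-VRAM helper can be applied when building the config. A minimal sketch, assuming the `Config` fields used elsewhere in this document; the exact defaults the helper sets (smaller caption model, smaller chunk sizes) may vary by version:

```python
from clip_interrogator import Config

config = Config(clip_model_name="ViT-L-14/openai")
config.caption_model_name = "blip-base"  # smaller caption model than blip-large
config.apply_low_vram_defaults()  # shrinks chunk sizes and prefers a lighter caption model
```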
- gotcha Models are downloaded to the user's cache directory (e.g., `~/.cache/clip_interrogator` or Hugging Face cache) on first initialization. This can take time and consume several GBs of disk space.
Install
- pip
pip install clip-interrogator
Imports
- Config
from clip_interrogator import Config
- Interrogator
from clip_interrogator import Interrogator
- LabelTable
from clip_interrogator import LabelTable
- list_caption_models
from clip_interrogator import list_caption_models
- list_clip_models
from clip_interrogator import list_clip_models
Quickstart
import os
from PIL import Image
from clip_interrogator import Config, Interrogator

# Create a dummy image so the example is runnable without an input file
dummy_image_path = "dummy_image_for_ci.jpg"
Image.new("RGB", (224, 224), color="red").save(dummy_image_path)

# Configure CLIP Interrogator
ci_config = Config()

# Set model names (these will be downloaded on first run and cached)
ci_config.clip_model_name = "ViT-L-14/openai"
ci_config.caption_model_name = "blip-large"  # Other options: blip-base, blip2-2.7b, blip2-flan-t5-xl, git-large-coco

# Apply low VRAM settings if available (recommended for GPUs with <12GB VRAM)
# This method was introduced in v0.5.4
if hasattr(ci_config, "apply_low_vram_defaults"):
    ci_config.apply_low_vram_defaults()

# Initialize the Interrogator. This will download models if not already cached.
print("Initializing CLIP Interrogator (models may download on first run)...")
try:
    ci = Interrogator(ci_config)
    print("CLIP Interrogator initialized.")

    # Load an image
    image = Image.open(dummy_image_path).convert("RGB")

    # Perform interrogation
    prompt = ci.interrogate(image)
    print(f"Generated prompt: {prompt}")
except Exception as e:
    print(f"Error during interrogation: {e}. Ensure sufficient VRAM and disk space for the models.")
finally:
    # Clean up the dummy image
    if os.path.exists(dummy_image_path):
        os.remove(dummy_image_path)
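Conceptually, interrogation embeds the image and many candidate phrases into CLIP's shared vector space, then keeps the phrases whose embeddings are most similar to the image embedding. A model-free toy sketch of that ranking step (the vectors here are made up; real embeddings come from CLIP):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "embeddings" standing in for CLIP image/text features
image_vec = [0.9, 0.1, 0.2]
candidates = {
    "a red square": [0.8, 0.2, 0.1],
    "a blue circle": [0.1, 0.9, 0.3],
    "abstract art": [0.4, 0.4, 0.4],
}

# Keep the phrases most similar to the image, as the interrogator does at scale
ranked = sorted(candidates, key=lambda p: cosine(image_vec, candidates[p]), reverse=True)
print(ranked[0])  # → "a red square"
```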