LlamaIndex OpenAI Multi-Modal LLMs Integration

0.6.2 · active · verified Sun Apr 12

This library provides an integration for LlamaIndex to use OpenAI's multi-modal Large Language Models (LLMs), such as GPT-4V and GPT-4o, for tasks involving both text and image inputs. It allows users to leverage OpenAI's capabilities for image understanding, reasoning, and multi-modal Retrieval-Augmented Generation (RAG) applications within the LlamaIndex framework. The current version is 0.6.2, and the package is part of the broader LlamaIndex ecosystem for building LLM applications.

Install

```shell
pip install llama-index-multi-modal-llms-openai
```

Imports

```python
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
```

Quickstart

This quickstart demonstrates how to initialize the `OpenAIMultiModal` class with an OpenAI vision model (e.g., `gpt-4o`; the older `gpt-4-vision-preview` has been deprecated by OpenAI), load image documents from URLs, and then use the LLM to get a descriptive response based on both a text prompt and the provided images. Ensure `OPENAI_API_KEY` is set in your environment.

```python
import os

from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

# The client reads OPENAI_API_KEY from the environment; fail early if it is missing.
if "OPENAI_API_KEY" not in os.environ:
    raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")

# Any publicly accessible image URL works; the LlamaIndex logo is used here.
image_urls = [
    "https://docs.llamaindex.ai/en/stable/_static/assets/img/llama-index-logo.png"
]

# Wrap the URLs in image documents
image_documents = load_image_urls(image_urls)

# Initialize the OpenAI multi-modal LLM
openai_mm_llm = OpenAIMultiModal(
    model="gpt-4o",  # "gpt-4-vision-preview" also works but is deprecated by OpenAI
    api_key=os.environ["OPENAI_API_KEY"],
    max_new_tokens=300,
)

# Complete a text prompt grounded in the image documents
response = openai_mm_llm.complete(
    prompt="What is in the image? Describe it.",
    image_documents=image_documents,
)

print(response.text)
```
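OpenAI's vision endpoint also accepts images inline as base64 data URLs, which is handy for local files that are not publicly hosted. The sketch below builds such a URL using only the Python standard library; the helper name `to_data_url` is this example's own, and whether `load_image_urls` accepts data URLs is an assumption worth verifying against your LlamaIndex version.

```python
import base64
import mimetypes


def to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (illustrative helper).

    The resulting string has the form ``data:image/png;base64,...`` and can
    be used wherever OpenAI's vision API expects an image URL.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

You could then pass `[to_data_url("my_chart.png")]` in place of the public `image_urls` list above, assuming your LlamaIndex version forwards data URLs unchanged.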

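For long responses you may want tokens as they arrive rather than waiting for the full completion. `OpenAIMultiModal` implements LlamaIndex's multi-modal LLM interface, which includes a `stream_complete` method; the helper below is a hedged sketch (the function name and the API-key guard are this example's own, not part of the library), assuming each streamed chunk exposes the newly generated text in its `delta` attribute.

```python
import os


def stream_describe(image_urls, prompt="Describe the image in detail."):
    """Stream a multi-modal completion token by token (illustrative helper).

    Assumes OpenAIMultiModal.stream_complete yields response chunks whose
    ``delta`` attribute holds the newly generated text.
    """
    from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
    from llama_index.multi_modal_llms.openai import OpenAIMultiModal

    llm = OpenAIMultiModal(model="gpt-4o", max_new_tokens=300)
    image_documents = load_image_urls(image_urls)
    for chunk in llm.stream_complete(prompt=prompt, image_documents=image_documents):
        print(chunk.delta or "", end="", flush=True)
    print()


# Only call the API when a key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    stream_describe(
        ["https://docs.llamaindex.ai/en/stable/_static/assets/img/llama-index-logo.png"]
    )
```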