LlamaIndex OpenAI Question Generator
The `llama-index-question-gen-openai` package integrates OpenAI's function calling API with LlamaIndex to decompose a complex query into sub-questions. Because the model returns structured JSON rather than free-form text, it is less prone to output-parsing failures than generic LLM question generators. The current version is 0.3.1, and the package is part of the actively developed LlamaIndex ecosystem with frequent updates.
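To illustrate why structured JSON output is easier to handle, the sketch below parses a payload of the kind the function call returns: a list of sub-questions, each pairing a question with a tool name. The exact field names here are illustrative assumptions, not the package's verbatim wire format:

```python
import json

# Illustrative payload: the kind of structured JSON a function call might return.
# The "items"/"sub_question"/"tool_name" keys are assumptions for demonstration.
raw = json.dumps({
    "items": [
        {"sub_question": "What is the capital of France?", "tool_name": "city_info"},
        {"sub_question": "What is the capital of Germany?", "tool_name": "city_info"},
    ]
})

# Parsing is a plain json.loads call -- no regex scraping of free-form text.
parsed = json.loads(raw)
for item in parsed["items"]:
    print(f"- {item['tool_name']}: {item['sub_question']}")
```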
Warnings
- gotcha The `OpenAIQuestionGenerator` is specifically designed for OpenAI models that support the function calling API (e.g., `gpt-3.5-turbo-0613`, `gpt-4`). It will not work with older OpenAI completion-based models or other generic LLMs that do not support this API, which are typically handled by `LLMQuestionGenerator`.
- breaking As of LlamaIndex v0.10.x, the library adopted a modular, namespaced package structure. This means integration packages like `llama-index-question-gen-openai` must be installed explicitly. Importing components directly from `llama_index` or `llama_index.core` for integrations that have moved to separate packages will result in `ImportError` if the specific integration package is not installed.
- gotcha An `OPENAI_API_KEY` must be set as an environment variable (e.g., `export OPENAI_API_KEY='sk-...'` or via a `.env` file) for all OpenAI integrations, including `OpenAIQuestionGenerator`. Failure to do so will result in authentication errors when making API calls.
- gotcha When using `OpenAIQuestionGenerator` within a `SubQuestionQueryEngine` with multiple tools, the combined descriptions of these tools can exceed OpenAI's function calling API character limit (currently 1024 characters). This will raise a `ValueError`.
- gotcha Network issues, incorrect API keys, or exceeding rate limits can lead to `APIConnectionError` or `RateLimitError` when calling the OpenAI API. These are common with external API interactions and can disrupt workflows.
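The description-length gotcha above can be caught before any API call is made. This is a minimal pre-flight check, assuming (per the warning) that the limit applies to the combined tool descriptions; the function name and limit constant are illustrative:

```python
# Hypothetical pre-flight check for the description length limit noted above.
MAX_DESC_CHARS = 1024  # limit cited in the warning above

def check_tool_descriptions(descriptions: list[str]) -> None:
    """Raise early if combined tool descriptions would exceed the API limit."""
    total = sum(len(d) for d in descriptions)
    if total > MAX_DESC_CHARS:
        raise ValueError(
            f"Combined tool descriptions are {total} characters, "
            f"exceeding the {MAX_DESC_CHARS}-character limit."
        )

# Passes silently for a short description.
check_tool_descriptions(["Provides information about cities and their landmarks."])
```

Running this check when tools are registered turns a late, opaque API failure into an immediate, actionable error.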
Install
pip install llama-index-question-gen-openai
Imports
- OpenAIQuestionGenerator
from llama_index.question_gen.openai import OpenAIQuestionGenerator
Quickstart
import os
from llama_index.question_gen.openai import OpenAIQuestionGenerator
from llama_index.core.tools import ToolMetadata, QueryEngineTool
from llama_index.core import QueryBundle, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
# Set your OpenAI API key (replace with your actual key or load from .env)
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")
# Create a dummy data file for demonstration
with open("data.txt", "w") as f:
    f.write("The capital of France is Paris. Paris is known for its Eiffel Tower.\n")
    f.write("The capital of Germany is Berlin. Berlin has a rich history.\n")
# Load data and create a simple index for a tool
reader = SimpleDirectoryReader(input_files=["data.txt"])
documents = reader.load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# Define a tool for the question generator to use
tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="city_info",
        description="Provides information about cities and their landmarks.",
    ),
)
# Initialize OpenAIQuestionGenerator
# It uses OpenAI's function calling API by default.
question_gen = OpenAIQuestionGenerator.from_defaults(llm=OpenAI(model="gpt-3.5-turbo"))
# Generate sub-questions based on a complex query and available tools
query_bundle = QueryBundle("Tell me about the capitals of European countries and their famous landmarks.")
sub_questions = question_gen.generate(
    tools=[tool.metadata],  # generate() expects ToolMetadata, not the tool itself
    query=query_bundle,
)
print(f"Generated {len(sub_questions)} sub-questions:")
for sq in sub_questions:
    print(f"- Question: {sq.sub_question}, Tool: {sq.tool_name}")
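For the transient failures flagged in the warnings (rate limits, connection errors), a simple retry wrapper around the generate call can help. This is a generic sketch with exponential backoff; the function name, wait times, and the broad `Exception` placeholder are assumptions to adapt (e.g. narrow `retry_on` to `openai.RateLimitError` in practice):

```python
import time

def with_retries(fn, *, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the given exception types."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage: wrap the generate call from the Quickstart.
# sub_questions = with_retries(
#     lambda: question_gen.generate(tools=[tool.metadata], query=query_bundle),
# )
```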