LlamaIndex OpenAI Question Generator

0.3.1 · active · verified Sun Apr 12

The `llama-index-question-gen-openai` package provides an integration for LlamaIndex that generates sub-questions using OpenAI's function calling API. Because the model is asked to emit a structured JSON object rather than free-form text, it is less prone to output-parsing failures than generic LLM question generators. The current version is 0.3.1, and the package is part of the actively developed LlamaIndex ecosystem.


Install

pip install llama-index-question-gen-openai

Quickstart

This quickstart demonstrates how to initialize `OpenAIQuestionGenerator` and use it within a LlamaIndex application to generate sub-questions. It requires an `OPENAI_API_KEY` and sets up a basic `QueryEngineTool` that the generator can reference. The generated sub-questions are tailored to the provided tools and original query.

import os
from llama_index.question_gen.openai import OpenAIQuestionGenerator
from llama_index.core.tools import ToolMetadata, QueryEngineTool
from llama_index.core import QueryBundle, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Set your OpenAI API key (replace with your actual key or load from .env)
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")

# Create a dummy data file for demonstration
with open("data.txt", "w") as f:
    f.write("The capital of France is Paris. Paris is known for its Eiffel Tower.\n")
    f.write("The capital of Germany is Berlin. Berlin has a rich history.\n")

# Load only the demo file (loading "./" would ingest everything in the directory)
reader = SimpleDirectoryReader(input_files=["data.txt"])
documents = reader.load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Define a tool for the question generator to use
tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="city_info",
        description="Provides information about cities and their landmarks."
    ),
)

# Initialize OpenAIQuestionGenerator with a model that supports function calling
# (gpt-3.5-turbo-0613 has been retired by OpenAI)
question_gen = OpenAIQuestionGenerator.from_defaults(llm=OpenAI(model="gpt-4o-mini"))

# Generate sub-questions based on a complex query and available tools
query_bundle = QueryBundle("Tell me about the capitals of European countries and their famous landmarks.")
sub_questions = question_gen.generate(
    tools=[tool.metadata],  # generate() expects ToolMetadata objects, not the tools themselves
    query=query_bundle
)

print(f"Generated {len(sub_questions)} sub-questions:")
for sq in sub_questions:
    print(f"- Question: {sq.sub_question}, Tool: {sq.tool_name}")
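Each generated sub-question carries two fields, `sub_question` and `tool_name`, which the function calling API returns inside a structured JSON payload. The sketch below mirrors that shape in plain Python (no LlamaIndex or API key needed); the exact payload and the `items` wrapper key shown here are illustrative, not a guaranteed wire format:

```python
import json

# Illustrative JSON payload with the SubQuestion fields (sub_question, tool_name);
# the "items" wrapper is an assumption for this sketch
raw = '''
{
  "items": [
    {"sub_question": "What is the capital of France?", "tool_name": "city_info"},
    {"sub_question": "What landmarks is Paris known for?", "tool_name": "city_info"}
  ]
}
'''

parsed = json.loads(raw)
for item in parsed["items"]:
    print(f"- Question: {item['sub_question']}, Tool: {item['tool_name']}")
```

Because the model's output is constrained to this schema, downstream code can iterate over typed objects instead of regex-parsing a numbered list out of free-form text.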
