LlamaIndex OpenAI Embeddings

0.6.0 · active · verified Fri Apr 10

This package integrates OpenAI's embedding models with LlamaIndex, a data framework for LLM applications. It converts text into numerical vector representations (embeddings) using OpenAI models such as `text-embedding-ada-002` and `text-embedding-3-small`. The current version is 0.6.0; releases follow the rapid cadence of the broader LlamaIndex ecosystem.

Quickstart

This quickstart assumes the `llama-index-embeddings-openai` package is already installed (for example, with `pip install llama-index-embeddings-openai`). It shows how to read your OpenAI API key from the environment, initialize `OpenAIEmbedding` either globally via `Settings` or as a local instance, and generate embeddings for a single text and for a list of texts.

import os
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Read the OpenAI API key from the environment.
# Export OPENAI_API_KEY in your shell or load it from a .env file; do not hard-code it.
api_key = os.environ.get("OPENAI_API_KEY")

# Fail early with a clear message if the key is missing or still a placeholder
if not api_key or api_key == "YOUR_OPENAI_API_KEY":
    raise ValueError("OPENAI_API_KEY environment variable not set. Please set it to your OpenAI API key.")

# Initialize the OpenAI Embedding model and set it as the global default
# By default, uses 'text-embedding-ada-002'
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Alternatively, create a local instance without setting it globally
embed_model_local = OpenAIEmbedding(model="text-embedding-3-small")

# Get a single text embedding using the local instance
text = "This is a test sentence for embedding with a local model."
embedding = embed_model_local.get_text_embedding(text)
print(f"Embedding length: {len(embedding)}")
# print(f"First 5 elements of embedding: {embedding[:5]}...")

# Get embeddings for multiple texts using the global default
# (the public batch method on LlamaIndex embedding models is get_text_embedding_batch)
texts_list = ["Hello world!", "LlamaIndex is great.", "OpenAI embeddings are powerful."]
embeddings_for_list = Settings.embed_model.get_text_embedding_batch(texts_list)
print(f"Number of embeddings for list: {len(embeddings_for_list)}")
for i, emb in enumerate(embeddings_for_list):
    print(f"Embedding {i} length: {len(emb)}")
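
The embeddings returned above are plain lists of floats, so they can be compared with ordinary vector arithmetic. The sketch below shows one common follow-up step, ranking candidate vectors by cosine similarity to a query vector. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and not part of this package, and the vectors are toy values standing in for real API output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine similarity is undefined for zero vectors.")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors; real OpenAI embeddings have many more dimensions.
query_vec = [1.0, 0.0]
candidate_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

for index, score in rank_by_similarity(query_vec, candidate_vecs):
    print(f"Candidate {index}: similarity {score:.4f}")
```

In practice, `query_vec` and `candidate_vecs` would be replaced by the lists produced in the quickstart, for example `embedding` and `embeddings_for_list`, provided both come from the same model so that their dimensions match.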
