LlamaIndex OpenAI Embeddings
This library integrates OpenAI's embedding models with LlamaIndex, a data framework for LLM applications. It lets users convert text into numerical vector representations (embeddings) using any of OpenAI's embedding models. Currently at version 0.6.0, the package follows the rapid release cadence of the broader LlamaIndex ecosystem.
Warnings
- breaking Breaking Change (LlamaIndex v0.9.x): Embedding providers are no longer re-exported from `llama_index.core`. You must import `OpenAIEmbedding` directly from `llama_index.embeddings.openai`.
- breaking Breaking Change (LlamaIndex v0.11.x): Default LLM and embedding models are no longer set automatically via `Settings`. You must explicitly set `Settings.embed_model`.
- gotcha OpenAI API Key is mandatory and must be configured. Lack of a valid key will result in `APIConnectionError` or `AuthenticationError`.
- gotcha Older `llama-index` core versions (e.g., 0.10.6) might encounter issues with `callback_manager` assignments leading to `ValueError` or crashes when using `OpenAIEmbedding`.
- gotcha Rate limits and connection errors can occur due to frequent API calls or network issues, especially when processing many documents.
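For the rate-limit and connection-error gotcha above, a common mitigation is to retry transient failures with exponential backoff. The sketch below is a generic, hypothetical helper (`with_retries` is not part of the library); note that `OpenAIEmbedding` also accepts client-level settings such as `max_retries` in recent versions, so check the current constructor signature before rolling your own.

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying transient errors with exponential backoff and jitter.

    This is an illustrative sketch; in real code you would catch the OpenAI
    client's rate-limit/connection exceptions instead of the builtins below.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the delay each attempt, plus jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Usage sketch (embed_model assumed to be an OpenAIEmbedding instance):
# embedding = with_retries(lambda: embed_model.get_text_embedding("some text"))
```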
Install
pip install llama-index-embeddings-openai
Imports
- OpenAIEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding
- Settings
from llama_index.core import Settings
Quickstart
import os
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
# Set your OpenAI API key as an environment variable before running,
# e.g. `export OPENAI_API_KEY=...` (load it from a .env file in production)
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY environment variable not set. Please set it to your OpenAI API key.")
# Initialize the OpenAI Embedding model and set it as the global default
# By default, uses 'text-embedding-ada-002'
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
# Alternatively, create a local instance without setting it globally
embed_model_local = OpenAIEmbedding(model="text-embedding-3-small")
# Get a single text embedding using the local instance
text = "This is a test sentence for embedding with a local model."
embedding = embed_model_local.get_text_embedding(text)
print(f"Embedding length: {len(embedding)}")
# print(f"First 5 elements of embedding: {embedding[:5]}...")
# Get embeddings for multiple texts using the global default
texts_list = ["Hello world!", "LlamaIndex is great.", "OpenAI embeddings are powerful."]
embeddings_for_list = Settings.embed_model.get_text_embedding_batch(texts_list)
print(f"Number of embeddings for list: {len(embeddings_for_list)}")
for i, emb in enumerate(embeddings_for_list):
    print(f"Embedding {i} length: {len(emb)}")
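A typical downstream use of the embeddings produced above is comparing texts by cosine similarity. The following is a minimal pure-Python sketch using dummy vectors so it runs without an API call; recent versions of `BaseEmbedding` also expose a built-in `similarity` helper, but its exact location varies across releases, so verify against your installed version.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With real embeddings from the quickstart you would compare, e.g.:
# score = cosine_similarity(embeddings_for_list[0], embeddings_for_list[1])
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```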