Voyage AI Python Library
Voyage AI provides a Python library (`voyageai`) that wraps the API endpoints for its embedding and reranking models. The embedding models convert unstructured data (text, images, video) into dense numerical vectors (embeddings), which support information retrieval tasks such as semantic search and Retrieval-Augmented Generation (RAG). The library is actively maintained (version 0.3.7 at the time of writing) and is designed to slot into existing AI stacks.
Warnings
- breaking: Python 3.8 and earlier are no longer supported. Upgrade to Python 3.9 or higher.
- deprecated: Older embedding models (e.g., `voyage-01`, `voyage-lite-01`, `voyage-lite-01-instruct`) are superseded by the Voyage 4 series (`voyage-4-large`, `voyage-4`, `voyage-4-lite`). The new series offers improved accuracy and a shared embedding space, making it easier to switch models without re-indexing.
- gotcha: `voyageai.Client` defaults to `max_retries=0`, so it does not automatically retry API requests that fail with transient errors (e.g., rate limits, temporary network issues). Pass `max_retries` explicitly to enable retries.
- gotcha: Users on the free tier or without a payment method are subject to strict API rate limits (e.g., 3 requests per minute, 10,000 tokens per minute). Exceeding these limits results in `429 (Rate Limit Exceeded)` errors.
- gotcha: For best retrieval and search performance, specify the `input_type` parameter (e.g., `"query"` or `"document"`) when calling `client.embed()`. Voyage AI then prepends an appropriate prompt to the input, so the embeddings are better suited to the intended use case.
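The retry and rate-limit warnings above can be handled with the client's own `max_retries` setting, but it may help to see what retrying with backoff looks like. The following is a minimal sketch only: `call_with_backoff` and all of its parameters are hypothetical names introduced for illustration and are not part of `voyageai`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff and jitter on failure.

    Hypothetical helper for illustration; not part of the voyageai library.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors judged non-retryable.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # plus up to 10% random jitter so parallel callers do not sync up.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    # Only reached if max_attempts < 1, i.e. fn was never called.
    raise RuntimeError("max_attempts must be at least 1")
```

Under these assumptions, a call such as `call_with_backoff(lambda: client.embed(texts, model="voyage-4-large", input_type="document"))` would retry failed requests with growing pauses. In practice `is_retryable` should be narrowed to whatever exception the library raises for 429 responses, so that errors like an invalid API key fail immediately.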
Install
pip install -U voyageai
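After installing, it can be useful to confirm that the interpreter meets the Python 3.9 minimum mentioned in the warnings and to see which package version was picked up. A small sketch using only the standard library; `check_environment` and `MIN_PYTHON` are hypothetical names chosen for this example.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple

MIN_PYTHON = (3, 9)  # minimum version stated in the warnings above


def check_environment(package: str = "voyageai") -> Tuple[bool, Optional[str]]:
    """Return (python_ok, installed_version); version is None if not installed."""
    python_ok = sys.version_info >= MIN_PYTHON
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None
    return python_ok, installed


python_ok, installed = check_environment()
print(f"Python >= 3.9: {python_ok}; voyageai version: {installed}")
```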
Imports
- Client
from voyageai import Client
Quickstart
import os
from voyageai import Client

# It's recommended to set your API key as an environment variable (VOYAGE_API_KEY).
# For demonstration, you can set it directly, but avoid this in production.
api_key = os.environ.get('VOYAGE_API_KEY', 'YOUR_VOYAGE_API_KEY')
if not api_key or api_key == 'YOUR_VOYAGE_API_KEY':
    print("Warning: VOYAGE_API_KEY environment variable not set or is a placeholder. Please set it for actual use.")
    # Exit or raise error if API key is critical for quickstart execution
    # For this example, we'll proceed with a dummy key but it will fail API calls.

client = Client(api_key=api_key, max_retries=3)  # max_retries=3 added for robustness

texts = [
    "hello, world",
    "welcome to voyage ai!",
    "Voyage AI provides cutting-edge embedding and reranking models."
]
try:
    # Generate embeddings using a recommended model from the Voyage 4 series
    result = client.embed(
        texts,
        model="voyage-4-large",
        input_type="document"  # Recommended for optimal retrieval
    )
    print(f"Generated {len(result.embeddings)} embeddings.")
    print(f"Each embedding has {len(result.embeddings[0])} dimensions.")
    print(f"First embedding (truncated): {result.embeddings[0][:10]}...")
    print(f"Total tokens used: {result.total_tokens}")

    # Example of reranking
    query = "AI models for text processing"
    documents = [
        "Voyage AI offers models for natural language understanding.",
        "The latest phone has a great camera and long battery life.",
        "Our embedding models are state-of-the-art for semantic search."
    ]
    rerank_result = client.rerank(query, documents, model="rerank-2.5")
    print("\nReranking results:")
    # Results come back sorted by descending relevance; `r.document` is the original string
    for i, r in enumerate(rerank_result.results):
        print(f"  Rank {i+1}: Score={r.relevance_score:.4f}, Document='{r.document}'")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your VOYAGE_API_KEY is correctly set and you have network access.")
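Once embeddings are available, a common next step for semantic search is to compare a query embedding against document embeddings by cosine similarity. The sketch below uses toy 3-dimensional vectors in place of real embeddings so that it runs without an API key; `cosine_similarity` and `rank_documents` are hypothetical helpers, not functions provided by `voyageai`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration).
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = rank_documents(query_vec, doc_vecs)
print([i for i, _ in ranking])  # [1, 2, 0]
```

With real data, `doc_vecs` would be `result.embeddings` from a `client.embed(..., input_type="document")` call, and `query_vec` would be the single embedding returned by embedding the query with `input_type="query"`. For larger collections a vector index is usually preferable to this linear scan.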