Voyage AI Python Library

0.3.7 · active · verified Sat Apr 11

Voyage AI's Python library (`voyageai`) is a client for the Voyage AI API, which serves Voyage's embedding and reranking models. The embedding models convert unstructured data (text, images, video) into dense numerical vectors (embeddings) whose geometric closeness reflects semantic similarity, which is what makes retrieval tasks such as semantic search and Retrieval-Augmented Generation (RAG) possible. The reranking models score how relevant each candidate document is to a query. The library is actively maintained (current release: 0.3.7) and can be combined with other parts of an AI stack, such as vector databases and LLM frameworks.
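
Once texts are embedded, semantic search amounts to ranking document vectors by their similarity to a query vector. The minimal sketch below does this with cosine similarity in plain Python. `cosine_similarity` and `semantic_search` are hypothetical helper names, not part of `voyageai`, and the 3-dimensional vectors are toy values standing in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for real embeddings.
docs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.0, 0.0]
# Document 0 ranks first, then document 2; document 1 is orthogonal to the query.
print(semantic_search(query, docs, top_k=2))
```

With real data, one would embed the corpus with `client.embed(..., input_type="document")` and the query with `input_type="query"`, then pass the query's `result.embeddings[0]` as `query_vec`. If the vectors are already unit-length, a plain dot product gives the same ranking; computing the norms explicitly avoids depending on that. For large corpora, a vector database or a vectorized library would replace this linear scan.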

Install

pip install -U voyageai

Imports

import voyageai
from voyageai import Client

Quickstart

This quickstart initializes the Voyage AI client, generates text embeddings, and reranks a small set of documents using recommended models. It reads the API key from the `VOYAGE_API_KEY` environment variable rather than hard-coding it, and sets `max_retries=3` so that the client retries a request up to three times after transient failures such as rate limiting instead of failing immediately.
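
With `max_retries`, retrying is handled inside the client, and its exact backoff policy is internal to the library. As a rough illustration of the general technique only, the sketch below retries a failing call with exponential backoff plus a little random jitter. `call_with_retries` is a hypothetical helper, not part of `voyageai`, and `flaky` simulates a network call that fails twice before succeeding.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn; on failure retry up to max_retries times with exponential backoff.

    Hypothetical helper for illustration; not part of the voyageai package.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error to the caller
            # Wait base_delay * 2**attempt seconds, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1


# Stand-in for a flaky network call: fails on the first two calls, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(call_with_retries(flaky, max_retries=3, base_delay=0.01))  # prints "ok"
```

Doubling the wait after each failure keeps repeated retries from hammering a rate-limited service, and the jitter spreads out retries coming from many concurrent clients.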

import os
from voyageai import Client

# Read the API key from the environment rather than hard-coding it in source.
# (Client() with no api_key argument also falls back to VOYAGE_API_KEY.)
api_key = os.environ.get("VOYAGE_API_KEY")
if not api_key:
    # Without a real key every API call below would fail, so stop early.
    raise SystemExit("VOYAGE_API_KEY is not set; export it before running this example.")

# max_retries=3: retry a failed request up to 3 times on transient errors such as rate limiting.
client = Client(api_key=api_key, max_retries=3)

texts = [
    "hello, world",
    "welcome to voyage ai!",
    "Voyage AI provides cutting-edge embedding and reranking models."
]

try:
    # Embed the texts with a Voyage 4 series model. input_type="document" marks them
    # as corpus entries to be retrieved; use input_type="query" for search queries.
    result = client.embed(
        texts,
        model="voyage-4-large",
        input_type="document",
    )

    print(f"Generated {len(result.embeddings)} embeddings.")
    print(f"Each embedding has {len(result.embeddings[0])} dimensions.")
    print(f"First embedding (truncated): {result.embeddings[0][:10]}...")
    print(f"Total tokens used: {result.total_tokens}")

    # Example of reranking
    query = "AI models for text processing"
    documents = [
        "Voyage AI offers models for natural language understanding.",
        "The latest phone has a great camera and long battery life.",
        "Our embedding models are state-of-the-art for semantic search."
    ]
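    rerank_result = client.rerank(query, documents, model="rerank-2.5")
    print("\nReranking results:")
    # Results are sorted by relevance_score, highest first. r.index points into the
    # original documents list, and r.document is the document string itself.
    for rank, r in enumerate(rerank_result.results, start=1):
        print(f"  Rank {rank}: Score={r.relevance_score:.4f}, Index={r.index}, Document='{r.document}'")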

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your VOYAGE_API_KEY is correctly set and you have network access.")
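
The quickstart sends all texts in a single `client.embed` call. For larger corpora, a common pattern is to split the input into batches and concatenate the results in their original order. The minimal sketch below assumes a hypothetical `embed_in_batches` helper that accepts any batch-embedding callable, so it can run offline with a stand-in embedder (`fake_embed`, also hypothetical). The default `batch_size=128` is an arbitrary placeholder, not a documented Voyage limit; check the API documentation for the per-request limits of the model you use.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 128,  # placeholder value, not a documented API limit
) -> List[List[float]]:
    """Embed texts chunk by chunk, returning one vector per text in input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        batch_vectors = embed_batch(list(batch))
        if len(batch_vectors) != len(batch):
            raise RuntimeError("embedder returned the wrong number of vectors")
        vectors.extend(batch_vectors)
    return vectors


# Offline stand-in for the API: a 2-d "embedding" of (length, vowel count).
def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in batch]


print(embed_in_batches(fake_embed, ["hi", "hello", "voyage"], batch_size=2))
# prints [[2.0, 1.0], [5.0, 2.0], [6.0, 3.0]]
```

Against the real API, the callable could be `lambda batch: client.embed(batch, model="voyage-4-large", input_type="document").embeddings`, with `client` constructed as in the quickstart.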
