Annoy (Approximate Nearest Neighbors)

1.17.3 · active · verified Sat Apr 11

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings designed for efficient similarity search in high-dimensional spaces. It's optimized for memory usage and can create large, read-only, file-based data structures that are memory-mapped, enabling multiple processes to share the same index. The library is actively maintained by Spotify with frequent minor releases.

Warnings

Install

Imports

Quickstart

This example initializes an Annoy index, adds items (vectors), builds the index, saves it to disk, loads it back (memory-mapped), and then runs nearest neighbor queries using either an existing item ID or a new vector. The `AnnoyIndex` constructor takes the vector dimension `f` and the distance `metric` (e.g., 'euclidean', 'angular'). The `build` method takes the number of trees (`n_trees`) and the number of threads used to build them (`n_jobs`, where -1 uses all available CPU cores). More trees generally give more accurate results at the cost of a larger index, and no more items can be added once `build` has been called.

import os
from annoy import AnnoyIndex
import random

f = 40  # Length of item vector that will be indexed
t = AnnoyIndex(f, 'euclidean')  # or 'angular', 'manhattan', 'hamming', 'dot'

# Add items to the index
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

# Build the index with n_trees trees. n_jobs=-1 uses all CPU cores.
t.build(10, n_jobs=-1)

# Save and load the index
index_path = 'test.ann'
t.save(index_path)

u = AnnoyIndex(f, 'euclidean')
u.load(index_path) # super fast, will just mmap the file

# Query for nearest neighbors
query_item_id = 0
k = 10 # Number of neighbors to retrieve

nearest_neighbors = u.get_nns_by_item(query_item_id, k)
print(f"Nearest neighbors for item {query_item_id}: {nearest_neighbors}")

query_vector = [random.gauss(0, 1) for _ in range(f)]
nearest_neighbors_by_vector = u.get_nns_by_vector(query_vector, k)
print(f"Nearest neighbors for a random vector: {nearest_neighbors_by_vector}")

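# Release the memory-mapped file before deleting it
# (removing a file that is still mapped fails on some platforms, e.g. Windows)
u.unload()

# Clean up the created index file
if os.path.exists(index_path):
    os.remove(index_path)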
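
Because Annoy returns approximate neighbors, it can help to compare its results with an exact brute-force search on a small sample when choosing `n_trees`. The following is a minimal sketch in pure Python, not part of the Annoy API: the helper names (`euclidean`, `angular`, `exact_neighbors`, `recall_at_k`) are hypothetical. It assumes the 'angular' metric is the Euclidean distance between normalized vectors, sqrt(2(1 - cos(u, v))), as described in the Annoy README.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular(a, b):
    """Euclidean distance between the normalized vectors: sqrt(2 * (1 - cos))."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = dot / (norm_a * norm_b)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def exact_neighbors(vectors, query, k, distance=euclidean):
    """Return the ids (list positions) of the k closest vectors by exhaustive search."""
    ranked = sorted(range(len(vectors)), key=lambda i: distance(vectors[i], query))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that also appear in the approximate result."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    random.seed(0)
    dim = 40
    vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
    query = [random.gauss(0, 1) for _ in range(dim)]

    exact_ids = exact_neighbors(vectors, query, 10)
    # Stand-in for an approximate result; in practice this would be the list
    # returned by u.get_nns_by_vector(query, 10) on an index built from `vectors`.
    approx_ids = exact_ids[:8]
    print(f"recall@10 = {recall_at_k(approx_ids, exact_ids):.2f}")
```

To use this with the quickstart, keep the vectors passed to `add_item` in a list indexed by item ID, pass the list returned by `get_nns_by_vector` as `approx_ids`, and average the recall over several queries. If recall is too low, raising `n_trees` at build time is the usual first adjustment.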
