Annoy (Approximate Nearest Neighbors)

Version: 1.17.3 | Verified: Thu May 14 | Auth: no | Language: python

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings for approximate nearest-neighbor search in high-dimensional vector spaces. It is designed to keep memory use low: indexes are static, read-only files that are memory-mapped when loaded, so multiple processes can share the same index. The library was created at Spotify, whose GitHub organization hosts the project.

pip install annoy
error ModuleNotFoundError: No module named 'annoy'
cause The `annoy` package is not installed in the Python environment that is running your code.
fix Install it with the same interpreter you use to run the code, e.g. `python -m pip install annoy`.
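error ImportError: cannot import name 'AnnoyIndex' from 'annoy'
cause The installed `annoy` package does export `AnnoyIndex`, so this error usually means Python imported a different module named `annoy`, most often a local `annoy.py` file or `annoy/` directory that shadows the installed package. Recent Python versions append the loaded module's path to the error message, which shows which file was imported.
fix Rename the conflicting local file or directory (and delete any stale `__pycache__`), then import with `from annoy import AnnoyIndex`.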
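To see which file Python resolves `annoy` to without actually importing it, the standard library's `importlib.util.find_spec` can be used. This is a minimal sketch: the helper names `annoy_origin` and `looks_shadowed` are hypothetical, and the shadowing heuristic (a location outside `site-packages`/`dist-packages`) is an assumption that can misfire for editable or otherwise custom installs.

```python
import importlib.util
import os
from typing import Optional


def annoy_origin() -> Optional[str]:
    """Return the path `import annoy` would load from, or None if nothing is found."""
    spec = importlib.util.find_spec("annoy")
    if spec is None:
        return None
    if spec.origin and spec.origin != "namespace":
        return spec.origin
    # A bare local `annoy/` directory without __init__.py becomes a namespace package
    # with no origin file; report its directory instead.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def looks_shadowed(origin: Optional[str]) -> bool:
    """Heuristic (an assumption, not an Annoy rule): installed packages normally live
    under site-packages or dist-packages, so any other location suggests shadowing."""
    if origin is None:
        return False
    parts = os.path.normpath(origin).split(os.sep)
    return "site-packages" not in parts and "dist-packages" not in parts


if __name__ == "__main__":
    origin = annoy_origin()
    if origin is None:
        print("'annoy' cannot be found by this interpreter")
    elif looks_shadowed(origin):
        print(f"warning: 'annoy' resolves to {origin}; this may shadow the installed package")
    else:
        print(f"'annoy' resolves to {origin}")
```

Run it from the same directory (and with the same interpreter) as the failing script, since `find_spec` searches `sys.path` exactly as `import` does.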
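error TypeError: 'NoneType' object is not subscriptable
cause Python raises this, not Annoy: a variable being indexed with `[...]` holds `None` instead of a result list or an `AnnoyIndex`, typically because it was assigned the return value of a function or method that returns nothing (for example a helper that is missing its `return` statement).
fix Trace where the variable is assigned and make sure it receives the list returned by `get_nns_by_item()` or `get_nns_by_vector()` (or the `AnnoyIndex` object itself) rather than `None`.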
error ValueError: Number of trees must be greater than zero
cause The `n_trees` argument passed to `AnnoyIndex.build()` is zero or negative.
fix Pass a positive integer when building the index: `index.build(n_trees)`.
error RuntimeError: You must build the index before querying
cause The `AnnoyIndex` is queried before it has been built.
fix Call `index.build(n_trees)` before running queries.
gotcha Once `build()` has been called on an `AnnoyIndex`, no more items can be added to it: Annoy is designed around static indexes that are read-only after creation. An in-memory index can be reverted with `unbuild()`, but not one that has been saved or loaded from disk. If you need a frequently updated index, plan to rebuild it or consider a different library.
fix Add all items before calling `.build()`. If your dataset changes, rebuild the entire index.
gotcha Item IDs must be non-negative integers. Annoy allocates memory for `max(id)+1` items, assuming dense integer IDs from 0 to N-1. Using sparse or very large IDs can lead to excessive memory allocation or unexpected behavior.
fix Map your arbitrary item identifiers to a dense range of non-negative integers (e.g., 0, 1, ..., N-1) before adding them to Annoy.
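A minimal sketch of such a mapping in plain Python, assuming your original identifiers are hashable (strings, UUIDs, sparse integers). The `DenseIdMap` class is hypothetical and not part of Annoy; the dense integer it returns is what you would pass as the first argument to `add_item()`, and the reverse lookup translates query results back to your own keys.

```python
from typing import Dict, Hashable, List


class DenseIdMap:
    """Assigns consecutive integer IDs 0, 1, 2, ... to arbitrary external keys."""

    def __init__(self) -> None:
        self._to_dense: Dict[Hashable, int] = {}
        self._to_external: List[Hashable] = []

    def dense_id(self, key: Hashable) -> int:
        """Return the dense ID for key, allocating the next free integer on first use."""
        if key not in self._to_dense:
            self._to_dense[key] = len(self._to_external)
            self._to_external.append(key)
        return self._to_dense[key]

    def external_id(self, dense: int) -> Hashable:
        """Translate a dense ID (e.g., one returned by a neighbor query) back to the key."""
        return self._to_external[dense]

    def __len__(self) -> int:
        return len(self._to_external)


ids = DenseIdMap()
for key in ["user:981273", "user:5", "user:40000000"]:
    i = ids.dense_id(key)
    # With Annoy installed, this is where you would call t.add_item(i, vector)
    # (obtaining `vector` for each key is up to your application).
    print(key, "->", i)
```

Annoy stores only the integer IDs, so the reverse list needs to be persisted alongside the `.ann` file if you want to map query results back after a reload.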
gotcha The `n_trees` parameter (during build) affects build time and index size; higher values give better accuracy but larger indexes. The `search_k` parameter (during search) affects search time; higher values give better accuracy but longer search times. You must tune these parameters for your specific accuracy and performance needs.
fix Experiment with `n_trees` (e.g., 10 to 1000) at build time. At query time, `search_k` defaults to `n * n_trees` when it is not set, where `n` is the number of neighbors requested; raise it above that default for better recall, and measure recall against exact search to pick values that fit your latency budget.
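One way to run that experiment is to compare Annoy's results with exact (brute-force) neighbors on a sample of queries and compute recall@k. The sketch below covers only the pure-Python side: `exact_neighbors`, `recall_at_k`, and `default_search_k` are hypothetical helpers, Euclidean distance is assumed (matching an index built with 'euclidean'), and `default_search_k` simply encodes the documented default of `n * n_trees`.

```python
import math
import random
from typing import List, Sequence


def exact_neighbors(data: Sequence[Sequence[float]], query: Sequence[float], k: int) -> List[int]:
    """Brute-force k nearest neighbors by Euclidean distance (ground truth for recall)."""
    ranked = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return ranked[:k]


def recall_at_k(approx: Sequence[int], exact: Sequence[int]) -> float:
    """Fraction of the exact neighbors that the approximate search also returned."""
    if not exact:
        return 1.0
    return len(set(approx) & set(exact)) / len(exact)


def default_search_k(n: int, n_trees: int) -> int:
    """The search_k Annoy uses when none is given: n * n_trees."""
    return n * n_trees


random.seed(0)
f, n_items, k = 8, 500, 10
data = [[random.gauss(0, 1) for _ in range(f)] for _ in range(n_items)]
query = [random.gauss(0, 1) for _ in range(f)]
truth = exact_neighbors(data, query, k)

# With Annoy installed (items added with the same IDs as `data`), compare for several values:
#   approx = u.get_nns_by_vector(query, k, search_k=search_k)
#   print(search_k, recall_at_k(approx, truth))
print("default search_k for k=10, n_trees=50:", default_search_k(k, 50))
print("sanity check, exact vs exact:", recall_at_k(truth, truth))
```

Because `search_k` is a query-time argument, it can be varied without rebuilding, which usually makes it the cheaper knob to try before changing `n_trees`.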
deprecated Older versions (prior to 1.17.2) were known to have memory leaks, especially during index building or repeated operations.
fix Upgrade to version 1.17.2 or newer to benefit from memory leak fixes.
breaking Version 1.16.1 introduced stricter checks, preventing saving an index that hasn't been built or building an index that has already been built.
fix Ensure `build()` is called exactly once before `save()`, and only call `build()` on an index that has not been built yet.
gotcha Compilation issues have occurred on specific platforms, such as OS X (fixed in 1.17.3) and certain GCC versions with AVX instructions (fixed in 1.16.1). These can prevent successful installation or lead to runtime errors.
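fix Use the latest stable release and make sure a working C++ compiler is available in the build environment (for example `g++` on Debian-based images). If issues persist, check the project's GitHub issues for platform-specific workarounds or compiler flags.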
Install check results by Python version and base image:

| Python | OS / libc     | Status      | Wheel | Install | Import | Disk | Mem | Side effects |
|--------|---------------|-------------|-------|---------|--------|------|-----|--------------|
| 3.9    | alpine (musl) | build_error | -     | -       | -      | -    | -   | -            |
| 3.9    | slim (glibc)  | build_error | -     | 3.2s    | -      | -    | -   | -            |
| 3.10   | alpine (musl) | build_error | -     | -       | -      | -    | -   | -            |
| 3.10   | slim (glibc)  | build_error | -     | 2.7s    | -      | -    | -   | -            |
| 3.11   | alpine (musl) | build_error | -     | -       | -      | -    | -   | -            |
| 3.11   | slim (glibc)  | build_error | -     | 2.6s    | -      | -    | -   | -            |
| 3.12   | alpine (musl) | build_error | -     | -       | -      | -    | -   | -            |
| 3.12   | slim (glibc)  | build_error | -     | 3.4s    | -      | -    | -   | -            |
| 3.13   | alpine (musl) | build_error | -     | -       | -      | -    | -   | -            |
| 3.13   | slim (glibc)  | build_error | -     | 3.1s    | -      | -    | -   | -            |

`build_error` means the package failed to build during installation, so no import, disk, or memory measurements were recorded; install times are shown where they were recorded. No wheel was used in any configuration, which is consistent with Annoy being compiled from source and therefore needing a C++ compiler in the image.

This example demonstrates how to initialize an Annoy index, add items (vectors), build the index for efficient search, save it to disk, load it back (memory-mapped), and then perform nearest neighbor queries using an item ID or a new vector. The `AnnoyIndex` constructor takes the vector dimension `f` and the distance `metric` (e.g., 'euclidean', 'angular'). The `build` method specifies the number of trees (`n_trees`) and jobs (`n_jobs`).

import os
from annoy import AnnoyIndex
import random

f = 40  # Length of item vector that will be indexed
t = AnnoyIndex(f, 'euclidean')  # or 'angular', 'manhattan', 'hamming', 'dot'

# Add items to the index
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

# Build the index with n_trees trees. n_jobs=-1 uses all CPU cores.
t.build(10, n_jobs=-1) 

# Save and load the index
index_path = 'test.ann'
t.save(index_path)

u = AnnoyIndex(f, 'euclidean')
u.load(index_path) # super fast, will just mmap the file

# Query for nearest neighbors
query_item_id = 0
k = 10 # Number of neighbors to retrieve

nearest_neighbors = u.get_nns_by_item(query_item_id, k)
print(f"Nearest neighbors for item {query_item_id}: {nearest_neighbors}")

query_vector = [random.gauss(0, 1) for _ in range(f)]
nearest_neighbors_by_vector = u.get_nns_by_vector(query_vector, k)
print(f"Nearest neighbors for a random vector: {nearest_neighbors_by_vector}")

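# Release the memory-mapped file before deleting it (save() also loads the file into t);
# deleting a file that is still mapped fails on Windows.
u.unload()
t.unload()
if os.path.exists(index_path):
    os.remove(index_path)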
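The `metric` argument determines the distances Annoy reports through `get_distance()` and the `include_distances=True` option of the query methods. For 'angular', Annoy's documentation defines the distance as sqrt(2(1-cos(u,v))), which equals the Euclidean distance between the two vectors after normalizing each to unit length. Below is a pure-Python sketch of that formula for sanity-checking returned distances; `angular_distance` and `euclidean_of_normalized` are hypothetical helpers, not part of Annoy, and both assume non-zero vectors.

```python
import math
from typing import Sequence


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """sqrt(2 * (1 - cos(u, v))), the angular distance as documented by Annoy."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] so floating-point error cannot make the sqrt argument negative.
    cos = max(-1.0, min(1.0, cos))
    return math.sqrt(2.0 * (1.0 - cos))


def euclidean_of_normalized(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between u/|u| and v/|v|; mathematically equal to angular_distance."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.dist([a / nu for a in u], [b / nv for b in v])


print(angular_distance([1.0, 0.0], [0.0, 3.0]))   # orthogonal vectors: sqrt(2)
print(angular_distance([1.0, 0.0], [4.0, 0.0]))   # same direction: 0.0
print(angular_distance([1.0, 0.0], [-5.0, 0.0]))  # opposite directions: 2.0
```

Because only direction matters, scaling a vector does not change its angular distance to others, so 'angular' results range from 0 (same direction) to 2 (opposite directions).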