{"id":3869,"library":"annoy","title":"Annoy (Approximate Nearest Neighbors)","description":"Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings designed for efficient similarity search in high-dimensional spaces. It's optimized for memory usage and can create large, read-only, file-based data structures that are memory-mapped, enabling multiple processes to share the same index. The library is actively maintained by Spotify with frequent minor releases.","status":"active","version":"1.17.3","language":"en","source_language":"en","source_url":"https://github.com/spotify/annoy","tags":["nearest neighbors","approximate nearest neighbors","vector search","similarity search","indexing","memory efficient","ann"],"install":[{"cmd":"pip install annoy","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"AnnoyIndex","correct":"from annoy import AnnoyIndex"}],"quickstart":{"code":"import os\nfrom annoy import AnnoyIndex\nimport random\n\nf = 40  # Length of item vector that will be indexed\nt = AnnoyIndex(f, 'euclidean')  # or 'angular', 'manhattan', 'hamming', 'dot'\n\n# Add items to the index\nfor i in range(1000):\n    v = [random.gauss(0, 1) for _ in range(f)]\n    t.add_item(i, v)\n\n# Build the index with n_trees trees. 
n_jobs=-1 uses all CPU cores.\nt.build(10, n_jobs=-1)\n\n# Save and load the index\nindex_path = 'test.ann'\nt.save(index_path)\n\nu = AnnoyIndex(f, 'euclidean')\nu.load(index_path)  # super fast, will just mmap the file\n\n# Query for nearest neighbors\nquery_item_id = 0\nk = 10  # Number of neighbors to retrieve\n\nnearest_neighbors = u.get_nns_by_item(query_item_id, k)\nprint(f\"Nearest neighbors for item {query_item_id}: {nearest_neighbors}\")\n\nquery_vector = [random.gauss(0, 1) for _ in range(f)]\nnearest_neighbors_by_vector = u.get_nns_by_vector(query_vector, k)\nprint(f\"Nearest neighbors for a random vector: {nearest_neighbors_by_vector}\")\n\n# Release the memory maps before deleting the file (deleting a mapped file fails on Windows)\nt.unload()\nu.unload()\n\n# Clean up the created index file\nif os.path.exists(index_path):\n    os.remove(index_path)","lang":"python","description":"This example demonstrates how to initialize an Annoy index, add items (vectors), build the index for efficient search, save it to disk, load it back (memory-mapped), and then perform nearest neighbor queries using an item ID or a new vector. The `AnnoyIndex` constructor takes the vector dimension `f` and the distance `metric` (e.g., 'euclidean', 'angular'). The `build` method takes the number of trees (`n_trees`) and the number of parallel jobs (`n_jobs`)."},"warnings":[{"fix":"Plan your data ingestion to add all items before calling `.build()`. If your dataset changes, you must rebuild the entire index.","message":"Once the `build()` method is called on an `AnnoyIndex` instance, no more items can be added to that index. Annoy is designed for static, read-only indexes after creation. If you need a mutable index, consider rebuilding or using an alternative library.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Map your arbitrary item identifiers to a dense range of non-negative integers (e.g., 0, 1, ..., N-1) before adding them to Annoy.","message":"Item IDs must be non-negative integers. Annoy allocates memory for `max(id)+1` items, assuming dense integer IDs from 0 to N-1. 
Sparse or very large IDs therefore waste memory on unused slots and can lead to excessive allocation or unexpected behavior.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Experiment with different `n_trees` values (e.g., 10-1000) at build time and different `search_k` values at query time to find the best trade-off for your dataset and latency requirements. If `search_k` is not provided, it defaults to `n * n_trees`, where `n` is the number of neighbors requested; raise it above that default to improve accuracy.","message":"The `n_trees` parameter (during build) affects build time and index size; higher values give better accuracy but larger indexes. The `search_k` parameter (during search) affects search time; higher values give better accuracy but longer search times. You must tune these parameters for your specific accuracy and performance needs.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to version 1.17.2 or newer to benefit from memory leak fixes.","message":"Older versions (prior to 1.17.2) were known to have memory leaks, especially during index building or repeated operations.","severity":"deprecated","affected_versions":"<1.17.2"},{"fix":"Ensure `build()` is called exactly once before `save()`, and only call `build()` on an index that has not been built yet.","message":"Version 1.16.1 introduced stricter checks, preventing saving an index that hasn't been built or building an index that has already been built.","severity":"breaking","affected_versions":">=1.16.1"},{"fix":"Ensure you are using the latest stable version of Annoy. If issues persist, check the GitHub issues for platform-specific workarounds or compiler flags.","message":"Compilation issues have occurred on specific platforms, such as OS X (fixed in 1.17.3) and certain GCC versions with AVX instructions (fixed in 1.16.1). These can prevent successful installation or lead to runtime errors.","severity":"gotcha","affected_versions":"Various pre-1.17.3, pre-1.16.1"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}