Hnswlib
Hnswlib is a lightweight, header-only C++ library with Python bindings for fast Approximate Nearest Neighbor (ANN) search. It implements the Hierarchical Navigable Small World (HNSW) algorithm, which enables efficient similarity search in high-dimensional vector spaces. The library supports dynamic updates (insertion and deletion of elements) and the L2, inner product, and cosine distance spaces. The current stable release on PyPI is 0.8.0; version 0.9.0 has been released on GitHub but is not yet on PyPI. The project maintains a relatively active release cadence.
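To make the three distance spaces concrete, here is a minimal pure-Python sketch of the values they are understood to compute. The function names are hypothetical and not part of hnswlib. The formulas follow the definitions in the hnswlib README as best understood here: 'l2' is the squared Euclidean distance (no square root), and 'ip' and 'cosine' are expressed as 1 minus the similarity, so that smaller always means closer.

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance: sum((a_i - b_i)^2). No square root is taken."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """1 - dot(a, b). Can be negative when the dot product exceeds 1."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two orthogonal unit vectors as a small demonstration.
a = [1.0, 0.0]
b = [0.0, 1.0]
print(l2_squared(a, b))              # 2.0
print(inner_product_distance(a, b))  # 1.0
print(cosine_distance(a, b))         # 1.0
```

One practical consequence, under these assumptions: distances returned for space='l2' are squared, so take the square root before comparing them with ordinary Euclidean distances.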
Warnings
- breaking Indices saved with very old versions (prior to v0.3.4) are not supported and cannot be loaded with newer `hnswlib` versions.
- breaking Saving and loading of large pickled indices (greater than 4GB) in versions prior to 0.6.2 could lead to data corruption.
- breaking Indices built with AVX512 or AVX optimizations (enabled during compilation) in `hnswlib` v0.6.1 and later may not be backwards-compatible with older SSE or non-AVX512 architectures. This can cause issues when moving indices between machines with different CPU capabilities.
- gotcha The `ef` parameter, which controls the query-time accuracy/speed trade-off, is *not* saved as part of the index. It must be manually set after loading a saved index.
- breaking In `hnswlib` v0.8.0, statistic aggregation was removed by default for multi-threaded search to improve speed. Users who relied on this feature might observe changes in behavior or require explicit configuration if it's still needed.
- gotcha Brute-force search with filters was buggy in versions prior to 0.9.0 (available on GitHub, not yet a stable PyPI release): results could be incorrect and normalization checks were missing. As of 0.9.0, searching for `k` elements when fewer than `k` are available explicitly throws an exception.
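One way to avoid the "fewer than `k` elements" exception is to clamp `k` to the number of stored elements before querying. The helper below is a hypothetical sketch, not part of hnswlib. It assumes the index object exposes `get_current_count()` and `knn_query(data, k=...)`, as `hnswlib.Index` does in the Python bindings. It also assumes no elements are marked deleted and no filter is applied, since either can leave fewer reachable elements than the count suggests.

```python
def clamped_knn_query(index, query, k):
    """Query for at most as many neighbors as the index currently holds.

    `index` is assumed to behave like hnswlib.Index: it provides
    get_current_count() and knn_query(data, k=...).
    """
    available = index.get_current_count()
    if available == 0:
        # Nothing to search; fail early with a clear message.
        raise ValueError("index is empty; nothing to query")
    # Never ask for more neighbors than there are stored elements.
    return index.knn_query(query, k=min(k, available))
```

With a real index this would be called as `labels, distances = clamped_knn_query(index, query_vector, k=5)`. Remember that `ef` still has to be set on a freshly loaded index before querying.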
Install
pip install hnswlib
Imports
- Index
import hnswlib
import numpy as np
index = hnswlib.Index(space='l2', dim=128)
Quickstart
import hnswlib
import numpy as np
import os
# Define data parameters
dim = 128
num_elements = 10000
# Generate random data
data = np.float32(np.random.random((num_elements, dim)))
data_labels = np.arange(num_elements)
# Initialize the HNSW index
# Possible space options: 'l2', 'ip' (inner product), 'cosine'
space_name = 'l2' # Euclidean distance
index = hnswlib.Index(space=space_name, dim=dim)
# Set index parameters BEFORE adding data
# max_elements: maximum number of elements the index can hold (its capacity)
# ef_construction: construction-time accuracy vs. build-speed trade-off
# M: number of bi-directional links created for each new element
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
# Add items to the index
index.add_items(data, data_labels)
# Set query time accuracy/speed trade-off
# Note: This parameter is NOT saved with the index and must be set after loading.
index.set_ef(50)
# Generate a query vector
query_vector = np.float32(np.random.random((1, dim)))
# Perform a k-nearest neighbor query
k = 5
labels, distances = index.knn_query(query_vector, k=k)
print(f"Query vector: {query_vector[0][:5]}...")
print(f"Nearest neighbor labels: {labels[0]}")
print(f"Distances to neighbors: {distances[0]}")
# Example of saving and loading the index
index_path = 'my_hnsw_index.bin'
index.save_index(index_path)
loaded_index = hnswlib.Index(space=space_name, dim=dim)
loaded_index.load_index(index_path)
loaded_index.set_ef(50) # Re-set ef after loading
loaded_labels, loaded_distances = loaded_index.knn_query(query_vector, k=k)
print(f"Loaded index nearest neighbor labels: {loaded_labels[0]}")
os.remove(index_path)
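To judge whether a given `ef` value is accurate enough, a common approach is to compare the approximate results with exact ones and compute recall. The function below is a hypothetical helper, not an hnswlib API. It assumes both arguments are row-aligned sequences of label rows, such as the `labels` array returned by `knn_query` and labels from an exact search over the same data.

```python
def recall_at_k(approx_labels, exact_labels):
    """Fraction of exact neighbors that also appear in the approximate results.

    Both arguments are sequences of rows, one row of labels per query,
    in the same query order.
    """
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_labels, exact_labels):
        exact_set = set(exact_row)
        # Count approximate labels that are true nearest neighbors.
        hits += sum(1 for label in approx_row if label in exact_set)
        total += len(exact_row)
    # Report 0.0 rather than dividing by zero when there are no queries.
    return hits / total if total else 0.0


# One query: two of the three exact neighbors were found.
print(recall_at_k([[1, 2, 3]], [[1, 2, 4]]))
```

The exact labels could come from a brute-force search over the same data (hnswlib ships a `BFIndex` class for this purpose, as far as known here). Raising `ef` with `set_ef` generally increases recall at the cost of slower queries.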