Hnswlib

0.8.0 · active · verified Tue Apr 14

Hnswlib is a lightweight, header-only C++ library with Python bindings for fast Approximate Nearest Neighbor (ANN) search. It implements the Hierarchical Navigable Small World (HNSW) graph algorithm for efficient similarity search in high-dimensional vector spaces. The index supports dynamic updates: new elements can be inserted after construction, and existing elements can be marked as deleted so they are excluded from search results. It offers three distance spaces: squared L2, inner product, and cosine. Its current stable release on PyPI is 0.8.0, with version 0.9.0 recently released on GitHub, and it maintains a relatively active release cadence.
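According to the hnswlib documentation, the `'l2'` space returns the squared Euclidean distance (no square root), `'ip'` returns 1 minus the inner product, and `'cosine'` returns 1 minus the cosine similarity. The NumPy sketch below restates those formulas so that returned distances can be sanity-checked. The function names are hypothetical helpers for illustration, not part of the hnswlib API.

```python
import numpy as np

# Hypothetical helpers restating the distance formulas hnswlib documents
# for each `space` value; they are NOT part of the hnswlib API.


def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """'l2' space: squared Euclidean distance (no square root is taken)."""
    diff = a - b
    return float(np.dot(diff, diff))


def ip_distance(a: np.ndarray, b: np.ndarray) -> float:
    """'ip' space: 1 - inner product, so larger dot products mean smaller distances."""
    return float(1.0 - np.dot(a, b))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """'cosine' space: 1 - cosine similarity."""
    return float(1.0 - np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


a = np.array([1.0, 0.0], dtype=np.float32)
b = np.array([0.0, 2.0], dtype=np.float32)
print(l2_distance(a, b))      # 5.0 (squared distance, not sqrt(5))
print(ip_distance(a, b))      # 1.0 (the vectors are orthogonal)
print(cosine_distance(a, b))  # 1.0
```

For unit-length vectors the `'ip'` and `'cosine'` formulas give the same value, which is why normalizing data and using `'ip'` is a common alternative to `'cosine'`.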

Install

pip install hnswlib

Imports

import hnswlib
import numpy as np

Quickstart

This quickstart creates an HNSW index, initializes it with construction parameters, adds vector data, runs a k-nearest neighbor query, and then saves and reloads the index. Note that the query-time `ef` parameter is not stored in the saved file, so it must be set again after loading.

import hnswlib
import numpy as np
import os

# Define data parameters
dim = 128
num_elements = 10000

# Generate random data
data = np.float32(np.random.random((num_elements, dim)))
data_labels = np.arange(num_elements)

# Initialize the HNSW index
# Possible space options: 'l2', 'ip' (inner product), 'cosine'
space_name = 'l2' # squared Euclidean distance
index = hnswlib.Index(space=space_name, dim=dim)

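# Set index parameters BEFORE adding data
# max_elements: maximum number of elements the index can hold (can be raised later with index.resize_index)
# ef_construction: size of the candidate list used while building; higher values improve accuracy but slow construction
# M: number of bi-directional links created for each new element during construction
index.init_index(max_elements=num_elements, ef_construction=200, M=16)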

# Add items to the index
index.add_items(data, data_labels)

# Set the query-time accuracy/speed trade-off: higher ef gives better recall but slower queries.
# ef should be at least k.
# Note: ef is NOT saved with the index and must be set again after loading.
index.set_ef(50)

# Generate a query vector
query_vector = np.float32(np.random.random((1, dim)))

# Perform a k-nearest neighbor query
k = 5
labels, distances = index.knn_query(query_vector, k=k)

print(f"Query vector: {query_vector[0][:5]}...")
print(f"Nearest neighbor labels: {labels[0]}")
print(f"Distances to neighbors: {distances[0]}")

# Example of saving and loading the index
index_path = 'my_hnsw_index.bin'
index.save_index(index_path)

loaded_index = hnswlib.Index(space=space_name, dim=dim)
loaded_index.load_index(index_path)
loaded_index.set_ef(50) # Re-set ef after loading

loaded_labels, loaded_distances = loaded_index.knn_query(query_vector, k=k)
print(f"Loaded index nearest neighbor labels: {loaded_labels[0]}")

os.remove(index_path)
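Because `ef` trades recall for query speed, it helps to measure recall against exact results when choosing a value. The sketch below is an illustration, not part of hnswlib: it computes exact neighbors by brute force under the squared L2 metric and scores approximate labels against them. `exact_knn` and `recall_at_k` are hypothetical helper names, and the code assumes row `i` of the data carries label `i`, as with the `np.arange` labels in the quickstart.

```python
import numpy as np


def exact_knn(data: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k-NN labels under squared L2 (the 'l2' space).

    Assumes row i of `data` has label i.
    """
    x = data.astype(np.float64)
    q = queries.astype(np.float64)
    # Squared distances via ||q||^2 - 2 q.x + ||x||^2, shape (n_queries, n_data)
    d2 = (q**2).sum(axis=1)[:, None] - 2.0 * (q @ x.T) + (x**2).sum(axis=1)[None, :]
    # Sort each row by distance and keep the k closest indices
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(approx_labels: np.ndarray, true_labels: np.ndarray) -> float:
    """Fraction of true neighbors found in the approximate results, over all queries."""
    hits = sum(
        len(set(a.tolist()) & set(t.tolist()))
        for a, t in zip(approx_labels, true_labels)
    )
    return hits / true_labels.size


rng = np.random.default_rng(0)
data = rng.random((1000, 16), dtype=np.float32)
queries = rng.random((10, 16), dtype=np.float32)
truth = exact_knn(data, queries, k=5)

# With hnswlib, `approx` would be the labels from index.knn_query(queries, k=5).
# Here the exact result stands in for it, so recall is 1.0.
approx = truth.copy()
print(recall_at_k(approx, truth))
```

In practice, one would build the index as in the quickstart, call `set_ef` with a few candidate values, and keep the smallest `ef` whose recall is acceptable.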
