Chroma HNSWlib
Chroma HNSWlib (`chroma-hnswlib`) is Chroma's fork of hnswlib, the C++ implementation of HNSW (Hierarchical Navigable Small World) graphs for fast approximate nearest neighbor (ANN) search. It provides Python bindings to the C++ implementation and is used as the vector similarity search component underneath vector databases such as ChromaDB. The current version is 0.7.6, and releases are automated via GitHub Actions when a new version tag is pushed.
Warnings
- breaking Direct installation fails for Python 3.13 due to a lack of pre-built wheels and potential compilation issues. Additionally, there are no pre-built wheels for Python 3.12 on Windows.
- gotcha Building `chroma-hnswlib` from source (e.g., if a wheel is not available for your OS/Python version, or using `--no-binary`) requires C++ build tools (e.g., Microsoft Visual C++ 14.0 or greater on Windows). This can lead to build errors if dependencies are not met.
- gotcha For maximum performance, especially leveraging Advanced Vector Extensions (AVX) if your hardware supports it, you may need to force recompilation of the library by installing with `--no-binary chroma-hnswlib`. Pre-built wheels are compiled for broader compatibility and might not use AVX.
- gotcha HNSWlib's memory usage is typically higher than that of some other Approximate Nearest Neighbor (ANN) libraries because it keeps the graph structure in memory alongside the vectors. The graph's footprint scales with the number of elements and the `M` parameter.
- gotcha Performance is a trade-off between query speed, index build time, and recall, and depends on tuning `M`, `ef_construction`, and the query-time `ef` (often called `ef_search`; set on an index with `index.set_ef(...)`). Poorly chosen values can lead to low recall or slow builds and queries.
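As a rough illustration of how memory scales with the number of elements and `M`, the sketch below applies a rule of thumb from the upstream hnswlib parameter notes: about 8 to 10 bytes per element per unit of `M` for graph links, plus 4 bytes per float32 dimension for the vector itself. The function name and the `link_bytes_per_m` factor are hypothetical, and the result is an order-of-magnitude estimate, not a measurement.

```python
def estimate_hnsw_memory_bytes(num_elements, dim, M, link_bytes_per_m=10):
    """Rough memory estimate for an HNSW index holding float32 vectors.

    Assumption: graph links cost about `link_bytes_per_m * M` bytes per
    element (a rule of thumb, not an exact accounting of the index layout).
    """
    vector_bytes = dim * 4              # float32 vector data per element
    link_bytes = M * link_bytes_per_m   # approximate graph links per element
    return num_elements * (vector_bytes + link_bytes)


# Same sizes as the Quickstart: 10,000 vectors, 128 dimensions, M=16
estimate = estimate_hnsw_memory_bytes(num_elements=10_000, dim=128, M=16)
print(f"Estimated index size: {estimate / 1e6:.1f} MB")  # prints 6.7 MB
```

Doubling `M` doubles only the link term, so for low-dimensional vectors the graph can dominate the total, while for high-dimensional vectors the raw data usually does.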
Install
pip install chroma-hnswlib
Imports
- Index
import hnswlib
index = hnswlib.Index(space='l2', dim=128)
Quickstart
import hnswlib
import numpy as np
dim = 128
num_elements = 10000
# Generate random data
data = np.float32(np.random.random((num_elements, dim)))
data_labels = np.arange(num_elements)
# Initialize and configure the index
# 'l2' for Euclidean distance, 'ip' for inner product, 'cosine' for cosine similarity
index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
# Add elements to the index
index.add_items(data, data_labels)
# Perform a search
num_queries = 5
query_data = np.float32(np.random.random((num_queries, dim)))
k = 10 # Number of nearest neighbors to return
labels, distances = index.knn_query(query_data, k=k)
print("Query Results (labels, distances):")
for i in range(num_queries):
    print(f"  Query {i}: {labels[i]}, {distances[i]}")
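Because the search is approximate, it can be worth checking recall against an exact brute-force search while varying the query-time `ef`. The sketch below is one way to do that with NumPy. The helper names (`exact_l2_knn`, `recall_at_k`, `sweep_ef`) are hypothetical and not part of the library; the only index methods it relies on are `set_ef` and `knn_query`. It assumes the `'l2'` space and labels equal to row positions (`np.arange(num_elements)`), as in the Quickstart.

```python
import numpy as np


def exact_l2_knn(data, queries, k):
    """Brute-force k nearest neighbors by squared L2 distance.

    Returns row indices into `data`; these match index labels only when
    the labels were assigned as np.arange(len(data)).
    """
    data = np.asarray(data, dtype=np.float64)
    queries = np.asarray(queries, dtype=np.float64)
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for all pairs at once
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(approx_labels, exact_labels):
    """Fraction of the exact neighbors that the approximate search returned."""
    approx_labels = np.asarray(approx_labels)
    exact_labels = np.asarray(exact_labels)
    hits = sum(
        len(set(a.tolist()) & set(e.tolist()))
        for a, e in zip(approx_labels, exact_labels)
    )
    return hits / exact_labels.size


def sweep_ef(index, queries, exact_labels, k, ef_values):
    """Run the same queries at several `ef` settings and report recall for each."""
    results = {}
    for ef in ef_values:
        index.set_ef(ef)  # query-time search breadth
        labels, _ = index.knn_query(queries, k=k)
        results[ef] = recall_at_k(labels, exact_labels)
    return results
```

With the Quickstart objects, `sweep_ef(index, query_data, exact_l2_knn(data, query_data, k), k, [10, 50, 200])` returns a dictionary mapping each `ef` value to its recall. Larger `ef` values generally raise recall at the cost of slower queries.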