Chroma HNSWlib

0.7.6 · active · verified Thu Apr 09

Chroma HNSWlib is Chroma's fork of hnswlib, a C++ library for fast approximate nearest neighbor (ANN) search built on HNSW (Hierarchical Navigable Small World) graphs. It provides Python bindings to the C++ implementation, enabling high-performance vector similarity search, and is commonly used as an underlying component of vector databases such as ChromaDB. The current version is 0.7.6, and releases are automated via GitHub Actions when a new version tag is pushed.

Warnings

Install

Imports

Quickstart

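This quickstart demonstrates how to initialize an HNSW index, add random 128-dimensional vectors to it, and then perform an approximate k-nearest neighbor search. Two key parameters are set during index initialization: `M` (the number of links created per element in the graph) and `ef_construction` (the size of the candidate list used while building the index). Larger values generally improve accuracy at the cost of build time and memory.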

import hnswlib
import numpy as np

dim = 128
num_elements = 10000

# Generate random data
data = np.float32(np.random.random((num_elements, dim)))
data_labels = np.arange(num_elements)

# Initialize and configure the index
# space options: 'l2' (squared Euclidean distance), 'ip' (inner product), 'cosine' (cosine similarity)
index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)

# Add elements to the index
index.add_items(data, data_labels)

# Perform a search
num_queries = 5
query_data = np.float32(np.random.random((num_queries, dim)))
k = 10 # Number of nearest neighbors to return

labels, distances = index.knn_query(query_data, k=k)

print("Query Results (labels, distances):")
for i in range(num_queries):
    print(f"  Query {i}: {labels[i]}, {distances[i]}")
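Because the search is approximate, it can be useful to check how many true neighbors the index actually returns. The following is a minimal sketch of a recall check using only NumPy; the helper names `exact_knn_l2` and `recall_at_k` and the demo arrays are hypothetical, not part of hnswlib, and the sketch assumes the labels are the row indices of the data, as in the quickstart.

```python
import numpy as np


def exact_knn_l2(data, queries, k):
    """Brute-force k nearest neighbors under squared L2 distance (hypothetical helper)."""
    # Pairwise squared distances via ||q||^2 - 2 q.d + ||d||^2,
    # giving an array of shape (num_queries, num_elements).
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)
    )
    # Row indices of the k smallest distances per query, nearest first.
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(approx_labels, exact_labels):
    """Fraction of true neighbors that the approximate search also returned."""
    hits = sum(
        len(set(a.tolist()) & set(e.tolist()))
        for a, e in zip(approx_labels, exact_labels)
    )
    return hits / exact_labels.size


# Small synthetic check: comparing the exact result with itself.
rng = np.random.default_rng(0)
demo_data = rng.random((1000, 16), dtype=np.float32)
demo_queries = rng.random((5, 16), dtype=np.float32)
demo_exact = exact_knn_l2(demo_data, demo_queries, k=10)
print(recall_at_k(demo_exact, demo_exact))  # 1.0
```

With the quickstart variables, `recall_at_k(labels, exact_knn_l2(data, query_data, k))` would estimate the recall of the index. If the value is lower than desired, calling `index.set_ef(...)` with a larger value before `knn_query` typically raises recall at the cost of query speed.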
