PyNNDescent
PyNNDescent is a Python library that provides a fast and flexible implementation of Nearest Neighbor Descent for approximate nearest neighbor search and k-neighbor-graph construction. It supports a wide variety of distance metrics and sparse matrix inputs, and it integrates with scikit-learn. The current version is 0.6.0; releases are regular, with several minor patches and updates each year.
Warnings
- deprecated The `n_search_trees` parameter in `NNDescent` has been deprecated. While it may still work, it's recommended to rely on the default or other parameters for controlling initialization.
- gotcha NumPy 2.0 removed the `np.infty` alias in favor of `np.inf`. PyNNDescent versions `0.5.13` and later include compatibility patches. An older PyNNDescent running on NumPy 2.0 or newer can fail with an `AttributeError` wherever `np.infty` is referenced.
- breaking Version `0.6.0` removed support for End-of-Life Python versions and officially added support for Python 3.12 and 3.13. If you are on an older Python version, this update may break your environment.
- gotcha Earlier versions (`<0.5.9`) had bugs causing infinite recursion during random projection tree generation, especially for certain datasets or configurations. This could lead to crashes or hanging processes.
- gotcha In `0.5.11`, caching for functions that take distance metrics as arguments was removed. If your application relied on this caching for performance, you might observe a change in execution time after upgrading.
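The NumPy 2.0 gotcha above can be checked before it bites. The following is a minimal sketch using only the standard library; the helper names (`parse_release`, `numpy2_compat_risk`, `installed_version`) are hypothetical and not part of PyNNDescent, and the version parsing is deliberately simple (it reads only the leading numeric components of a version string).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy2_compat_risk(pynndescent_version, numpy_version):
    """True when NumPy is 2.0+ but PyNNDescent predates the 0.5.13 patches."""
    return (
        parse_release(numpy_version) >= (2, 0)
        and parse_release(pynndescent_version) < (0, 5, 13)
    )


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report on whatever is installed in the current environment.
pynn_ver = installed_version("pynndescent")
numpy_ver = installed_version("numpy")
if pynn_ver is None or numpy_ver is None:
    print("pynndescent and/or numpy is not installed")
elif numpy2_compat_risk(pynn_ver, numpy_ver):
    print(f"pynndescent {pynn_ver} predates the NumPy 2 patches; consider upgrading")
else:
    print(f"pynndescent {pynn_ver} with numpy {numpy_ver}: no known np.infty issue")
```

The check only covers the `np.infty` warning; it says nothing about the Python-version or recursion issues listed above.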
Install
pip install pynndescent
Imports
- NNDescent
from pynndescent import NNDescent
- PyNNDescentTransformer
from pynndescent import PyNNDescentTransformer
Quickstart
import numpy as np
from pynndescent import NNDescent
# Generate some sample data
data = np.random.rand(1000, 64).astype(np.float32)
# Build the NNDescent index; the constructor computes the approximate
# nearest neighbor graph of the data
# n_neighbors specifies the number of neighbors to find for each point
# verbose=True shows progress
index = NNDescent(data, n_neighbors=15, verbose=True)
# Prepare the index for querying up front (otherwise this happens on the first query)
index.prepare()
# Query the index for the 5 nearest neighbors of new data
query_data = np.random.rand(10, 64).astype(np.float32)
neighbors, distances = index.query(query_data, k=5)
print("Shape of neighbors (query_points, k):", neighbors.shape)
print("Shape of distances (query_points, k):", distances.shape)
print("First query point's 5 nearest neighbor indices:", neighbors[0])
print("First query point's 5 nearest neighbor distances:", distances[0])
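Because the search is approximate, it can be useful to measure how many true neighbors a query actually returns. The sketch below is a standard-library illustration of recall against a brute-force Euclidean search; `exact_neighbors` and `recall_at_k` are hypothetical helpers, not PyNNDescent functions, and the "approximate" result here is a hand-made stand-in rather than real `index.query` output.

```python
import math
import random


def exact_neighbors(data, query, k):
    """Indices of the k points in data closest to query (Euclidean), by brute force."""
    order = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return order[:k]


def recall_at_k(approx_indices, exact_indices):
    """Fraction of true neighbors recovered, pooled over all queries."""
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_indices, exact_indices):
        hits += len(set(approx_row) & set(exact_row))
        total += len(exact_row)
    return hits / total if total else 0.0


# Small toy dataset: 50 points and 3 queries in 4 dimensions.
rng = random.Random(0)
data = [[rng.random() for _ in range(4)] for _ in range(50)]
queries = [[rng.random() for _ in range(4)] for _ in range(3)]

exact = [exact_neighbors(data, q, k=5) for q in queries]

# Stand-in for an approximate result: the exact answer with its last
# neighbor replaced by an invalid index, so one of five is missed per query.
approx = [row[:-1] + [-1] for row in exact]

print("recall of exact vs. itself:", recall_at_k(exact, exact))
print("recall of degraded result:", recall_at_k(approx, exact))
```

To apply this to the quickstart, one could pass `neighbors.tolist()` as `approx_indices` and compute the exact rows from `data.tolist()` and `query_data.tolist()`. Brute force scans every point per query, so it is only practical on a small sample of queries.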