Scann

1.4.2 · active · verified Thu Apr 16

ScaNN (Scalable Nearest Neighbors) is a Google Research library for efficient vector similarity search at scale. It combines search space pruning (partitioning the dataset so that each query scans only a few partitions) with quantization (compressing vectors so that candidates can be scored cheaply), and it is designed for fast search over large datasets. It offers both a native Python API and a TensorFlow API. The current version is 1.4.2; the library is actively maintained and distributed through PyPI.
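
To make search space pruning concrete, here is a simplified NumPy sketch, not ScaNN's implementation. The data is split into leaves with plain k-means, and a query scores only the rows in the few leaves whose centroids have the largest inner product with it. The helpers `kmeans` and `partitioned_mips` are hypothetical names, and scoring the surviving candidates exactly (instead of with quantized approximations) is a simplifying assumption.

```python
import numpy as np

# Hypothetical helpers for illustration; not ScaNN APIs.
def kmeans(data, num_leaves, iters=10, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, leaf id of every row)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), num_leaves, replace=False)].copy()

    def assign_rows(c):
        # Squared L2 distance to every centroid via |x|^2 - 2 x.c + |c|^2.
        d2 = (data ** 2).sum(1)[:, None] - 2 * data @ c.T + (c ** 2).sum(1)[None, :]
        return d2.argmin(axis=1)

    for _ in range(iters):
        assign = assign_rows(centroids)
        for leaf in range(num_leaves):
            members = data[assign == leaf]
            if len(members):  # keep the old centroid if a leaf ends up empty
                centroids[leaf] = members.mean(axis=0)
    return centroids, assign_rows(centroids)


def partitioned_mips(dataset, centroids, assign, query, k, leaves_to_search):
    """Search only the `leaves_to_search` leaves whose centroids best match the query."""
    # 1. Pruning: rank leaves by the inner product between the query and each centroid.
    best_leaves = np.argsort(-(centroids @ query))[:leaves_to_search]
    # 2. Candidates are the rows assigned to those leaves; every other row is skipped.
    candidates = np.flatnonzero(np.isin(assign, best_leaves))
    # 3. Exact inner products on the candidates only, best first.
    scores = dataset[candidates] @ query
    order = np.argsort(-scores)[:k]
    return candidates[order], scores[order]


rng = np.random.default_rng(0)
dataset = rng.random((2000, 128), dtype=np.float32)
query = rng.random(128, dtype=np.float32)

centroids, assign = kmeans(dataset, num_leaves=40)
neighbors, scores = partitioned_mips(dataset, centroids, assign, query, k=10, leaves_to_search=10)
print(neighbors, scores)
```

Searching 10 of 40 leaves scans roughly a quarter of the rows; searching all 40 reduces to exact brute-force search.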

Install

pip install scann

Imports

import scann

Quickstart

This quickstart creates a ScaNN searcher over a sample dataset, configures it for Maximum Inner Product Search (MIPS) with tree partitioning, anisotropic quantization, and exact reordering of the top candidates, and then runs a similarity search. It uses the native Python API (`scann.scann_ops_pybind`), which does not require TensorFlow. The example generates random data for simplicity; in a real application, `dataset` would hold your actual high-dimensional vectors (e.g., embeddings).
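
The `score_ah` stage in the quickstart relies on quantization: each vector is cut into blocks of `dimensions_per_block` dimensions, each block is replaced by its nearest entry in a small per-block codebook, and a query is scored against the compressed vectors through a lookup table. The sketch below assumes plain (isotropic) product quantization with 16 centers per block; it omits the anisotropic loss controlled by `anisotropic_quantization_threshold`, and `train_pq` and `ah_scores` are hypothetical helpers.

```python
import numpy as np

# Hypothetical helpers for illustration; plain PQ, not ScaNN's anisotropic quantizer.
def train_pq(data, dims_per_block=2, num_centers=16, iters=10, seed=0):
    """Learn one k-means codebook per block of `dims_per_block` dimensions and encode `data`."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    assert d % dims_per_block == 0, "dimension must split evenly into blocks"
    num_blocks = d // dims_per_block
    blocks = data.reshape(n, num_blocks, dims_per_block)
    codebooks = np.empty((num_blocks, num_centers, dims_per_block), dtype=data.dtype)
    codes = np.empty((n, num_blocks), dtype=np.int64)
    for b in range(num_blocks):
        x = blocks[:, b, :]
        centers = x[rng.choice(n, num_centers, replace=False)].copy()
        for _ in range(iters):
            assign = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
            for c in range(num_centers):
                if np.any(assign == c):  # keep the old center if it lost all points
                    centers[c] = x[assign == c].mean(axis=0)
        codebooks[b] = centers
        codes[:, b] = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return codebooks, codes


def ah_scores(codebooks, codes, query):
    """Asymmetric scoring: the query stays exact, the data is represented only by codes."""
    num_blocks, _, dims_per_block = codebooks.shape
    q_blocks = query.reshape(num_blocks, dims_per_block)
    # Lookup table: partial dot product of each query block with every codeword.
    lut = np.einsum("bcd,bd->bc", codebooks, q_blocks)
    # Sum the table entries selected by each row's codes to get approximate inner products.
    return lut[np.arange(num_blocks), codes].sum(axis=1)


rng = np.random.default_rng(0)
dataset = rng.random((2000, 128), dtype=np.float32)
query = rng.random(128, dtype=np.float32)

codebooks, codes = train_pq(dataset)
approx = ah_scores(codebooks, codes, query)
exact = dataset @ query
print(np.corrcoef(approx, exact)[0, 1])  # near 1 when quantization error is small
```

Each row is stored as 64 small codes instead of 128 floats, and scoring it costs 64 table lookups rather than a full dot product.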

import numpy as np
import scann

# 1. Prepare your dataset (e.g., embeddings)
# For demonstration, create a random dataset of 10,000 vectors with 128 dimensions each.
# With num_leaves=100 below, this gives roughly 100 vectors per leaf.
dataset = np.random.rand(10000, 128).astype(np.float32)

# 2. Build the ScaNN searcher
# This example uses dot product distance for Maximum Inner Product Search (MIPS).
# num_leaves: Number of leaves in the tree for partitioning.
# num_leaves_to_search: Number of leaves to search at query time.
# anisotropic_quantization_threshold: Parameter for Anisotropic Vector Quantization.

searcher = scann.scann_ops_pybind.builder(
    dataset,
    num_neighbors=10, # Number of nearest neighbors to retrieve
    distance_measure="dot_product"
).tree(
    num_leaves=100,
    num_leaves_to_search=10
).score_ah(
    dimensions_per_block=2, # Sub-vector size for asymmetric hashing (AH); 2 is the usual recommendation
    anisotropic_quantization_threshold=0.2
).reorder(
    100 # Rescore the top 100 candidates with exact dot products to improve accuracy
).build()

# 3. Define a query vector
query = np.random.rand(128).astype(np.float32)

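# 4. Perform a search
# Returns indices into `dataset` and their scores; with "dot_product" the scores are
# inner products, so larger values mean more similar vectors.
neighbors, distances = searcher.search(query)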

print(f"Query vector shape: {query.shape}")
print(f"Dataset shape: {dataset.shape}")
print(f"Found {len(neighbors)} neighbors: {neighbors}")
print(f"Corresponding distances: {distances}")
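
Because ScaNN returns approximate neighbors, a common sanity check is recall@k: the fraction of the exact top-k results that the searcher also returned. A minimal NumPy sketch, assuming brute-force MIPS as ground truth; `exact_top_k` and `recall_at_k` are hypothetical helpers, not part of ScaNN. In practice you would pass the `neighbors` array from `searcher.search(query)` as `approx_ids`; here a brute-force search over only the first half of the data stands in so the snippet runs without ScaNN.

```python
import numpy as np

# Hypothetical helpers for illustration; not ScaNN APIs.
def exact_top_k(dataset, query, k):
    """Brute-force MIPS: indices of the k rows with the largest inner product, best first."""
    scores = dataset @ query
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k
    return top[np.argsort(-scores[top])]       # sort those k by descending score


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids.tolist()) & set(exact_ids.tolist())) / len(exact_ids)


rng = np.random.default_rng(0)
dataset = rng.random((1000, 128), dtype=np.float32)
query = rng.random(128, dtype=np.float32)
exact_ids = exact_top_k(dataset, query, k=10)

# Stand-in for ScaNN's output: search only the first 500 rows (a crude approximation).
approx_ids = exact_top_k(dataset[:500], query, k=10)
print(recall_at_k(approx_ids, exact_ids))
```

If recall is too low, raising `num_leaves_to_search` or the `reorder` count typically improves it at the cost of query time.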
