ScaNN
ScaNN (Scalable Nearest Neighbors) is a library by Google Research for efficient vector similarity search at scale, implementing techniques like search space pruning and quantization. It offers both Python and TensorFlow APIs and is known for its speed and scalability with large datasets. The current version is 1.4.2; the library is actively maintained and released through PyPI.
Common errors
- ERROR: Could not find a version that satisfies the requirement scann (from versions: none) ERROR: No matching distribution found for scann
  cause: This error typically occurs if your Python version is not within the supported range (e.g., <3.9 or >=3.14 for current versions), if your operating system architecture is not supported by available wheels (e.g., macOS or Windows without WSL), or if pip is outdated.
  fix: Check your Python version (`python --version`) and ensure it's compatible. Upgrade pip (`pip install --upgrade pip`). If on macOS/Windows, consider using a Linux environment (e.g., Docker, WSL) or building from source.
- TypeError: builder() missing 3 required positional arguments: 'db', 'num_neighbors', and 'distance_measure'
  cause: This indicates an incorrect call to the `builder()` method, usually from `scann.scann_ops_pybind` or `scann.scann_ops`. The `builder()` method requires the dataset, number of neighbors to retrieve, and the distance measure (e.g., 'dot_product', 'squared_l2') as its initial arguments.
  fix: Ensure you are passing the `dataset` (a numpy array), `num_neighbors` (an integer), and `distance_measure` (a string) to `builder()`: `scann.scann_ops_pybind.builder(dataset, num_neighbors=10, distance_measure="dot_product")`.
- ERROR: Cannot create ScaNN index with empty table.
  cause: The ScaNN builder (or related indexing functions) was provided with an empty dataset or a dataset that became empty after filtering. ScaNN requires data to build its index.
  fix: Ensure that the `dataset` array passed to the ScaNN builder is not empty and contains valid embedding vectors before attempting to build the searcher.
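The first and third errors above can often be caught before they occur. The following is a minimal sketch, not part of ScaNN: `python_supported` and `check_dataset_shape` are hypothetical helper names, and the 3.9 to 3.13 range is taken from this page and assumed to apply to the 1.4.x wheels.

```python
import sys

# Assumed supported range for ScaNN 1.4.x wheels, as stated on this page.
MIN_PY = (3, 9)
MAX_PY = (3, 13)


def python_supported(version_info=None):
    """Return True if the (major, minor) version lies in the assumed range."""
    major, minor = (version_info or sys.version_info)[:2]
    return MIN_PY <= (major, minor) <= MAX_PY


def check_dataset_shape(shape):
    """Raise ValueError for a shape that cannot hold any embedding vectors.

    `shape` is expected to be a 2-D shape tuple such as `dataset.shape`.
    """
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D array, got shape {shape}")
    rows, dims = shape
    if rows == 0 or dims == 0:
        raise ValueError(f"dataset is empty: shape {shape}")
    return rows, dims
```

Calling `check_dataset_shape(dataset.shape)` just before `builder()` turns the empty-table error into an earlier, more specific exception, which helps when the dataset is produced by an upstream filtering step.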
Warnings
- breaking Each ScaNN release supports a specific range of Python versions and is built against a specific TensorFlow version. For example, ScaNN 1.2.0 dropped Python 3.5 support and was built against TensorFlow 2.4.0, making it incompatible with TensorFlow 2.3.x. More recent versions (e.g., 1.4.x) support Python 3.9-3.13.
- breaking As of ScaNN 1.4.0, TensorFlow op bindings are no longer enabled by default. `pip install scann` will *not* include TensorFlow integration.
- gotcha ScaNN wheels have system-level dependencies. x86 wheels require AVX and FMA instruction set support, while ARM wheels require NEON. Additionally, `manylinux_2_27` compatible wheels require `libstdc++` version 3.4.23 or above.
- deprecated The `ScannBuilder` API underwent changes in version 1.1.0. Rather than calling `create_tf` or `create_pybind` directly on a `ScannBuilder` object, you now use the `builder()` method from `scann_ops` or `scann_ops_pybind` to get a `ScannBuilder` object, and then call `build()` on it.
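The instruction-set requirement in the warnings above can be checked on Linux by reading `/proc/cpuinfo`. This is a minimal sketch under that assumption; `cpu_flags` and `missing_x86_flags` are hypothetical helper names, not ScaNN functions, and the check covers only the x86 case.

```python
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on x86 Linux.
X86_REQUIRED = {"avx", "fma"}


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels use the key "flags"; ARM kernels use "Features".
        if sep and key.strip().lower() in {"flags", "features"}:
            flags.update(value.split())
    return flags


def missing_x86_flags(cpuinfo_text):
    """Return the required x86 flags that are absent, sorted by name."""
    return sorted(X86_REQUIRED - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("missing flags:", missing_x86_flags(path.read_text()))
```

On ARM, the same file reports NEON as `neon` on 32-bit kernels and as `asimd` on 64-bit kernels, so an equivalent check would look for either name in the set returned by `cpu_flags`.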
Install
- pip install scann
- pip install "scann[tf]"
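After installing, it can be useful to confirm which version pip resolved, since the supported Python range and TensorFlow integration differ between releases. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="scann"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("scann version:", installed_version())
```

A result of `None` means the package is not visible to the current interpreter, which usually points to an install into a different environment.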
Imports
- scann
import scann
- scann_ops_pybind
import scann
searcher = scann.scann_ops_pybind.builder(...).build()
- scann_ops
from scann import ScannBuilder
import scann
searcher = scann.scann_ops.builder(...).build()
Quickstart
import numpy as np
import scann
# 1. Prepare your dataset (e.g., embeddings)
# For demonstration, creating a random dataset of 1000 vectors, 128 dimensions each.
dataset = np.random.rand(1000, 128).astype(np.float32)
# 2. Build the ScaNN searcher
# This example uses dot product distance for Maximum Inner Product Search (MIPS).
# num_leaves: Number of leaves in the tree for partitioning.
# num_leaves_to_search: Number of leaves to search at query time.
# anisotropic_quantization_threshold: Parameter for Anisotropic Vector Quantization.
searcher = scann.scann_ops_pybind.builder(
    dataset,
    num_neighbors=10,  # Number of nearest neighbors to retrieve
    distance_measure="dot_product"
).tree(
    num_leaves=100,
    num_leaves_to_search=10
).score_ah(
    dimensions_per_block=2,  # Recommended for MIPS
    anisotropic_quantization_threshold=0.2
).reorder(
    100  # Rescore top 100 candidates to improve accuracy
).build()
# 3. Define a query vector
query = np.random.rand(128).astype(np.float32)
# 4. Perform a search
neighbors, distances = searcher.search(query)
print(f"Query vector shape: {query.shape}")
print(f"Dataset shape: {dataset.shape}")
print(f"Found {len(neighbors)} neighbors: {neighbors}")
print(f"Corresponding distances: {distances}")
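Because the quickstart performs approximate search, a common follow-up is to measure how many true nearest neighbors it recovers. The sketch below is one way to compute recall from two sets of neighbor indices; `recall_at_k` is a hypothetical helper name, not part of ScaNN, and it accepts any row-wise sequences of indices (lists or numpy arrays).

```python
def recall_at_k(approx_neighbors, exact_neighbors):
    """Fraction of exact neighbors that the approximate search also returned.

    Both arguments are sequences with one row of neighbor indices per query.
    """
    found = 0
    total = 0
    for approx_row, exact_row in zip(approx_neighbors, exact_neighbors):
        exact_set = {int(i) for i in exact_row}
        found += len(exact_set.intersection(int(i) for i in approx_row))
        total += len(exact_set)
    return found / total if total else 0.0
```

With the quickstart's `searcher`, the approximate rows would typically come from `searcher.search_batched(queries)`, and the exact rows from a brute-force dot product over `dataset`, for example `np.argsort(-(queries @ dataset.T), axis=1)[:, :10]`. If recall is too low, raising `num_leaves_to_search` or the `reorder` count are the usual first adjustments, at the cost of query speed.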