NMSLIB (Non-Metric Space Library)
NMSLIB (Non-Metric Space Library) is an efficient cross-platform similarity search library with Python bindings, designed for approximate nearest neighbor search in high-dimensional and non-metric spaces. It is widely used for vector similarity tasks, often in conjunction with embeddings from libraries like Gensim. The current version is 2.1.2, and the library receives updates focused on performance, new features, and broader platform/Python support.
Common errors
- error: subprocess-exited-with-error × Running setup.py install for nmslib did not run successfully.
  cause: This generic pip error means the source build failed. Typical causes are a missing C++ toolchain, missing Python development headers, or a Python version or system compiler that the release does not support.
  fix: Install a C++ compiler and the Python headers (e.g., `build-essential` and `python3-dev` on Debian/Ubuntu, Xcode command line tools on macOS, Visual Studio 2019 or later on Windows). On Windows, ensure your Visual Studio installation matches the Python version's build requirements. Once the build tools are in place, consider installing from source with `pip install --no-binary :all: nmslib`, or directly from a cloned Git repository.
- RuntimeError: Unsupported compiler -- at least C++11 support is needed
  cause: The C++ compiler available on your system (or detected by `setuptools`) does not meet the C++11 standard requirement for building NMSLIB.
  fix: Update your C++ compiler (e.g., GCC to version 4.7+, Clang to 3.4+, Visual Studio to 2015/VC14 or later). On Linux, ensure `g++` is up to date. On macOS, ensure the Xcode command line tools are installed and updated (`xcode-select --install`).
- Build error with Apple clang: error: unknown type name 'thread_local' (Sonoma / aarch64)
  cause: Compiler flags or features required by NMSLIB might be incompatible with newer versions of Apple Clang or with the aarch64 architecture, particularly on macOS Sonoma.
  fix: This often requires patching the NMSLIB source or using a specific compiler version. Check the NMSLIB GitHub issues for workarounds or newer releases that address this build problem. Building from source with specific compiler flags or in a different compiler environment (e.g., via `conda` with a known working C++ toolchain) might be necessary.
- Incompatibility with python >= 3.9 -- repeating only one item.
  cause: A bug in specific NMSLIB versions, or in their interaction with newer Python runtimes (>=3.9), can cause incorrect query results, such as the same item being returned repeatedly.
  fix: Use the latest stable version of NMSLIB (2.1.2 or newer), which may include fixes for Python compatibility. If the issue persists, review the GitHub issues for patches or temporary workarounds related to your Python version.
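Most of the build errors above come down to a missing compiler or missing Python headers, so it can help to inspect the environment before retrying the install. The following is a minimal sketch using only the standard library. The helper names (`find_cxx_compiler`, `python_headers_present`, `build_environment_report`) and the list of compiler executables are assumptions made for illustration, not part of NMSLIB or pip.

```python
import os
import shutil
import sys
import sysconfig


def find_cxx_compiler(candidates=("c++", "g++", "clang++", "cl")):
    """Return the path of the first candidate compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def python_headers_present():
    """Check whether Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths().get("include")
    return bool(include_dir) and os.path.isfile(os.path.join(include_dir, "Python.h"))


def build_environment_report():
    """Collect the facts that matter most when a source build of nmslib fails."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "platform": sys.platform,
        "cxx_compiler": find_cxx_compiler(),      # None means nothing was found on PATH
        "python_headers": python_headers_present(),
    }


print(build_environment_report())
```

A `cxx_compiler` of `None` only means that none of the listed executables is on `PATH`. On Windows in particular, the Visual Studio compiler is often visible only inside a developer command prompt, so treat the report as a hint rather than a verdict.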
Warnings
- breaking: NMSLIB versions between 2.0.6 and 2.1.1 had deployment issues and were deleted. If you installed one of these versions, uninstall it and install a more recent version (>=2.1.1) to avoid potential instability or missing features.
- gotcha: Pre-compiled binaries installed via `pip install nmslib` might be slower than a build from source. For optimal performance, especially when using optimized spaces (e.g., negdotprod, l1, linf), it is recommended to install from source.
- gotcha: When an HNSW index for the `l2` or `cosinesimil` space is saved, `saveIndex` can store an optimized copy of the data inside the index. Passing `save_data=True` to `saveIndex` and `load_data=True` to `loadIndex` on top of that can lead to data duplication. Conversely, `getDistance` might not work properly on a loaded index unless the data is explicitly reloaded.
- deprecated: The `nmslib` package currently uses the legacy `setup.py install` method. Pip versions 23.1 and later may enforce changes to how such packages are built, potentially leading to installation failures.
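The save/load gotcha is easier to manage when one flag drives both calls, so the save and load sides cannot disagree. The sketch below assumes an already built index and a second, empty index created with the same `method` and `space`. The helper `roundtrip` and its `keep_data` parameter are hypothetical names introduced here; only `saveIndex(..., save_data=...)` and `loadIndex(..., load_data=...)` come from NMSLIB.

```python
def roundtrip(index, fresh_index, path, keep_data=False):
    """Save `index` to `path` and load it into `fresh_index` using one shared flag.

    keep_data=False writes and reads the index structure only.
    keep_data=True also writes and reads the raw data points, at the cost
    of the possible duplication described in the warning above.
    """
    index.saveIndex(path, save_data=keep_data)
    fresh_index.loadIndex(path, load_data=keep_data)
    return fresh_index


# Typical use (requires nmslib and an index built as in the Quickstart):
#   import nmslib
#   empty = nmslib.init(method="hnsw", space="cosinesimil")
#   restored = roundtrip(index, empty, "hnsw_index.bin", keep_data=True)
#   ids, distances = restored.knnQuery(data[0], k=10)
```

If you only need `knnQuery` on the restored index, `keep_data=False` is the smaller option. If you also rely on `getDistance`, the warning above suggests loading the data as well.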
Install
- pip install nmslib
- pip install --no-binary :all: nmslib
Imports
- nmslib
import nmslib
- numpy
import numpy
Quickstart
import nmslib
import numpy
# Create a random matrix to index
data = numpy.random.randn(10000, 100).astype(numpy.float32)
# Initialize a new index using HNSW on Cosine Similarity
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2}, print_progress=True)
# Query for the nearest neighbours of the first datapoint
ids, distances = index.knnQuery(data[0], k=10)
print(f"Nearest neighbors for data[0]: {ids}, distances: {distances}")
# Get all nearest neighbours for all the datapoints using multiple threads
# neighbours = index.knnQueryBatch(data, k=10, num_threads=4)
# print(f"Batch query results (first entry): {neighbours[0]}")
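Because the search is approximate, and because some version combinations have been reported to return the same item repeatedly (see Common errors), it can be worth sanity-checking the ids that `knnQuery` returns. The following is a minimal sketch in plain Python. The helper names `all_ids_identical` and `recall_at_k` are assumptions made for illustration and are not part of NMSLIB; `exact_ids` stands for ids you obtained from a brute-force search of your own.

```python
def all_ids_identical(ids):
    """True when more than one neighbour was returned and every id is the same."""
    ids = [int(i) for i in ids]
    return len(ids) > 1 and len(set(ids)) == 1


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that also appear in the approximate result."""
    exact = {int(i) for i in exact_ids}
    if not exact:
        return 0.0
    found = {int(i) for i in approx_ids}
    return len(exact & found) / len(exact)


# Typical use with the Quickstart variables:
#   ids, distances = index.knnQuery(data[0], k=10)
#   if all_ids_identical(ids):
#       print("Suspicious result: every neighbour has the same id")
```

Both helpers accept lists or NumPy arrays, since each id is converted with `int()` before comparison.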