{"id":8665,"library":"sparse-dot-topn","title":"Sparse Dot Top-N","description":"sparse-dot-topn is a Python package designed to accelerate sparse matrix multiplication followed by the selection of the top-N results. It significantly reduces memory footprint and improves performance for operations common in tasks like large-scale string comparison and entity matching. Developed by ING Wholesale Banking Advanced Analytics, it is currently at version 1.2.0 and receives regular updates with a focus on performance and Python version compatibility.","status":"active","version":"1.2.0","language":"en","source_language":"en","source_url":"https://github.com/ing-bank/sparse_dot_topn","tags":["sparse matrix","matrix multiplication","top-n","similarity","performance","scientific computing","machine learning"],"install":[{"cmd":"pip install sparse-dot-topn","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Core dependency for numerical operations and array handling.","package":"numpy","optional":false},{"reason":"Used for system and process utilities, possibly for resource monitoring or thread management.","package":"psutil","optional":false},{"reason":"Provides sparse matrix data structures (e.g., csr_matrix) which are fundamental to sparse-dot-topn's functionality.","package":"scipy","optional":false}],"imports":[{"note":"`awesome_cossim_topn` was deprecated in v1.0.0 and will be removed in future versions; use `sp_matmul_topn` instead.","wrong":"from sparse_dot_topn import awesome_cossim_topn","symbol":"sp_matmul_topn","correct":"from sparse_dot_topn import sp_matmul_topn"},{"symbol":"sp_matmul","correct":"from sparse_dot_topn import sp_matmul"},{"note":"Introduced in v1.1.0 for chunked matrix multiplication.","symbol":"zip_sp_matmul_topn","correct":"from sparse_dot_topn import zip_sp_matmul_topn"}],"quickstart":{"code":"import scipy.sparse as sparse\nfrom sparse_dot_topn import sp_matmul_topn\nimport numpy as np\n\n# Create two sample sparse 
matrices (CSR format is recommended for performance)\nA = sparse.random(1000, 100, density=0.1, format=\"csr\", random_state=42)\nB = sparse.random(100, 2000, density=0.1, format=\"csr\", random_state=42)\n\n# Compute C = A * B and retain the top 10 values per row in C\n# sp_matmul_topn also supports `n_threads` for parallel execution\nC = sp_matmul_topn(A, B, top_n=10, n_threads=None, threshold=0.0)\n\nprint(f\"Shape of A: {A.shape}\")\nprint(f\"Shape of B: {B.shape}\")\nprint(f\"Shape of result C: {C.shape}\")\nprint(f\"Number of non-zero elements in C: {C.nnz}\")\n# print(C)","lang":"python","description":"This example demonstrates how to perform a sparse matrix multiplication with top-N result selection using `sp_matmul_topn`. It creates two random CSR sparse matrices and computes their product, keeping only the top 10 values for each row in the result matrix. Ensure `scipy` and `numpy` are installed."},"warnings":[{"fix":"Upgrade your Python environment to 3.9 or newer.","message":"Python 3.8 support was dropped in version 1.2.0. Ensure you are using Python 3.9 or higher.","severity":"breaking","affected_versions":">=1.2.0"},{"fix":"Migrate your code to use the new `sp_matmul_topn` function and updated parameter names. Refer to the migration guide in the GitHub README.","message":"Major API changes in v1.0.0: `awesome_cossim_topn` was deprecated (use `sp_matmul_topn`), `ntop` parameter renamed to `topn`, `lower_bound` to `threshold`, and `use_threads`/`n_jobs` combined into `n_threads`.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"If experiencing OpenMP errors, try installing from source with `pip install sparse_dot_topn --no-binary sparse_dot_topn` to ensure architecture-specific optimizations, or run functions without explicitly specifying the `n_threads` argument. 
Check the `INSTALLATION.md` for platform-specific troubleshooting.","message":"OpenMP initialization issues, especially on macOS, can lead to crashes or unexpected behavior due to double initialization or incorrect `rpath` settings.","severity":"gotcha","affected_versions":"All"},{"fix":"Profile your application with `threshold=None` versus a specific `threshold` value to determine the optimal setting for your use case.","message":"Setting `threshold=None` (the default since v1.0.0) enables pre-computation of non-zero entries, which can reduce memory at a mild performance penalty (~10%). If performance is critical and memory is not an issue, consider setting an explicit `threshold` (e.g., `0.0`).","severity":"gotcha","affected_versions":">=1.0.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Update your import statement and function calls to use `sp_matmul_topn` instead. For example: `from sparse_dot_topn import sp_matmul_topn` and `C = sp_matmul_topn(A, B, top_n=N)`.","cause":"The function `awesome_cossim_topn` was deprecated and replaced by `sp_matmul_topn` in version 1.0.0. It has likely been removed in newer versions.","error":"cannot import name 'awesome_cossim_topn' from 'sparse_dot_topn'"},{"fix":"Ensure you have a C++17 compatible compiler installed (e.g., `build-essential` on Debian/Ubuntu, Xcode Command Line Tools on macOS, Visual Studio Build Tools on Windows). Also, explicitly install `cython` and `numpy` before `sparse-dot-topn`: `pip install cython numpy scipy sparse-dot-topn`. 
If issues persist, try specifying a known working NumPy version or installing with `--no-binary sparse_dot_topn`.","cause":"This error typically indicates missing build dependencies (like a C++ compiler, Cython, or compatible NumPy/SciPy versions) required to compile the underlying C++ extension if a pre-built wheel is not available for your system/Python version.","error":"Could not build wheels for sparse-dot-topn, which is required to install pyproject.toml-based projects"},{"fix":"For very large matrices (e.g., O(10M+) rows), consider splitting them into smaller chunks, computing the product per chunk, and merging the partial top-N results with `zip_sp_matmul_topn`. Adjusting `top_n` and `threshold` can also help reduce memory footprint.","cause":"Performing sparse matrix multiplication on extremely large matrices can still exhaust available memory, even with `sparse-dot-topn`'s optimizations.","error":"MemoryError: Unable to allocate ..."}]}