Sparse Dot Top-N

1.2.0 · active · verified Thu Apr 16

sparse-dot-topn is a Python package that accelerates sparse matrix multiplication followed by selection of the top-N results per row. Instead of materialising the full product and filtering it afterwards, it keeps only the top-N values while multiplying. This lowers memory use and speeds up workloads such as large-scale string comparison and entity matching. Developed by ING Wholesale Banking Advanced Analytics, it is currently at version 1.2.0 and receives regular updates focused on performance and support for new Python versions.
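To make the operation concrete, the sketch below computes the same kind of result the naive way with plain SciPy and NumPy. It materialises the full product `A @ B` and only then keeps the `top_n` largest stored values in each row. The helper name `naive_matmul_topn` is hypothetical and this is not the package's implementation. It only illustrates the intermediate full product whose memory cost sparse-dot-topn is meant to avoid.

```python
import numpy as np
import scipy.sparse as sparse


def naive_matmul_topn(A, B, top_n):
    """Hypothetical reference: full sparse product, then top_n values per row.

    The whole product A @ B is stored before filtering, which is the
    intermediate memory cost a fused top-N multiplication avoids.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    C = sparse.csr_matrix(A @ B)  # full product, every non-zero kept
    rows, cols, vals = [], [], []
    for i in range(C.shape[0]):
        start, end = C.indptr[i], C.indptr[i + 1]
        row_vals = C.data[start:end]
        row_cols = C.indices[start:end]
        if row_vals.size > top_n:
            # argpartition finds the top_n largest values without a full sort
            keep = np.argpartition(row_vals, -top_n)[-top_n:]
        else:
            keep = np.arange(row_vals.size)
        rows.extend([i] * keep.size)
        cols.extend(row_cols[keep].tolist())
        vals.extend(row_vals[keep].tolist())
    return sparse.csr_matrix((vals, (rows, cols)), shape=C.shape)


A = sparse.random(50, 20, density=0.2, format="csr", random_state=0)
B = sparse.random(20, 80, density=0.2, format="csr", random_state=1)

C_full = (A @ B).tocsr()
C_top = naive_matmul_topn(A, B, top_n=5)
print("stored values in full product:", C_full.nnz)
print("stored values after top-5 filter:", C_top.nnz)
```

The gap between the two printed counts is the storage the full product needs and the filtered result does not.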

Install

```shell
pip install sparse-dot-topn
```

Imports

```python
from sparse_dot_topn import sp_matmul_topn
```

Quickstart

This example shows how to run a sparse matrix multiplication with top-N selection using `sp_matmul_topn`. It creates two random CSR sparse matrices and computes their product, keeping only the 10 largest values in each row of the result. Make sure `scipy` and `numpy` are installed.

```python
import scipy.sparse as sparse
from sparse_dot_topn import sp_matmul_topn

# Create two sample sparse matrices (CSR format is recommended for performance)
A = sparse.random(1000, 100, density=0.1, format="csr", random_state=42)
B = sparse.random(100, 2000, density=0.1, format="csr", random_state=42)

# Compute C = A @ B and retain the 10 largest values in each row of C.
# threshold=0.0 additionally filters out results that do not exceed 0.0;
# n_threads=None runs sequentially, an integer such as 4 parallelises the work.
C = sp_matmul_topn(A, B, top_n=10, n_threads=None, threshold=0.0)

print(f"Shape of A: {A.shape}")
print(f"Shape of B: {B.shape}")
print(f"Shape of result C: {C.shape}")
print(f"Number of non-zero elements in C: {C.nnz}")
# print(C)  # uncomment to inspect the sparse result
```
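The package description names string comparison and entity matching as typical uses. A common pattern is to turn each string into an L2-normalised vector of character trigram counts, so that the product of a query matrix with the transposed candidate matrix holds cosine similarities. You then keep the best few candidates per query. The sketch below assumes that pattern. `build_vocab` and `char_ngram_matrix` are hypothetical helpers, and the company names are illustrative sample strings. The similarity product and per-row selection are the step you would hand to `sp_matmul_topn` in practice. Here that step is done with plain SciPy and NumPy so the sketch runs on its own.

```python
import numpy as np
import scipy.sparse as sparse


def build_vocab(strings, n=3):
    """Hypothetical helper: map every character n-gram seen to a column index."""
    grams = set()
    for s in strings:
        padded = f" {s.lower()} "
        grams.update(padded[k:k + n] for k in range(len(padded) - n + 1))
    return {g: j for j, g in enumerate(sorted(grams))}


def char_ngram_matrix(strings, vocab, n=3):
    """Hypothetical helper: L2-normalised n-gram counts, one CSR row per string."""
    rows, cols, vals = [], [], []
    for i, s in enumerate(strings):
        padded = f" {s.lower()} "
        counts = {}
        for k in range(len(padded) - n + 1):
            j = vocab.get(padded[k:k + n])
            if j is not None:
                counts[j] = counts.get(j, 0) + 1
        norm = np.sqrt(sum(c * c for c in counts.values())) or 1.0
        for j, c in counts.items():
            rows.append(i)
            cols.append(j)
            vals.append(c / norm)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(len(strings), len(vocab)))


queries = ["ING Bank N.V.", "Acme Corporation", "Globex Inc"]
candidates = ["ING Bank NV", "ACME Corp.", "Globex Incorporated", "Initech"]

vocab = build_vocab(queries + candidates)
Q = char_ngram_matrix(queries, vocab)
K = char_ngram_matrix(candidates, vocab)

# Rows have unit length, so each entry of Q @ K.T is a cosine similarity
S = (Q @ K.T).tocsr()

top_n = 2
results = {}
for i, q in enumerate(queries):
    start, end = S.indptr[i], S.indptr[i + 1]
    row_scores = S.data[start:end]
    # Keep the top_n highest-scoring candidates for this query
    order = np.argsort(row_scores)[::-1][:top_n]
    results[q] = [
        (candidates[S.indices[start + j]], round(float(row_scores[j]), 3))
        for j in order
    ]
    print(q, "->", results[q])
```

Candidates that share no trigram with a query never appear in that row of `S`, which is why the similarity matrix stays sparse even for large candidate lists.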