Splink
Splink is a Python package for fast, accurate, and scalable probabilistic record linkage (entity resolution). It enables users to deduplicate and link records from datasets that lack unique identifiers, leveraging unsupervised learning based on the Fellegi-Sunter model. Splink supports various SQL backends like DuckDB, Apache Spark, and AWS Athena, allowing it to scale to datasets of 100 million records or more, and provides a suite of interactive visualizations for model understanding and diagnostics.
Warnings
- breaking Splink v5.0 introduces significant breaking changes: the implicit cache mechanism is replaced by explicit cache table management functions, 'salting' is removed, 'chunking' is introduced for large datasets, and internal probabilistic calculations shift from Bayes factors to match weights (log odds) to improve numerical stability. Support for the Athena backend is also being dropped.
- breaking Python 3.8 support was dropped in Splink v4.0.12, in line with the Python community's end-of-life schedule for older versions.
- gotcha Splink performs best when the input data has multiple columns that are not highly correlated. It is not designed for linking single-column 'bag of words' data (e.g., only a company name), and strongly correlated columns (e.g., city and postcode) can also reduce effectiveness.
- deprecated SQLite backend support is minimal and receives less attention from the development team compared to DuckDB and Spark. It has reasonable but not complete coverage of comparison functions, particularly for array and date comparisons.
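The shift from Bayes factors to match weights noted above is a change of scale, not of model: a match weight is the base-2 log of a Bayes factor, so evidence is summed rather than multiplied. A minimal pure-Python sketch of the underlying Fellegi-Sunter arithmetic (the m/u probabilities below are illustrative assumptions, not Splink internals or estimated values):

```python
import math

# Illustrative m/u probabilities for three comparisons (assumed values;
# Splink estimates these from the data during training).
comparisons = {
    "first_name": {"m": 0.90, "u": 0.010},
    "surname":    {"m": 0.90, "u": 0.005},
    "dob":        {"m": 0.95, "u": 0.001},
}

prior = 0.001  # probability that two random records match
prior_weight = math.log2(prior / (1 - prior))

# Each comparison contributes log2(m/u) -- its match weight.
# Summing log-odds is numerically stabler than multiplying Bayes factors.
total_weight = prior_weight + sum(
    math.log2(c["m"] / c["u"]) for c in comparisons.values()
)

# Convert the summed log-odds back to a match probability.
match_probability = 2**total_weight / (1 + 2**total_weight)
print(round(match_probability, 6))
```

With agreement on all three columns, the evidence overwhelms the low prior and the match probability ends up very close to 1.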
Install
- pip install splink
- pip install 'splink[spark]'
- pip install 'splink[athena]'
- pip install 'splink[postgres]'
Imports
- Linker
from splink import Linker
- SettingsCreator
from splink import SettingsCreator
- block_on
from splink import block_on
- DuckDBAPI
from splink import DuckDBAPI
- splink_datasets
from splink import splink_datasets
- cl
import splink.comparison_library as cl
Quickstart
import splink.comparison_library as cl
from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets
db_api = DuckDBAPI()
df = splink_datasets.fake_1000
settings = SettingsCreator(
    link_type="dedupe_only",
    comparisons=[
        cl.NameComparison("first_name"),
        cl.JaroAtThresholds("surname"),
        cl.DateOfBirthComparison("dob", input_is_string=True),
        cl.ExactMatch("city").configure(term_frequency_adjustments=True),
        cl.EmailComparison("email"),
    ],
    blocking_rules_to_generate_predictions=[
        block_on("first_name", "dob"),
        block_on("surname"),
    ],
)
linker = Linker(df, settings, db_api)
linker.training.estimate_probability_two_random_records_match(
    [block_on("first_name", "surname")], recall=0.7
)
linker.training.estimate_u_using_random_sampling(max_pairs=1e6)
linker.training.estimate_parameters_using_expectation_maximisation(
    block_on("first_name", "surname")
)
# Generate pairwise match predictions from the trained model
predictions_df = linker.inference.predict()
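The blocking_rules_to_generate_predictions above control which pairs are ever scored: only record pairs agreeing on at least one blocking key are compared, which is what lets Splink avoid the full quadratic comparison space. A rough pure-Python sketch of the idea, on hypothetical toy records (Splink implements this as SQL on the chosen backend):

```python
from itertools import combinations

# Toy records (hypothetical data mirroring the quickstart's columns).
records = [
    {"id": 1, "first_name": "amy", "surname": "lee", "dob": "1990-01-01"},
    {"id": 2, "first_name": "amy", "surname": "lea", "dob": "1990-01-01"},
    {"id": 3, "first_name": "bob", "surname": "lee", "dob": "1985-05-05"},
    {"id": 4, "first_name": "cat", "surname": "fox", "dob": "1970-02-02"},
]

def block_pairs(records, rules):
    """Collect candidate pairs that agree on every column of at least one rule."""
    pairs = set()
    for rule in rules:
        groups = {}
        for r in records:
            groups.setdefault(tuple(r[c] for c in rule), []).append(r["id"])
        for ids in groups.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs

# Mirrors block_on("first_name", "dob") and block_on("surname").
candidates = block_pairs(records, [("first_name", "dob"), ("surname",)])
print(sorted(candidates))  # → [(1, 2), (1, 3)] -- far fewer than all 6 possible pairs
```

A pair missed by every blocking rule can never be predicted as a match, which is why Splink recommends several overlapping rules rather than one strict one.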