Splink
Splink is a Python package for fast, accurate, and scalable probabilistic record linkage (entity resolution). It enables users to deduplicate and link records from datasets that lack unique identifiers, leveraging unsupervised learning based on the Fellegi-Sunter model. Splink supports various SQL backends like DuckDB, Apache Spark, and AWS Athena, allowing it to scale to datasets of 100 million records or more, and provides a suite of interactive visualizations for model understanding and diagnostics.
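To make the Fellegi-Sunter idea concrete, here is a minimal sketch of how per-field evidence combines into a match probability. It is not Splink's API: the function name `match_probability` and the m and u values in the usage lines are illustrative assumptions. Here m is the probability that a field agrees given the two records are a true match, and u is the probability that it agrees given they are not.

```python
import math


def match_probability(prior, comparisons):
    """Combine Fellegi-Sunter evidence into a match probability.

    prior: assumed probability that a random record pair is a match.
    comparisons: list of (m, u) pairs, one per observed comparison outcome.
    Hypothetical helper for illustration only, not part of Splink.
    """
    # Start from the prior odds, expressed as a log2 "match weight".
    weight = math.log2(prior / (1 - prior))
    # Each comparison adds log2(m / u), assuming fields are independent.
    for m, u in comparisons:
        weight += math.log2(m / u)
    # Convert the total weight back to odds, then to a probability.
    odds = 2 ** weight
    return odds / (1 + odds)


# Illustrative values (assumptions): first name agrees, date of birth agrees.
evidence = [(0.9, 0.01), (0.95, 0.001)]
print(match_probability(0.001, evidence))
```

Working in log2 weights keeps the contribution of each field additive, which is why a field with m much larger than u pushes the score up sharply while a field with m close to u adds almost nothing.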
Resources
package pypi.org/project/splink/
API endpoints
full doc /v1/registry/splink
install /v1/registry/splink/install
compatibility /v1/registry/splink/compatibility