{"library":"datasketch","code":"from datasketch import MinHash, MinHashLSH\n\n# Create MinHash objects for three sets\ns1 = {\"minhash\", \"is\", \"a\", \"probabilistic\", \"data\", \"structure\", \"for\", \"estimating\", \"similarity\", \"between\", \"sets\"}\ns2 = {\"minhash\", \"is\", \"a\", \"data\", \"structure\", \"for\", \"estimating\", \"similarity\", \"between\", \"documents\"}\ns3 = {\"today\", \"is\", \"a\", \"beautiful\", \"day\"}\n\nm1 = MinHash(num_perm=128)\nm2 = MinHash(num_perm=128)\nm3 = MinHash(num_perm=128)\n\nfor d in s1:\n    m1.update(d.encode('utf8'))\nfor d in s2:\n    m2.update(d.encode('utf8'))\nfor d in s3:\n    m3.update(d.encode('utf8'))\n\n# Create an LSH index with a Jaccard similarity threshold of 0.5\nlsh = MinHashLSH(threshold=0.5, num_perm=128)\nlsh.insert(\"m1\", m1)\nlsh.insert(\"m2\", m2)\nlsh.insert(\"m3\", m3)\n\n# Query the LSH for candidates similar to m2\nprint(f\"Candidate keys for m2: {lsh.query(m2)}\")","lang":"python","description":"This quickstart demonstrates the core functionality of datasketch: creating MinHash objects from sets and using MinHashLSH to retrieve candidate items whose estimated Jaccard similarity exceeds a threshold. The example builds three MinHash objects from different sets of words, inserts them into an LSH index, and then queries the index for keys likely to have a Jaccard similarity above 0.5 with `m2` (the result includes `m2` itself, since it was inserted into the index).","tag":null,"tag_description":null,"last_tested":"2026-04-24","results":[{"runtime":"python:3.10-alpine","exit_code":0},{"runtime":"python:3.10-slim","exit_code":0},{"runtime":"python:3.11-alpine","exit_code":0},{"runtime":"python:3.11-slim","exit_code":0},{"runtime":"python:3.12-alpine","exit_code":0},{"runtime":"python:3.12-slim","exit_code":0},{"runtime":"python:3.13-alpine","exit_code":0},{"runtime":"python:3.13-slim","exit_code":0},{"runtime":"python:3.9-alpine","exit_code":0},{"runtime":"python:3.9-slim","exit_code":0}]}