{"library":"simhash","title":"Simhash Python Library","description":"The `simhash` library provides a Python implementation of the Simhash Algorithm, a technique for quickly finding near-duplicate documents or comparing the similarity of two texts or data objects. It's highly useful for tasks like large-scale content deduplication, spam detection, and content recommendation, offering a fast way to identify perceptually similar items. The current version is 2.1.2, and it follows an irregular release cadence based on contributions and bug fixes.","language":"python","status":"active","last_verified":"Thu Apr 16","install":{"commands":["pip install simhash"],"cli":null},"imports":["from simhash import Simhash"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"from simhash import Simhash\n\n# Create Simhash objects from text\ntext1 = \"The quick brown fox jumps over the lazy dog.\"\ntext2 = \"A quick brown fox jumps over the sleeping dog.\"\ntext3 = \"Python is a programming language.\"\n\nhash1 = Simhash(text1)\nhash2 = Simhash(text2)\nhash3 = Simhash(text3)\n\n# Calculate Hamming distance between hashes\n# A lower distance means higher similarity\nprint(f\"Distance between '{text1[:20]}...' and '{text2[:20]}...': {hash1.distance(hash2)}\")\nprint(f\"Distance between '{text1[:20]}...' and '{text3[:20]}...': {hash1.distance(hash3)}\")\n\n# You can use a similarity threshold to determine if items are 'duplicates'\nsimilarity_threshold = 3\nif hash1.distance(hash2) < similarity_threshold:\n    print(f\"'{text1[:20]}...' and '{text2[:20]}...' are considered very similar.\")\nelse:\n    print(f\"'{text1[:20]}...' and '{text2[:20]}...' are not considered very similar.\")","lang":"python","description":"This quickstart demonstrates how to create `Simhash` objects from strings and calculate the Hamming distance between them. A smaller distance indicates greater similarity. 
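The right similarity threshold depends on your application: with the library's default 64-bit fingerprints, distances range from 0 to 64, so a low cutoff such as the 3 used above flags only near-identical texts.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}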