ImageHash
ImageHash is a Python library for generating perceptual hashes of images. Because these hashes reflect visual content rather than exact bytes, comparing them is a practical way to find similar or duplicate images. The library supports several hashing algorithms: average (aHash), perceptual (pHash), difference (dHash), wavelet (wHash), color (colorhash), and crop-resistant hashing. The current version is 4.3.2, and the library receives periodic updates that add features, improve performance, and fix bugs. [1, 3, 8]
Warnings
- breaking Version 4.0 changed the binary-to-hex encoding of hashes, so hex strings stored by earlier versions are not compatible with hashes generated by 4.0 and later. Hashes in the old encoding can be converted with `imagehash.old_hex_to_hash`. [3, 6]
- breaking Version 3.0 fixed a bug in `dhash`, which had computed pixel differences vertically instead of horizontally. `dhash` now follows the reference algorithm, so its output differs from hashes produced before 3.0; the previous behavior remains available as `dhash_vertical`. [3, 6, 16]
- gotcha ImageHash functions expect a `PIL.Image.Image` object as input. Passing a NumPy array directly (e.g., a frame from OpenCV) raises an `AttributeError` because NumPy arrays lack the `convert` method the library calls first. Convert the array with `Image.fromarray` before hashing. [19]
- gotcha When loading images from URLs with a library like `requests`, passing `resp.raw` directly to `Image.open()` fails if the response body is not image data (e.g., an HTML error page). Check the response status first, then open the downloaded bytes through `io.BytesIO(resp.content)`. [21]
- gotcha Choosing a similarity threshold (the maximum Hamming distance at which two hashes count as a match) is empirical and highly dependent on the use case. A smaller distance indicates greater similarity; common recommendations for 'similar' images range from 1 to 10, out of the 64 bits produced at the default `hash_size=8`. [1, 14]
- gotcha Significant image transformations like extensive cropping, rotations beyond 15 degrees, or substantial color adjustments can drastically alter perceptual hashes, potentially making perceptually similar images appear completely different. [14, 22]
Install
pip install Pillow imagehash
Imports
- Image
from PIL import Image
- imagehash
import imagehash
Quickstart
import os

from PIL import Image, ImageDraw
import imagehash

# Create a dummy image for demonstration if not available
dummy_image_path = 'dummy_image.png'
if not os.path.exists(dummy_image_path):
    img = Image.new('RGB', (200, 200), color='red')
    d = ImageDraw.Draw(img)
    d.text((10, 10), "Hello", fill=(0, 0, 0))
    img.save(dummy_image_path)
    print(f"Created dummy image: {dummy_image_path}")

try:
    # Load an image
    image = Image.open(dummy_image_path)

    # Generate a perceptual hash (e.g., average hash)
    hash_value = imagehash.average_hash(image)
    print(f"Hash for '{dummy_image_path}': {hash_value}")

    # You can also generate other types of hashes:
    # phash_value = imagehash.phash(image)
    # dhash_value = imagehash.dhash(image)
    # whash_value = imagehash.whash(image)
    # colorhash_value = imagehash.colorhash(image)

    # To compare with another image:
    # For demonstration, build a slightly different image in memory.
    # In a real scenario, this would be another actual image file.
    slightly_different_image = Image.new('RGB', (200, 200), color='red')
    d = ImageDraw.Draw(slightly_different_image)
    d.text((15, 15), "Hello", fill=(0, 0, 0))  # slight shift
    other_hash_value = imagehash.average_hash(slightly_different_image)
    print(f"Hash for a slightly different image: {other_hash_value}")

    # Subtracting two hashes gives their Hamming distance (number of differing bits)
    difference = hash_value - other_hash_value
    print(f"Difference between hashes: {difference}")

    # A smaller difference indicates greater similarity
    if difference < 5:
        print("The images are considered similar (difference < 5).")
    else:
        print("The images are considered different (difference >= 5).")
except FileNotFoundError:
    print(f"Error: Image file not found at {dummy_image_path}")
except Exception as e:
    print(f"An error occurred: {e}")
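The hash subtraction in the Quickstart counts differing bits. When hashes are stored as hex strings (for example via `str(hash_value)`), the same comparison can be sketched in plain Python. This is an illustration under stated assumptions, not part of ImageHash: `hamming_distance` and `is_similar` are hypothetical helpers, the hex strings are assumed to come from version 4.0 or later and to have equal length, the sample values are made up, and the threshold of 5 mirrors the Quickstart rather than any recommended setting.

```python
def hamming_distance(hex_a, hex_b):
    """Count differing bits between two equal-length hex hash strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def is_similar(hex_a, hex_b, threshold=5):
    """Treat hashes as similar when fewer than `threshold` bits differ."""
    return hamming_distance(hex_a, hex_b) < threshold


# Illustrative 64-bit hashes (made up, not computed from real images)
stored = "ffd8c0c0c0e0f0ff"
candidate = "ffd8c0c0c0e0f0fe"  # differs from `stored` in the last bit
unrelated = "00273f3f3f1f0f00"  # bitwise complement of `stored`

print(hamming_distance(stored, candidate))  # 1
print(hamming_distance(stored, unrelated))  # 64
print(is_similar(stored, candidate))        # True
print(is_similar(stored, unrelated))        # False
```

In practice the threshold would be tuned on a labeled sample of known duplicates and known non-duplicates from the target dataset.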