Image Deduplicator

0.3.3.post2 · active · verified Sat Apr 11

imagededup is a Python package that simplifies finding exact and near-duplicate images in a collection. It provides several perceptual hashing algorithms (PHash, DHash, WHash, AHash) and a convolutional neural network (CNN) method for more robust deduplication, along with an evaluation framework and plotting utilities for inspecting duplicates. The current version is 0.3.3.post2, and development remains active.
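To give intuition for the hashing methods listed above, here is a minimal, self-contained sketch of difference hashing (DHash) over a pre-resized grayscale grid. This is not imagededup's implementation, just the core idea: each bit records whether a pixel is darker than its right-hand neighbor, so visually similar images produce hashes with a small Hamming distance.

```python
def dhash_bits(gray, hash_size=8):
    """Difference hash of a grayscale grid already resized to
    hash_size rows x (hash_size + 1) columns (illustrative only)."""
    bits = []
    for row in gray:
        for x in range(hash_size):
            # 1 if this pixel is darker than its right neighbor
            bits.append(1 if row[x] < row[x + 1] else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))

# A smooth horizontal gradient and a slightly perturbed copy
# yield nearby hashes; only the changed region flips bits.
grad = [[10 * c for c in range(9)] for _ in range(8)]
near = [row[:] for row in grad]
near[0][0] += 100  # small local change
print(hamming(dhash_bits(grad), dhash_bits(near)))  # → 1
```

Because the hash summarizes coarse structure rather than exact pixel values, resized or re-encoded copies of an image stay within a small Hamming distance of the original.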

Warnings

Install
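imagededup is distributed on PyPI, so the usual install (assuming a recent Python environment) is via pip:

```shell
pip install imagededup
```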

Imports

Quickstart

This quickstart demonstrates how to use the perceptual hashing (PHash) method to find duplicate images in a specified directory: initialize the hashing method, generate encodings for each image, then find duplicates based on those encodings. The example generates a small dummy directory of real images so it can be run immediately.

import os
from PIL import Image  # Pillow is installed as an imagededup dependency
from imagededup.methods import PHash

# Create a dummy image directory with small generated images for demonstration.
# In a real scenario, 'image_dir' would point to your directory of images.
# Note: the files must be actual images -- plain text files with an image
# extension cannot be decoded and would be skipped or raise errors.
image_dir = './my_images'
os.makedirs(image_dir, exist_ok=True)
Image.new('RGB', (64, 64), color='red').save(os.path.join(image_dir, 'image1.jpg'))
Image.new('RGB', (64, 64), color='red').save(os.path.join(image_dir, 'image2.png'))  # near-duplicate of image1
Image.new('RGB', (64, 64), color='blue').save(os.path.join(image_dir, 'image3.jpg'))
# Add more images as needed to test deduplication

# Initialize a perceptual hashing method
phasher = PHash()

# Generate encodings for all images in the directory
encodings = phasher.encode_images(image_dir=image_dir)

# Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)

print(f"Found duplicates: {duplicates}")

# Optionally, remove the dummy directory and files
# import shutil
# shutil.rmtree(image_dir)
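Conceptually, the find-duplicates step compares encodings pairwise and keeps pairs whose hashes differ by at most some number of bits. The sketch below (a hypothetical helper, not imagededup's code) shows that step for hex-encoded hashes:

```python
def hamming_hex(h1, h2):
    # XOR the hashes as integers and count the differing bits.
    return bin(int(h1, 16) ^ int(h2, 16)).count('1')

def find_dupes(encoding_map, max_distance=10):
    """Hypothetical stand-in for find_duplicates: for each file, list the
    other files whose hash is within max_distance bits."""
    names = list(encoding_map)
    dupes = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming_hex(encoding_map[a], encoding_map[b]) <= max_distance:
                dupes[a].append(b)
                dupes[b].append(a)
    return dupes

hashes = {'image1.jpg': 'ffff', 'image2.png': 'fffe', 'image3.jpg': '0000'}
print(find_dupes(hashes))
# → {'image1.jpg': ['image2.png'], 'image2.png': ['image1.jpg'], 'image3.jpg': []}
```

The real phasher.find_duplicates exposes a similar max_distance_threshold parameter for its hashing methods, which controls how aggressively near-duplicates are matched.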
