What is duplicate detection in Mediapapa?

Duplicate detection in Mediapapa identifies media files whose content is identical, regardless of filename or upload date. It groups the duplicates, picks a reference version (typically the one with the most active references), and enables safe removal of the copies.

How Mediapapa detects duplicates

Mediapapa compares files by content hash, a fingerprint generated from the file data itself. Two files with different names but identical content produce the same hash and are flagged as duplicates. This is more reliable than filename comparison, which cannot tell that “product-image.jpg” and “product-image-v2.jpg” are the same photo.
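
The idea of grouping by content hash can be sketched in a few lines of Python. This is an illustration only, not Mediapapa's code: the choice of SHA-256, the function names content_hash and group_duplicates, and the sample byte strings are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a hex fingerprint of the file bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group filenames whose contents hash to the same value."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_hash[content_hash(data)].append(name)
    # Only hashes shared by two or more files form a duplicate group.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Hypothetical sample data: two names, same bytes, plus one unrelated file.
files = {
    "product-image.jpg": b"\xff\xd8 same pixels",
    "product-image-v2.jpg": b"\xff\xd8 same pixels",
    "banner.jpg": b"\xff\xd8 other pixels",
}
duplicate_groups = group_duplicates(files)
```

Here the two product images land in one group even though their names differ, while banner.jpg is left alone because no other file shares its hash.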

When duplicates are found, Mediapapa identifies the reference version — typically the attachment with the most active content references in the usage index. The other copies are marked for review.
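
A minimal sketch of that selection step, assuming the usage index can be reduced to a mapping from attachment ID to a count of active references. The function name pick_reference, the sample IDs, and the tie-break rule (lowest ID wins) are hypothetical; the text above only says the version with the most references is typically chosen.

```python
def pick_reference(group: list[int], reference_counts: dict[int, int]) -> int:
    """Pick the attachment ID with the most active references.

    Ties go to the lowest ID, which is an assumption made for this sketch.
    """
    return max(
        group,
        key=lambda att_id: (reference_counts.get(att_id, 0), -att_id),
    )


# Hypothetical duplicate group and usage counts; 310 is not referenced anywhere.
group = [101, 205, 310]
reference_counts = {101: 2, 205: 7}

reference_id = pick_reference(group, reference_counts)
copies_for_review = [att_id for att_id in group if att_id != reference_id]
```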

How to safely remove duplicates

Removing a duplicate without updating references is dangerous. If posts reference the duplicate’s attachment ID, deleting that attachment breaks those posts. Safe Replace handles this correctly: it first repoints every reference from the duplicate to the reference version, then removes the duplicate.
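
The ordering matters: references are rewritten first, and the duplicate is deleted only afterwards. The sketch below models that order with plain dictionaries. It is not Safe Replace itself; the safe_replace function, the posts and attachments structures, and the IDs are assumptions chosen to keep the example self-contained.

```python
def safe_replace(
    posts: dict[int, list[int]],
    attachments: dict[int, str],
    duplicate_id: int,
    reference_id: int,
) -> int:
    """Repoint references from the duplicate to the reference, then delete it.

    Returns the number of posts that were updated.
    """
    if duplicate_id == reference_id:
        raise ValueError("duplicate and reference must be different attachments")
    if reference_id not in attachments:
        raise KeyError(f"reference attachment {reference_id} does not exist")

    updated = 0
    for post_id, ids in list(posts.items()):
        if duplicate_id in ids:
            # Step 1: swap the duplicate's ID for the reference's ID.
            posts[post_id] = [reference_id if i == duplicate_id else i for i in ids]
            updated += 1

    # Step 2: only now is it safe to remove the duplicate.
    attachments.pop(duplicate_id, None)
    return updated


# Hypothetical data: post ID -> attachment IDs it references.
posts = {1: [101, 310], 2: [101], 3: [310]}
attachments = {101: "product-image-v2.jpg", 205: "product-image.jpg", 310: "banner.jpg"}

updated_posts = safe_replace(posts, attachments, duplicate_id=101, reference_id=205)
```

After the call, no post mentions attachment 101 any more, so removing it cannot leave a broken reference behind.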

The duplicate blocker on upload prevents new duplicates from accumulating. When you upload a file that matches an existing one by content hash, Mediapapa warns you before the upload completes.
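
An upload check of this kind might look roughly like the following, assuming a lookup table from content hash to existing attachment ID. The check_upload function, the known_hashes table, and the use of SHA-256 are illustrative assumptions, not Mediapapa's actual interface.

```python
import hashlib
from typing import Optional


def check_upload(data: bytes, known_hashes: dict[str, int]) -> Optional[int]:
    """Return the ID of an existing attachment with identical content, or None."""
    return known_hashes.get(hashlib.sha256(data).hexdigest())


# Hypothetical index with one file already in the library.
existing = b"same bytes"
known_hashes = {hashlib.sha256(existing).hexdigest(): 205}

match = check_upload(b"same bytes", known_hashes)     # identical content
no_match = check_upload(b"new bytes", known_hashes)   # different content

if match is not None:
    print(f"Warning: this file duplicates attachment {match}")
```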

Frequently asked questions

Does duplicate detection find near-duplicates?

No. Mediapapa identifies exact duplicates by content hash. Near-duplicates — the same image at different sizes, or slightly edited versions — are not flagged. Near-duplicate detection requires more complex image analysis that is not part of the current scope.
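
The reason is easy to demonstrate: a cryptographic hash changes completely when the input changes at all, so a resized or edited image never shares a hash with its original. A small sketch, using SHA-256 and made-up byte strings as stand-ins for real image data:

```python
import hashlib

original = b"\xff\xd8 image bytes"
exact_copy = bytes(original)       # byte-for-byte copy
edited = original + b"\x00"        # stand-in for a resize or a small edit

original_hash = hashlib.sha256(original).hexdigest()
copy_matches = original_hash == hashlib.sha256(exact_copy).hexdigest()
edit_matches = original_hash == hashlib.sha256(edited).hexdigest()
```

Only the exact copy matches; the edited version produces an unrelated hash and would not be grouped with the original.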

Is duplicate detection available in the free version?

Detection — identifying duplicates and showing the reference version — is free. Bulk removal and Safe Replace for consolidating references require Pro.