Authors:
Richard Connor 1; Stewart MacKenzie-Leigh 1; Franco Alberto Cardillo 2 and Robert Moss 1
Affiliations:
1 University of Strathclyde, United Kingdom; 2 Consiglio Nazionale delle Ricerche, Italy
Keyword(s):
Near-duplicate Image Detection, Benchmark, Image Similarity Function, Forensic Image Detection.
Related Ontology Subjects/Areas/Topics:
Applications and Services; Computer Vision, Visualization and Computer Graphics; Multimedia Forensics
Abstract:
There are many contexts where the automated detection of near-duplicate images is important, for example
the detection of copyright infringement or images of child abuse. There are many published methods for the
detection of similar and near-duplicate images; however, it is still uncommon for methods to be objectively
compared with each other, probably because of a lack of any good framework in which to do so. Published
sets of near-duplicate images exist, but are typically small, specialist, or generated. Here, we give a new test
set based on a large, serendipitously selected collection of high quality images. Having observed that the MIR-Flickr
1M image set contains a significant number of near-duplicate images, we have discovered the majority
of these. We disclose a set of 1,958 near-duplicate clusters from within the set, and show that this is very
likely to contain almost all of the near-duplicate pairs that exist. The main contribution of this publication is
the identification of these images, which may then be used by other authors to make comparisons as they see
fit. In particular, however, near-duplicate classification functions may now be accurately tested for sensitivity
and specificity over a general collection of images.
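
To make the closing point concrete, the following is a minimal sketch of how a pairwise near-duplicate classifier might be scored for sensitivity and specificity against a set of ground-truth clusters. It is an illustration only, not the authors' evaluation code: the cluster representation (lists of image identifiers), the function names, the toy scalar features, and the 0.2 threshold are all assumptions made for this example. It also enumerates every pair exhaustively, which is only feasible for small samples and not for a collection of a million images.

```python
from itertools import combinations


def ground_truth_pairs(clusters):
    """Expand clusters of image ids into a set of unordered near-duplicate pairs."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each unordered pair one canonical (a, b) form with a < b.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def sensitivity_specificity(image_ids, clusters, is_near_duplicate):
    """Score a pairwise classifier against cluster ground truth.

    is_near_duplicate(a, b) is a hypothetical callable returning True when
    the classifier under test judges images a and b to be near-duplicates.
    """
    truth = ground_truth_pairs(clusters)
    tp = fp = tn = fn = 0
    for a, b in combinations(sorted(image_ids), 2):
        predicted = is_near_duplicate(a, b)
        actual = (a, b) in truth
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    # Sensitivity: fraction of true near-duplicate pairs that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of non-duplicate pairs that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (assumed for illustration): five images in two ground-truth clusters.
image_ids = ["a", "b", "c", "d", "e"]
clusters = [["a", "b", "c"], ["d", "e"]]

# Stand-in for a real image similarity function: one scalar feature per image.
features = {"a": 0.0, "b": 0.1, "c": 0.5, "d": 2.0, "e": 2.05}


def toy_classifier(a, b, threshold=0.2):
    """Declare a near-duplicate when the feature distance is under the threshold."""
    return abs(features[a] - features[b]) < threshold


sens, spec = sensitivity_specificity(image_ids, clusters, toy_classifier)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy run the classifier finds two of the four true near-duplicate pairs and raises no false alarms. On a large collection, non-duplicate pairs vastly outnumber duplicate ones, so an evaluation at that scale would presumably need sampling or an index rather than full enumeration.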