database operator as its core. We introduced the self-similarity
range wide-join operator, an improved version of the wide-join
that computes similarity by combining a relation with itself.
We optimized the wide-join algorithm to scan the search space
using pivots and to prune elements through metric space
properties, which yielded a large performance gain over the
existing solutions.
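The pivot-based pruning summarized above can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes objects are feature vectors compared with the Euclidean distance, that the caller supplies the pivots, and that the operator returns every pair within range ξ ordered by distance, optionally truncated to the κ closest pairs (the names `self_range_join`, `xi`, `kappa` are hypothetical). The pruning rests on the triangle inequality: for any pivot p, |d(a, p) − d(b, p)| ≤ d(a, b), so a pair whose pivot distances differ by more than ξ is discarded without computing d(a, b).

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance; any metric could be plugged in instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_range_join(
    objs: Sequence[Vector],
    xi: float,
    pivots: Sequence[Vector],
    dist: Callable[[Vector, Vector], float] = euclidean,
    kappa: Optional[int] = None,
) -> List[Tuple[float, int, int]]:
    """Hypothetical helper with assumed semantics (not the paper's code).

    Returns (distance, i, j) with i < j for every pair of objects with
    distance <= xi, sorted by distance and optionally cut to the kappa
    closest pairs.
    """
    if not pivots:
        raise ValueError("at least one pivot is required")
    # Distance from every object to every pivot (|objs| * |pivots| calls).
    table = [[dist(o, p) for p in pivots] for o in objs]
    # Order objects by their distance to the first pivot.
    order = sorted(range(len(objs)), key=lambda k: table[k][0])
    pairs: List[Tuple[float, int, int]] = []
    for pos, i in enumerate(order):
        for q in range(pos + 1, len(order)):
            j = order[q]
            # Every later j has an even larger gap on pivot 0, so all
            # remaining pairs for i are pruned at once.
            if table[j][0] - table[i][0] > xi:
                break
            # Triangle inequality: max_p |d(i,p) - d(j,p)| <= d(i, j).
            lower = max(abs(ti - tj) for ti, tj in zip(table[i], table[j]))
            if lower > xi:
                continue  # pruned without computing d(i, j)
            d = dist(objs[i], objs[j])
            if d <= xi:
                pairs.append((d, min(i, j), max(i, j)))
    pairs.sort()
    return pairs if kappa is None else pairs[:kappa]
```

For example, with `pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.5), (10.0, 0.0)]`, the call `self_range_join(pts, 1.0, [(0.0, 0.0)])` returns `[(0.5, 2, 3), (1.0, 0, 1)]`, and most pairs are discarded by the sorted scan before any object-to-object distance is computed.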
The experiments were executed on two real datasets. They
showed that our wide-join-based framework not only speeds up
near-duplicate detection by two to three orders of magnitude
but also improves the quality of the results compared with
previous techniques.
The introduced technique is general enough to be applied to
any dataset in a metric space, but we focused on its use in
emergency scenarios. During an emergency, eyewitnesses
commonly capture a large number of photos and videos of the
incident. Existing monitoring systems can use this crowdsourced
information to improve decision-making support. However, as
the volume of information grows, its elements tend to become
highly similar to one another, so efficient techniques to
properly handle near-duplicates are crucial.
As future work, we are exploring data distribution statistics
and selectivity estimation for join operators in order to
define accurate values for the parameters required by the
self-similarity range wide-join. We also intend to combine the
images with their associated metadata to further improve both
the precision and the performance of near-duplicate detection.
ACKNOWLEDGEMENTS
The authors are grateful to FAPESP, CNPq, CAPES and RESCUER
(EU FP7-614154 / CNPq 490084/2013-3) for their financial
support.
REFERENCES
Bangay, S. and Lv, O. (2012). Evaluating locality sensitive
hashing for matching partial image patches in a social
media setting. Journal of Multimedia, 1(9):14–24.
Carvalho, L. O., Santos, L. F. D., Oliveira, W. D., Traina,
A. J. M., and Traina Jr., C. (2015). Similarity joins and
beyond: an extended set of operators with order. In
Proc. 8th Int. Conf. on Similarity Search and Applications,
pages 29–41.
Chino, D. Y. T., Avalhais, L. P. S., Rodrigues Jr., J. F., and
Traina, A. J. M. (2015). BoWFire: detection of fire
in still images by integrating pixel color and texture
analysis. In Proc. 28th Conf. on Graphics, Patterns
and Images, pages 1–8.
Chum, O., Philbin, J., and Zisserman, A. (2008). Near
duplicate image detection: min-hash and tf-idf weighting.
In British Machine Vision Conference, pages 1–10.
Kasutani, E. and Yamada, A. (2001). The MPEG-7 color
layout descriptor: a compact image feature description
for high-speed image/video segment retrieval. In
Proc. 8th Int. Conf. on Image Processing, pages 674–677.
Li, J., Qian, X., Li, Q., Zhao, Y., Wang, L., and Tang, Y. Y.
(2015). Mining near-duplicate image groups. Multimedia
Tools and Applications, 74(2):655–669.
Searcóid, M. Ó. (2007). Metric spaces. Springer.
Silva, Y. N., Aref, W. G., Larson, P.-A., Pearson, S., and
Ali, M. H. (2013). Similarity queries: their conceptual
evaluation, transformations, and processing. The
VLDB Journal, 22(3):395–420.
Sonka, M., Hlavac, V., and Boyle, R. (2014). Image
Processing, Analysis, and Machine Vision. Cengage
Learning.
Stricker, M. and Orengo, M. (1995). Similarity of color
images. In Proc. 3rd Conf. on Storage and Retrieval
for Image and Video Databases, pages 381–392.
Wang, X.-J., Zhang, L., and Ma, W.-Y. (2012). Duplicate
search based image annotation using web-scale data.
Proc. of the IEEE, 100(9):2705–2721.
Xiao, C., Wang, W., Lin, X., Yu, J. X., and Wang, G. (2011).
Efficient similarity joins for near-duplicate detection.
ACM Transactions on Database Systems, 36(3):15:1–15:41.
Yao, J., Yang, B., and Zhu, Q. (2015). Near-duplicate image
retrieval based on contextual descriptor. IEEE Signal
Processing Letters, 22(9):1404–1408.
Efficient Self-similarity Range Wide-joins Fostering Near-duplicate Image Detection in Emergency Scenarios