Efficient Self-similarity Range Wide-joins Fostering Near-duplicate Image Detection in Emergency Scenarios

Luiz Olmes Carvalho, Lucio F. D. Santos, Willian D. Oliveira, Agma J. M. Traina, Caetano Traina Jr.

2016

Abstract

Crowdsourced information is increasingly employed to improve and support decision making in emergency situations. However, the gathered records quickly become too similar to one another, and handling many similar reports adds no valuable knowledge for the personnel at the control center in their decision-making tasks. The usual approaches to detecting and handling such near-duplicate data rely on costly two-step processing. To reduce this cost and also improve duplicate detection, we developed a framework based on the similarity wide-join database operator. We extended the wide-join definition to overcome its restrictions so that it also handles the near-duplicate detection task. In this paper, we also provide an efficient pivot-based algorithm that speeds up the entire process and retrieves the most similar elements in a single pass. Experiments on real datasets show that our framework is up to three orders of magnitude faster than competing techniques from the literature, while also improving the quality of the result by about 35 percent.
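
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of pivot-based pruning in a self-similarity range join that keeps the k closest pairs. It is not the authors' algorithm. The function name `range_self_join_top_k`, the Euclidean distance, the hand-picked pivots and the toy feature vectors are all assumptions made for illustration. The sketch uses the triangle inequality: for any pivot p, |d(x, p) - d(y, p)| is a lower bound on d(x, y), so a pair whose bound already exceeds the radius can be skipped without computing its distance.

```python
import heapq
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_self_join_top_k(items, radius, k, pivots, dist=euclidean):
    """Return (pairs, computed): up to k closest pairs (d, i, j) with
    i < j and d <= radius, plus the number of pair distances computed.
    Hypothetical helper, not taken from the paper."""
    # Distances from every item to every pivot, computed once up front.
    table = [[dist(x, p) for p in pivots] for x in items]
    candidates = []
    computed = 0
    for i, j in combinations(range(len(items)), 2):
        # Triangle-inequality lower bound on d(items[i], items[j]).
        lower = max(
            (abs(a - b) for a, b in zip(table[i], table[j])), default=0.0
        )
        if lower > radius:
            continue  # pruned: the pair cannot be within the radius
        computed += 1
        d = dist(items[i], items[j])
        if d <= radius:
            candidates.append((d, i, j))
    # Keep only the k most similar (smallest distance) qualifying pairs.
    return heapq.nsmallest(k, candidates), computed


# Toy 2-D "feature vectors" (assumed data): two near-duplicate groups
# and one isolated element.
items = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (9.0, 0.0)]
pivots = [(0.0, 0.0), (9.0, 0.0)]
pairs, computed = range_self_join_top_k(items, radius=0.5, k=2, pivots=pivots)
print([(i, j) for _, i, j in pairs], computed)
```

On this toy input only the two near-duplicate pairs survive the pivot filter, so most of the ten possible pair distances are never computed. Passing an empty pivot list disables pruning and gives a brute-force baseline for comparison.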


Paper Citation


in Harvard Style

Carvalho L., Santos L., Oliveira W., Traina A. and Traina Jr. C. (2016). Efficient Self-similarity Range Wide-joins Fostering Near-duplicate Image Detection in Emergency Scenarios. In Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-187-8, pages 81-91. DOI: 10.5220/0005868900810091


in Bibtex Style

@conference{iceis16,
author={Luiz Olmes Carvalho and Lucio F. D. Santos and Willian D. Oliveira and Agma J. M. Traina and Caetano Traina Jr.},
title={Efficient Self-similarity Range Wide-joins Fostering Near-duplicate Image Detection in Emergency Scenarios},
booktitle={Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2016},
pages={81-91},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005868900810091},
isbn={978-989-758-187-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Efficient Self-similarity Range Wide-joins Fostering Near-duplicate Image Detection in Emergency Scenarios
SN - 978-989-758-187-8
AU - Carvalho L.
AU - Santos L.
AU - Oliveira W.
AU - Traina A.
AU - Traina Jr. C.
PY - 2016
SP - 81
EP - 91
DO - 10.5220/0005868900810091