Small Vocabulary with Saliency Matching for Video Copy Detection

Huamin Ren, Thomas B. Moeslund, Sheng Tang, Heri Ramampiaro


The importance of copy detection has led to a substantial amount of research in recent years, in which Bag of visual Words (BoW) plays an important role due to its ability to handle occlusion and some minor transformations effectively. One crucial issue in BoW approaches is the size of the vocabulary. BoW descriptors built on a small vocabulary can be both robust and efficient while keeping a high recall rate compared with those built on a large vocabulary. However, the high number of false positives that a small vocabulary produces limits its application. To address this problem, we propose a novel matching algorithm based on salient visual word selection. More specifically, the variation of each visual word across a given video is represented as a trajectory, and the words whose trajectories contain locally asymptotically stable points are selected as salient visual words. We then measure the similarity of two videos through saliency matching based solely on the selected salient visual words, in order to remove false positives. Our experiments show that a small codebook with saliency matching is quite competitive in video copy detection. With the proposed saliency matching incorporated, precision improves by 30% on average compared with the state-of-the-art technique. Moreover, the proposed method is capable of detecting severe transformations, e.g. picture in picture and post production.
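
As a rough illustration of the pipeline outlined in the abstract, the sketch below builds one trajectory per visual word (its count in each frame), keeps the words whose trajectory passes a stability test, and compares two videos using only those words. The abstract does not say how locally asymptotically stable points are detected or how saliency matching scores a pair of videos, so `is_salient` (a nonzero count that stays constant over a short window) and `saliency_similarity` (Jaccard overlap of the salient word sets) are stand-in assumptions, not the authors' method. All function and parameter names are hypothetical.

```python
from typing import List, Set


def word_trajectories(frames: List[List[int]], vocab_size: int) -> List[List[int]]:
    """Return trajectories[w][t] = number of occurrences of visual word w in frame t."""
    trajectories = [[0] * len(frames) for _ in range(vocab_size)]
    for t, words in enumerate(frames):
        for w in words:
            trajectories[w][t] += 1
    return trajectories


def is_salient(trajectory: List[int], window: int = 3, tol: int = 0) -> bool:
    """Assumed stability test (not from the paper): the word is present in
    `window` consecutive frames and its count varies by at most `tol` there."""
    for start in range(len(trajectory) - window + 1):
        segment = trajectory[start:start + window]
        if min(segment) > 0 and max(segment) - min(segment) <= tol:
            return True
    return False


def salient_words(frames: List[List[int]], vocab_size: int,
                  window: int = 3, tol: int = 0) -> Set[int]:
    """Select the visual words whose trajectory passes the assumed stability test."""
    trajectories = word_trajectories(frames, vocab_size)
    return {w for w, traj in enumerate(trajectories) if is_salient(traj, window, tol)}


def saliency_similarity(a: Set[int], b: Set[int]) -> float:
    """Assumed matching score (not from the paper): Jaccard overlap of salient sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Each frame is the list of visual word ids quantized from its local descriptors.
query = [[0, 1], [0, 2], [0, 3]]
reference = [[0, 1], [0, 1], [0, 1, 2]]

query_salient = salient_words(query, vocab_size=4)
reference_salient = salient_words(reference, vocab_size=4)
score = saliency_similarity(query_salient, reference_salient)
```

In this toy input, word 0 appears once in every query frame and is the only salient query word, while words 0 and 1 are both stable in the reference. Restricting the comparison to salient words is what the abstract credits with removing false positives; the window length and tolerance here are arbitrary choices for the example.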



Paper Citation

in Harvard Style

Ren H., Moeslund T., Tang S. and Ramampiaro H. (2013). Small Vocabulary with Saliency Matching for Video Copy Detection. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 768-773. DOI: 10.5220/0004280207680773

in Bibtex Style

author={Huamin Ren and Thomas B. Moeslund and Sheng Tang and Heri Ramampiaro},
title={Small Vocabulary with Saliency Matching for Video Copy Detection},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},

in EndNote Style

JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Small Vocabulary with Saliency Matching for Video Copy Detection
SN - 978-989-8565-47-1
AU - Ren H.
AU - Moeslund T.
AU - Tang S.
AU - Ramampiaro H.
PY - 2013
SP - 768
EP - 773
DO - 10.5220/0004280207680773