Video Object Recognition and Modeling by SIFT Matching Optimization

Alessandro Bruno, Luca Greco, Marco La Cascia


In this paper we present a novel technique for object modeling and object recognition in video. Given a set of videos containing 360-degree views of objects, we compute a model for each object; we then analyze short videos to determine whether the object depicted in the video is one of the modeled objects. Each object model is built from a video spanning a 360-degree view of the object taken against a uniform background. To create the object model, the proposed technique selects a few representative frames from each video and extracts local features from those frames. Object recognition is performed by selecting a few frames from the query video, extracting local features from each frame, and looking for matches in all the representative frames that make up the models of all the objects. If the number of matches exceeds a fixed threshold, the corresponding object is considered the recognized object. To evaluate our approach we acquired a dataset of 25 videos representing 25 different objects and used these videos to build the object models. We then took 25 test videos, each containing only one of the known objects, and 5 videos containing only unknown objects. Experiments showed that, despite a significant compression of the model, recognition results are satisfactory.
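The abstract leaves the matching details open, so the following is only a minimal sketch of the recognition step under stated assumptions. It assumes local features have already been extracted from each frame as SIFT-like descriptor vectors, that a "match" is a query descriptor passing Lowe's nearest-neighbour ratio test (the ratio 0.8 is an assumed value), and that an object's score is its best match count over all (query frame, representative frame) pairs. The names `count_matches` and `recognize` and the threshold values are hypothetical, not taken from the paper. Returning `None` models a query video that shows an unknown object.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical types: a descriptor is a feature vector (e.g. a 128-D SIFT
# descriptor) and a frame is the list of descriptors extracted from it.
Descriptor = Sequence[float]
Frame = List[Descriptor]


def count_matches(query: Frame, model: Frame, ratio: float = 0.8) -> int:
    """Count query descriptors whose nearest model descriptor passes the ratio test."""
    if len(model) < 2:
        return 0  # the ratio test needs a nearest and a second-nearest neighbour
    matches = 0
    for q in query:
        # Brute-force Euclidean distances to every model descriptor, closest first.
        dists = sorted(math.dist(q, m) for m in model)
        if dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def recognize(query_frames: List[Frame],
              models: Dict[str, List[Frame]],
              threshold: int) -> Optional[str]:
    """Return the best-scoring object, or None if its score does not exceed threshold."""
    best_name: Optional[str] = None
    best_score = -1
    for name, model_frames in models.items():
        # Assumed score: best match count over all (query frame, model frame) pairs.
        score = max((count_matches(q, m) for q in query_frames for m in model_frames),
                    default=0)
        if score > best_score:
            best_name, best_score = name, score
    # At or below the threshold the query is treated as an unknown object.
    return best_name if best_score > threshold else None


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SIFT descriptors are 128-dimensional.
    models = {
        "mug": [[(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]],
        "book": [[(50.0, 50.0), (60.0, 50.0), (50.0, 60.0)]],
    }
    query = [[(0.1, 0.0), (10.0, 0.2), (0.0, 9.9)]]
    print(recognize(query, models, threshold=2))  # prints mug
    print(recognize(query, models, threshold=3))  # prints None
```

Taking the maximum over frame pairs, rather than summing all matches, is one possible design choice: a sum would tend to favour models that keep more representative frames, which works against the goal of a compact model.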



Paper Citation

in Harvard Style

Bruno A., Greco L. and La Cascia M. (2014). Video Object Recognition and Modeling by SIFT Matching Optimization. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 662-670. DOI: 10.5220/0004828006620670
