Video Shot Boundary Detection using Visual Bag-of-Words

Jukka Lankinen, Joni-Kristian Kämäräinen



Techniques used in image analysis and video processing have recently converged. Many computation- and memory-intensive image analysis methods have become feasible for per-frame processing of videos, owing to the increased computing power of desktop computers and to efficient implementations on multiple cores and graphics processing units (GPUs). As our main contribution in this work, we address shot boundary detection with a popular image analysis (object detection) approach: visual bag-of-words (BoW). The colour histogram has been the baseline approach for shot boundary detection and is at the core of many top methods, but our BoW method, which is of similar complexity in terms of parameters, clearly outperforms colour histograms. Interestingly, an “AND-combination” of colour and BoW histogram detection is clearly superior, indicating that colour and local features provide complementary information for video analysis.
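The idea of histogram-based detection and its “AND-combination” can be illustrated with a minimal sketch. It is not the authors' implementation: the L1 distance, the fixed threshold, the requirement that both detectors fire on exactly the same frame, and all function names are assumptions made for illustration. The per-frame colour and BoW histograms are assumed to be precomputed (for BoW, by quantising local descriptors against a visual codebook).

```python
from typing import List, Sequence


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histograms, each normalised to sum to 1."""
    s1 = sum(h1) or 1.0  # guard against an all-zero histogram
    s2 = sum(h2) or 1.0
    return sum(abs(a / s1 - b / s2) for a, b in zip(h1, h2))


def detect_boundaries(hists: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return frame indices i where frames i-1 and i differ by more than threshold."""
    return [
        i
        for i in range(1, len(hists))
        if l1_distance(hists[i - 1], hists[i]) > threshold
    ]


def and_combine(colour_hits: Sequence[int], bow_hits: Sequence[int]) -> List[int]:
    """Keep only the boundaries reported by both detectors (assumed AND rule)."""
    return sorted(set(colour_hits) & set(bow_hits))


# Toy data: six frames, three histogram bins each (hypothetical values).
colour_hists = [[8, 1, 1], [8, 1, 1], [1, 8, 1], [1, 1, 8], [1, 1, 8], [1, 1, 8]]
bow_hists = [[5, 5, 0], [5, 5, 0], [5, 5, 0], [0, 5, 5], [0, 5, 5], [5, 0, 5]]

colour_hits = detect_boundaries(colour_hists, threshold=0.5)  # [2, 3]
bow_hits = detect_boundaries(bow_hists, threshold=0.5)        # [3, 5]
print(and_combine(colour_hits, bow_hits))                     # [3]
```

In this toy example each detector alone reports one spurious boundary (frame 2 for colour, frame 5 for BoW), and only frame 3 survives the combination, which is the kind of complementary behaviour the abstract describes.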



Paper Citation

in Harvard Style

Lankinen J. and Kämäräinen J. (2013). Video Shot Boundary Detection using Visual Bag-of-Words. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013), ISBN 978-989-8565-47-1, pages 788-791. DOI: 10.5220/0004290707880791

in Bibtex Style

author={Jukka Lankinen and Joni-Kristian Kämäräinen},
title={Video Shot Boundary Detection using Visual Bag-of-Words},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},

in EndNote Style

JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Video Shot Boundary Detection using Visual Bag-of-Words
SN - 978-989-8565-47-1
AU - Lankinen J.
AU - Kämäräinen J.
PY - 2013
SP - 788
EP - 791
DO - 10.5220/0004290707880791