High Definition Visual Attention based Video Summarization

Yiming Qian, Matthew Kyan


A High Definition visual attention based video summarization algorithm is proposed to extract feature frames and create a video summary. It uses a colour histogram shot detection algorithm to separate the video into shots, then applies a novel high definition visual attention algorithm to construct a saliency map for each frame. A multivariate mutual information algorithm is then applied to select a feature frame to represent each shot. Finally, the feature frames are processed by a self-organizing map to remove redundant frames. The algorithm was assessed against the manual key frame summaries provided with the test datasets from www.open-video.org. Depending on the category and length of the video, 27.8% to 68.1% of the frames selected by the algorithm agreed with the manual summaries.
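The first stage of the pipeline, colour histogram shot detection, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the histogram size, the distance measure, or the thresholding rule, so the 8 bins per channel, the L1 distance, and the fixed `threshold` below are assumptions, and the function names `colour_histogram` and `detect_shots` are hypothetical.

```python
import numpy as np


def colour_histogram(frame, bins=8):
    """Normalised per-channel histogram of an H x W x 3 uint8 frame.

    The bin count is an assumption; the abstract does not specify it.
    """
    hists = [
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()


def detect_shots(frames, threshold=0.5):
    """Split a frame sequence into shots.

    Returns a list of (start, end) index pairs with `end` exclusive.
    A boundary is declared when the L1 distance between the histograms
    of consecutive frames exceeds `threshold` (an assumed, fixed value).
    """
    if len(frames) == 0:
        return []
    boundaries = [0]
    prev = colour_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = colour_histogram(frames[i])
        # L1 distance between two normalised histograms lies in [0, 2].
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))


# Synthetic example: five black frames followed by five white frames.
dark = [np.zeros((4, 4, 3), dtype=np.uint8)] * 5
bright = [np.full((4, 4, 3), 255, dtype=np.uint8)] * 5
print(detect_shots(dark + bright))  # [(0, 5), (5, 10)]
```

Each returned index pair marks one shot, which the later stages described above (saliency mapping, feature frame selection, and redundancy removal) would then process. A fixed threshold is the simplest choice for a sketch; real footage with gradual transitions would likely need a more adaptive rule.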



Paper Citation

in Harvard Style

Qian Y. and Kyan M. (2014). High Definition Visual Attention based Video Summarization. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-003-1, pages 634-640. DOI: 10.5220/0004742206340640

in Bibtex Style

author={Yiming Qian and Matthew Kyan},
title={High Definition Visual Attention based Video Summarization},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)},

in EndNote Style

JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)
TI - High Definition Visual Attention based Video Summarization
SN - 978-989-758-003-1
AU - Qian Y.
AU - Kyan M.
PY - 2014
SP - 634
EP - 640
DO - 10.5220/0004742206340640