Learning to Predict Video Saliency using Temporal Superpixels

Anurag Singh, Chee-Hung Henry Chu, Michael A. Pratt

Abstract

The visual saliency of a video sequence can be computed by combining spatial and temporal features that attract a user’s attention to a group of pixels. We present a method that computes video saliency by integrating four such features: color dissimilarity, objectness measure, motion difference, and boundary score. We use temporal clusters of pixels, or temporal superpixels, to simulate the attention associated with a group of moving pixels in a video sequence. The features are combined using weights learned online by a linear support vector machine. The temporal linkage between superpixels is then used to find the saliency flow across the image frames. Experiments demonstrate the efficacy of the proposed method and show that it performs better than state-of-the-art methods.
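
The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each temporal superpixel is summarized by a four-element feature vector (in a hypothetical order named FEATURES), uses synthetic data in place of real video features, and substitutes plain stochastic gradient descent on the hinge loss for whatever online solver the paper uses. The class OnlineLinearSVM and the function saliency_scores are illustrative names only.

```python
import numpy as np

# Hypothetical feature order; one row of X describes one temporal superpixel.
FEATURES = ("color_dissimilarity", "objectness", "motion_difference", "boundary_score")


class OnlineLinearSVM:
    """Linear SVM trained one sample at a time by SGD on the hinge loss."""

    def __init__(self, n_features, lr=0.1, reg=1e-3):
        self.w = np.zeros(n_features)  # one learned weight per feature
        self.b = 0.0
        self.lr = lr                   # learning rate (assumed value)
        self.reg = reg                 # L2 regularization strength (assumed value)

    def partial_fit(self, x, y):
        """Update on a single sample; y must be +1 (salient) or -1 (background)."""
        margin = y * (self.w @ x + self.b)
        self.w *= 1.0 - self.lr * self.reg  # shrink weights (L2 penalty)
        if margin < 1.0:                    # sample violates the margin
            self.w += self.lr * y * x
            self.b += self.lr * y

    def decision_function(self, X):
        return X @ self.w + self.b


def saliency_scores(model, X):
    """Weighted feature sum per superpixel, rescaled to the range [0, 1]."""
    raw = model.decision_function(X)
    span = raw.max() - raw.min()
    if span == 0.0:
        return np.zeros_like(raw)
    return (raw - raw.min()) / span


# Synthetic stand-in data: salient superpixels get larger feature values.
rng = np.random.default_rng(0)
n = 200
salient = rng.uniform(0.6, 1.0, size=(n, len(FEATURES)))
background = rng.uniform(0.0, 0.4, size=(n, len(FEATURES)))
X = np.vstack([salient, background])
labels = np.concatenate([np.ones(n), -np.ones(n)])

# Stream the samples through the learner in random order, a few passes.
model = OnlineLinearSVM(n_features=len(FEATURES))
for _ in range(5):
    for i in rng.permutation(len(X)):
        model.partial_fit(X[i], labels[i])

scores = saliency_scores(model, X)
print(dict(zip(FEATURES, np.round(model.w, 3))))
```

In this sketch the learned weight vector plays the role of the feature weights, and the rescaled decision value serves as a per-superpixel saliency score. Propagating those scores along the temporal links between superpixels is a separate step that is not shown here.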



Paper Citation


in Harvard Style

Singh A., Chu C. H. and Pratt M. A. (2015). Learning to Predict Video Saliency using Temporal Superpixels. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-758-077-2, pages 201-209. DOI: 10.5220/0005206402010209


in Bibtex Style

@conference{icpram15,
author={Anurag Singh and Chee-Hung Henry Chu and Michael A. Pratt},
title={Learning to Predict Video Saliency using Temporal Superpixels},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2015},
pages={201-209},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005206402010209},
isbn={978-989-758-077-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - Learning to Predict Video Saliency using Temporal Superpixels
SN - 978-989-758-077-2
AU - Singh, A.
AU - Chu, C. H.
AU - Pratt, M. A.
PY - 2015
SP - 201
EP - 209
DO - 10.5220/0005206402010209