Activity Prediction using a Space-Time CNN and Bayesian Framework
Hirokatsu Kataoka, Yoshimitsu Aoki, Kenji Iwata, Yutaka Satoh
2016
Abstract
We present a technique to address the new challenge of activity prediction in the field of computer vision. In activity prediction, we infer the next human activity from "classified activities" and "activity data analysis". Moreover, the prediction must be performed in real time so that dangerous or anomalous activities can be avoided. The combination of a space-time convolutional neural network (ST-CNN) and improved dense trajectories (iDT) effectively captures human activities in image sequences. After categorizing human activities, we insert activity tags into an activity database in order to sample the distribution of human activities. A naive Bayes classifier allows us to achieve real-time activity prediction because only three elements are needed for parameter estimation. The contributions of this paper are: (i) activity prediction within a Bayesian framework and (ii) ST-CNN and iDT features for activity recognition. Furthermore, human activity prediction in real scenes is achieved with 81.0% accuracy.
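As a rough illustration of the prediction step only, the sketch below implements a naive Bayes next-activity classifier over a database of recognized activity tags. The activity names, the use of the two preceding tags as features, and the class NextActivityNB are illustrative assumptions, not the paper's exact formulation (which estimates its parameters from three elements); the recognition stage (ST-CNN + iDT) is assumed to have already produced the tags.

import math
from collections import Counter, defaultdict

# Hypothetical activity tags produced by the recognition stage.
ACTIVITIES = ["walk", "stop", "cross_road", "ride_bicycle"]

class NextActivityNB:
    """Naive Bayes predictor of the next activity given the two preceding tags."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                # Laplace smoothing constant
        self.prior = Counter()            # counts of each next-activity class c
        self.cond = defaultdict(Counter)  # counts of (feature slot, tag) given c

    def fit(self, tag_sequences):
        # tag_sequences: list of recognized activity-tag lists from the database.
        for seq in tag_sequences:
            for t in range(2, len(seq)):
                c = seq[t]  # activity to be predicted
                self.prior[c] += 1
                for i, f in enumerate((seq[t - 2], seq[t - 1])):
                    self.cond[c][(i, f)] += 1

    def predict(self, prev2, prev1):
        # Return the class maximizing log P(c) + sum_i log P(f_i | c).
        total = sum(self.prior.values())
        best, best_score = None, float("-inf")
        for c in ACTIVITIES:
            score = math.log((self.prior[c] + self.alpha) /
                             (total + self.alpha * len(ACTIVITIES)))
            denom = self.prior[c] + self.alpha * len(ACTIVITIES)
            for i, f in enumerate((prev2, prev1)):
                score += math.log((self.cond[c][(i, f)] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

# Example usage on toy tag sequences:
model = NextActivityNB()
model.fit([["walk", "walk", "stop", "cross_road"],
           ["walk", "stop", "cross_road", "walk"]])
print(model.predict("walk", "stop"))  # most likely next activity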
References
- Aggarwal, J. K. and Ryoo, M. S. (2011). Human activity analysis: A review. ACM Computing Surveys.
- Csurka, G., Dance, C. R., Fan, L., Willamowski, J., and Bray, C. (2004). Visual categorization with bags of keypoints. European Conference on Computer Vision Workshop (ECCVW).
- Farneback, G. (2003). Two-frame motion estimation based on polynomial expansion. Proceedings of the Scandinavian Conference on Image Analysis.
- Jain, M., Gemert, J., and Snoek, C. G. M. (2014). University of Amsterdam at THUMOS Challenge 2014. ECCV THUMOS Challenge Workshop.
- Jain, M., Jegou, H., and Bouthemy, P. (2013). Better exploiting motion for better action recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Kataoka, H., Hashimoto, K., Iwata, K., Satoh, Y., Navab, N., Ilic, S., and Aoki, Y. (2014a). Extended co-occurrence hog with dense trajectories for fine-grained activity recognition. Asian Conference on Computer Vision (ACCV).
- Kataoka, H., Tamura, K., Iwata, K., Satoh, Y., Matsui, Y., and Aoki, Y. (2014b). Extended feature descriptor and vehicle motion model with tracking-by-detection for pedestrian active safety. IEICE Transactions on Information and Systems.
- Kitani, K. M., Ziebart, B. D., Bagnell, J. A., and Hebert, M. (2012). Activity forecasting. European Conference on Computer Vision (ECCV).
- Klaser, A., Marszalek, M., and Schmid, C. (2008). A spatiotemporal descriptor based on 3d-gradients. British Machine Vision Conference (BMVC).
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (NIPS).
- Laptev, I. (2005). On space-time interest points. International Journal of Computer Vision (IJCV).
- Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B. (2008). Learning realistic human actions from movies. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Li, B., Camps, O., and Sznaier, M. (2012). Cross-view activity recognition using hankelets. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Li, K., Hu, J., and Fu, Y. (2014). Prediction of human activity by discovering temporal sequence patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
- Moeslund, T. B., Hilton, A., Kruger, V., and Sigal, L. (2011). Visual analysis of humans: Looking at people. Springer.
- Niebles, J. C., Wang, H., and Fei-Fei, L. (2006). Unsupervised learning of human action categories using spatial-temporal words. British Machine Vision Conference (BMVC).
- Pellegrini, S., Ess, A., Schindler, K., and Gool, L. V. (2009). You'll never walk alone: Modeling social behavior for multi-target tracking. IEEE International Conference on Computer Vision (ICCV).
- Peng, X., Qiao, Y., Peng, Q., and Qi, X. (2013). Exploring motion boundary based sampling and spatial temporal context descriptors for action recognition. British Machine Vision Conference (BMVC).
- Perronnin, F., Sanchez, J., and Mensink, T. (2010). Improving the fisher kernel for large-scale image classification. European Conference on Computer Vision (ECCV).
- Raptis, M., Kokkinos, I., and Soatto, S. (2013). Discovering discriminative action parts from mid-level video representation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Rohrbach, M., Amin, S., Andriluka, M., and Schiele, B. (2012). A database for fine grained activity detection of cooking activities. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Ryoo, M. S. (2011). Human activity prediction: Early recognition of ongoing activities from streaming videos. IEEE International Conference on Computer Vision (ICCV).
- Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- Wang, H., Klaser, A., and Schmid, C. (2013). Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision (IJCV).
- Wang, H., Klaser, A., Schmid, C., and Liu, C. L. (2011). Action recognition by dense trajectories. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Wang, H. and Schmid, C. (2013). Action recognition with improved trajectories. IEEE International Conference on Computer Vision (ICCV).
- Watanabe, T., Ito, S., and Yokoi, K. (2009). Co-occurrence histograms of oriented gradients for pedestrian detection. Pacific-Rim Symposium on Image and Video Technology (PSIVT).
- World Health Organization (WHO) (2001). The International Classification of Functioning, Disability and Health (ICF). World Health Assembly.
- Zinnen, A., Blanke, U., and Schiele, B. (2009). An analysis of sensor-oriented vs. model-based activity recognition. International Symposium on Wearable Computers (ISWC).
Paper Citation
in Harvard Style
Kataoka H., Aoki Y., Iwata K. and Satoh Y. (2016). Activity Prediction using a Space-Time CNN and Bayesian Framework. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 461-469. DOI: 10.5220/0005671704610469
in BibTeX Style
@conference{visapp16,
author={Hirokatsu Kataoka and Yoshimitsu Aoki and Kenji Iwata and Yutaka Satoh},
title={Activity Prediction using a Space-Time CNN and Bayesian Framework},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={461-469},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005671704610469},
isbn={978-989-758-175-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Activity Prediction using a Space-Time CNN and Bayesian Framework
SN - 978-989-758-175-5
AU - Kataoka H.
AU - Aoki Y.
AU - Iwata K.
AU - Satoh Y.
PY - 2016
SP - 461
EP - 469
DO - 10.5220/0005671704610469