fier effectively predicted human activity from three attributes:
the two previous activities and the time of day. We believe that
combining computer vision with data analysis theory benefits
both fields.
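As a rough illustration of the three-attribute prediction summarized above, the sketch below trains a small categorical naive Bayes classifier on tuples of (activity two steps back, previous activity, time of day). The choice of naive Bayes, the class name CategoricalNaiveBayes, the smoothing parameter alpha, and all activity labels and training rows are assumptions made for this example; none of them are taken from the experiments reported here.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Hypothetical helper: naive Bayes over categorical attributes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing (assumed value)
        self.class_counts = Counter()               # label -> number of samples
        self.feature_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
        self.feature_values = defaultdict(set)      # attribute index -> values seen in training

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[(i, label)][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.feature_counts[(i, label)][value]
                n_values = len(self.feature_values[i])
                score += math.log((seen + self.alpha) / (count + self.alpha * n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for illustration only:
# (activity two steps back, previous activity, time of day) -> next activity
rows = [
    ("wake", "wash", "morning"),
    ("wake", "wash", "morning"),
    ("wash", "eat", "morning"),
    ("sit", "read", "evening"),
    ("eat", "read", "evening"),
    ("read", "eat", "noon"),
]
labels = ["eat", "eat", "walk", "sleep", "sleep", "walk"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("wake", "wash", "morning")))
```

Any classifier that accepts categorical inputs could stand in for the naive Bayes model; it is used here only because it is short to write without external libraries.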
In future work, we would like to incorporate posture and object
information into activity recognition. A good understanding of
these elements should improve both activity recognition and
prediction. Moreover, we would like to improve the approach for
fine-grained activity prediction by using a large number of
classification methods for activity recognition.