Human Activity Recognition and Prediction

David Jardim, Luis Nunes, Miguel Sales Dias


Human activity recognition (HAR) has become one of the most active research topics in image processing and pattern recognition. Detecting specific activities in a live feed or searching video archives still relies almost completely on human resources. Detecting multiple activities in real-time video feeds is currently performed by assigning multiple analysts to watch the same video stream simultaneously. Manual analysis of video is labour-intensive, fatiguing, and error-prone. Solving the problem of recognizing human activities from video can lead to improvements in several application fields, such as surveillance systems, human-computer interfaces, sports video analysis, digital shopping assistants, video retrieval, gaming, and health care. This area has grown dramatically in the past 10 years, and throughout our research we identified a potentially underexplored sub-area: action prediction. What if we could infer the future actions of people from visual input? We propose to expand current vision-based activity analysis to a level where it is possible to predict the future actions executed by a subject. We are interested in interactions that can involve a single actor, two humans, and/or simple objects. For example, we would try to predict whether “a person will cross the street” or “a person will try to steal a handbag from another”, or where a tennis player will target the next volley. Using a hierarchical approach, we intend to represent high-level human activities that are composed of simpler activities, usually called sub-events, which may themselves be decomposable. We expect to develop a system capable of predicting the next action in a sequence, initially using offline learning and then, with self-improvement and task specialization in mind, using online learning.
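The abstract does not say which model will be used for prediction. As a minimal sketch of the offline-then-online idea only, the Python below assumes a simple first-order transition-count model over already-recognized sub-event labels. The class name, method names, and action labels are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter, defaultdict


class NextActionPredictor:
    """Toy first-order transition-count model (an assumption, not the authors' method)."""

    def __init__(self):
        # counts[prev][nxt] = how often action `nxt` directly followed action `prev`
        self.counts = defaultdict(Counter)

    def fit_offline(self, sequences):
        """Offline phase: count transitions in a fixed set of labelled sequences."""
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1

    def update_online(self, prev, nxt):
        """Online phase: fold in a single newly observed transition."""
        self.counts[prev][nxt] += 1

    def predict(self, prev):
        """Return the most frequent successor of `prev`, or None if `prev` is unseen."""
        followers = self.counts.get(prev)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


# Hypothetical sub-event labels, invented for this example.
model = NextActionPredictor()
model.fit_offline([
    ["approach", "reach", "grab", "run"],
    ["approach", "reach", "grab", "run"],
    ["approach", "wave", "leave"],
])
prediction_before = model.predict("approach")

# New observations arrive one at a time and shift the estimate.
model.update_online("approach", "wave")
model.update_online("approach", "wave")
prediction_after = model.predict("approach")
```

After the offline pass the most frequent successor of "approach" is "reach"; the two online updates tip the counts so that "wave" becomes the prediction. A hierarchical representation of the kind the abstract describes would need a richer model than this flat one.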



Paper Citation

in Harvard Style

Jardim D., Nunes L. and Dias M. (2015). Human Activity Recognition and Prediction. In Doctoral Consortium - DCPRAM, (ICPRAM 2015), pages 24-32
