Table 7 presents comparative results for the proposed approach and other state-of-the-art approaches on the KTH dataset. It shows that the approach of Wang et al. (2013) achieves an accuracy of 95.7%. While this is higher than the proposed approach, the computational cost of that method prevents it from running in real time. We also compare our approach with that of Reid et al. (2020), who used a reduced sample rate and sample size to achieve real-time performance using body keypoints. The proposed approach performs significantly better, indicating that the use of keypoint changes is a more robust alternative to simply reducing the sample rate and sample size while maintaining real-time performance.
Table 7: Comparison of approaches on the KTH dataset.

Approach                      Accuracy    Speed (FPS)
(Wang et al., 2013)           95.7%       3
(Reid et al., 2020)           90.2%       24
Keypoint Changes (proposed)   94.2%       24
6 CONCLUSION
We have presented a method for human activity recognition based on calculating keypoint changes (Euclidean distance and angle). We have shown that this approach achieves accuracy on par with current state-of-the-art methods while using a sparse representation. Further, we have conducted run-time experiments and shown that the method is sufficiently fast for real-time applications. In future work we will investigate how this approach performs for multi-person activity recognition and adapt it to more complex activities and scenes involving one or more people.
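For concreteness, the following is a minimal sketch of the keypoint-change features described above, not the exact implementation used in this work: it assumes 2D body keypoints per frame (e.g. from a pose estimator such as OpenPose (Cao et al., 2017)) held in NumPy arrays, and computes the Euclidean distance and angle of each keypoint's displacement between consecutive frames.

import numpy as np

def keypoint_changes(prev_kps, curr_kps):
    # prev_kps, curr_kps: (N, 2) arrays of (x, y) body keypoints for the
    # same person in two consecutive frames (names and layout are assumptions).
    deltas = curr_kps - prev_kps                      # per-keypoint displacement
    dists = np.linalg.norm(deltas, axis=1)            # Euclidean distance moved
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])   # direction of motion (radians)
    return np.stack([dists, angles], axis=1)          # (N, 2) per-frame descriptor

# Example: 18 keypoints with small frame-to-frame motion
prev = np.random.rand(18, 2) * 100.0
curr = prev + np.random.randn(18, 2)
print(keypoint_changes(prev, curr).shape)             # (18, 2)

Aggregating such per-frame descriptors over a clip would yield the kind of sparse representation referred to above.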
REFERENCES
Cai, Y., Wang, Z., Yin, B., Yin, R., Du, A., Luo, Z., Li, Z.,
Zhou, X., Yu, G., Zhou, E., Zhang, X., Wei, Y., & Sun,
J. (2019). Res-steps-net for multi-person pose
estimation. Joint COCO and Mapillary Workshop at
ICCV 2019: COCO Keypoint Challenge Track.
Camarena, F., Chang, L., & Gonzalez-Mendoza, M. (2019).
Improving the dense trajectories approach towards
efficient recognition of simple human activities. 2019
7th International Workshop on Biometrics and
Forensics (IWBF), 1–6.
Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017).
Realtime multi-person 2D pose estimation using part
affinity fields. IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR), 7291–7299.
Choutas, V., Weinzaepfel, P., Revaud, J., & Schmid, C. (2018). PoTion: Pose MoTion representation for action recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
D’Sa, A. G., & Prasad, B. G. (2019). An IoT based framework for activity recognition using deep learning technique. arXiv preprint arXiv:1906.07247. http://arxiv.org/abs/1906.07247
Dollar, P., Rabaud, V., Cottrell, G., & Belongie, S. (2005).
Behavior recognition via sparse spatio-temporal
features. 2005 IEEE International Workshop on Visual
Surveillance and Performance Evaluation of Tracking
and Surveillance, 65–72.
Efros, A. A., Berg, A. C., Mori, G., & Malik, J. (2003). Recognizing action at a distance. Proceedings Ninth IEEE International Conference on Computer Vision, 2, 726–733.
Gao, Z., Chen, M. Y., Hauptmann, A. G., & Cai, A. (2010).
Comparing evaluation protocols on the KTH dataset.
International Workshop on Human Behavior
Understanding, 88–100.
Gorelick, L., Blank, M., Shechtman, E., Irani, M., & Basri, R. (2007). Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2247–2253. https://doi.org/10.1109/TPAMI.2007.70711
Guo, K., Ishwar, P., & Konrad, J. (2010). Action
recognition using sparse representation on covariance
manifolds of optical flow. Proceedings - IEEE
International Conference on Advanced Video and
Signal Based Surveillance, AVSS 2010, 188–195.
https://doi.org/10.1109/AVSS.2010.71
Jain, M., Jégou, H., & Bouthemy, P. (2013). Better exploiting motion for better action recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2013.330
Ke, Y., Sukthankar, R., & Hebert, M. (2005). Efficient visual event detection using volumetric features. Tenth IEEE International Conference on Computer Vision (ICCV’05), 166–173.
Laptev, I. (2004). Local spatio-temporal image features for motion interpretation. Doctoral thesis, KTH Royal Institute of Technology, Stockholm.
Lee, D. G., & Lee, S. W. (2019). Prediction of partially
observed human activity based on pre-trained deep
representation. Pattern Recognition, 85, 198–206.
https://doi.org/10.1016/j.patcog.2018.08.006
Lin, L., et al. (2020). The foundation and advances of deep learning. In Human Centric Visual Analysis with Deep Learning (pp. 3–13). Springer.
Matikainen, P., Hebert, M., & Sukthankar, R. (2009).
Trajectons: Action recognition through the motion
analysis of tracked features. 2009 IEEE 12th
International Conference on Computer Vision
Workshops, ICCV Workshops 2009, 514–521.
https://doi.org/10.1109/ICCVW.2009.5457659