for different classes we trained class-specific Siamese
neural networks and corresponding multichannel 1D
CNNs. The final decision is made by voting. The presented
algorithm achieves promising results in comparison to
recent algorithms, with a considerable gain in recognition
accuracy on the challenging SYSU 3DHOI dataset. We
demonstrated experimentally that our algorithm
outperforms several recent skeleton-based methods.
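The voting step can be illustrated with a minimal sketch. It assumes plain majority (hard) voting over the labels predicted by the individual classifiers; the exact voting rule, the tie-breaking policy, and the names `majority_vote` and `votes` are illustrative assumptions rather than details taken from this section.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[int]) -> int:
    """Return the most frequently predicted label.

    Assumption: ties are broken by choosing the smallest tied label,
    which keeps the result deterministic.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


# Hypothetical labels predicted by five classifiers for one action sequence.
votes = [2, 0, 2, 1, 2]
final_label = majority_vote(votes)
```

A soft-voting variant would instead average the per-class probabilities of the classifiers and take the argmax; which variant suits a given ensemble depends on how well calibrated the individual outputs are.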
ACKNOWLEDGEMENTS
This work was supported by the Polish National
Science Center (NCN) under research grant
2017/27/B/ST6/01743.
Embedded Features for 1D CNN-based Action Recognition on Depth Maps