Fang, H., Xie, S., Tai, Y., Lu, C., 2017. RMPE: Regional
Multi-person Pose Estimation. IEEE International
Conference on Computer Vision, 2353-2362.
Feichtenhofer, C., Pinz, A., Zisserman, A., 2016.
Convolutional Two-Stream Network Fusion for Video
Action Recognition. IEEE Conference on Computer
Vision and Pattern Recognition, 1933-1941.
He, K., Gkioxari, G., Dollár, P., Girshick, R. B., 2017.
Mask R-CNN. IEEE International Conference on
Computer Vision, 2980-2988.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep Residual
Learning for Image Recognition. IEEE Conference on
Computer Vision and Pattern Recognition, 770-778.
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu,
K., 2015. Spatial Transformer Networks. Conference
on Neural Information Processing Systems, 2017-2025.
Redmon, J., Farhadi, A., 2018. YOLOv3: An Incremental
Improvement. arXiv preprint arXiv:1804.02767.
Kim, T. S., Reiter, A., 2017. Interpretable 3D Human Action
Analysis with Temporal Convolutional Networks. IEEE
Conference on Computer Vision and Pattern Recognition
Workshops, 1623-1631.
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.,
2011. HMDB: A Large Video Database for Human
Motion Recognition. IEEE International Conference
on Computer Vision, 2556-2563.
Li, L., Zheng, W., Zhang, Z., Huang, Y., Wang, L., 2018.
Skeleton-Based Relational Modeling for Action
Recognition. arXiv preprint arXiv:1805.02556.
Liu, J., Shahroudy, A., Xu, D., Wang, G., 2016. Spatio-
Temporal LSTM with Trust Gates for 3D Human
Action Recognition. European Conference on
Computer Vision, 816-833.
Paszke, A., Gross, S., Massa, F., et al., 2019. PyTorch: An
Imperative Style, High-Performance Deep Learning
Library. Advances in Neural Information Processing
Systems, 8024-8035.
Qiu, Z., Yao, T., Mei, T., 2017. Learning Spatio-Temporal
Representation with Pseudo-3D Residual Networks.
IEEE International Conference on Computer Vision,
5534-5542.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein,
M. S., Berg, A. C., Fei-Fei, L., 2015. ImageNet Large Scale
Visual Recognition Challenge. International Journal of
Computer Vision, 115(3), 211-252.
Shahroudy, A., Liu, J., Ng, T., Wang, G., 2016. NTU
RGB+D: A Large Scale Dataset for 3D Human Activity
Analysis. IEEE Conference on Computer Vision and
Pattern Recognition, 1010-1019.
Shi, L., Zhang, Y., Cheng, J., Lu, H., 2018. Non-Local
Graph Convolutional Networks for Skeleton-Based
Action Recognition. arXiv preprint arXiv:1805.07694.
Shi, L., Zhang, Y., Cheng, J., Lu, H., 2019. Skeleton-Based
Action Recognition with Directed Graph Neural
Networks. IEEE Conference on Computer Vision and
Pattern Recognition, 7912-7921.
Simonyan, K., Zisserman, A., 2014. Two-Stream
Convolutional Networks for Action Recognition in
Videos. Conference on Neural Information
Processing Systems, 568-576.
Song, S., Lan, C., Xing, J., Zeng, W., Liu, J., 2017. An End-
to-End Spatio-Temporal Attention Model for Human
Action Recognition from Skeleton Data. AAAI
Conference on Artificial Intelligence, 4263-4270.
Soomro, K., Zamir, A. R., Shah, M., 2012. UCF101: A Dataset
of 101 Human Actions Classes from Videos in The Wild.
arXiv preprint arXiv:1212.0402.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E.,
Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich,
A., 2015. Going Deeper with Convolutions. IEEE
Conference on Computer Vision and Pattern
Recognition, 1-9.
Tang, Y., Tian, Y., Lu, J., Li, P., Zhou, J., 2018. Deep
Progressive Reinforcement Learning for Skeleton-
Based Action Recognition. IEEE Conference on
Computer Vision and Pattern Recognition, 5323-5332.
Tran, D., Bourdev, L. D., Fergus, R., Torresani, L., Paluri,
M., 2015. Learning Spatiotemporal Features with 3D
Convolutional Networks. IEEE International
Conference on Computer Vision, 4489-4497.
Tran, D., Wang, H., Torresani, L., Feiszli, M., 2019. Video
Classification with Channel-Separated Convolutional
Networks. arXiv preprint arXiv:1904.02811.
Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y.,
Paluri, M., 2018. A Closer Look at Spatiotemporal
Convolutions for Action Recognition. IEEE
Conference on Computer Vision and Pattern
Recognition, 6450-6459.
Wang, H., Schmid, C., 2013. Action Recognition with
Improved Trajectories. IEEE International Conference
on Computer Vision, 3551-3558.
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X.,
Van Gool, L., 2016. Temporal Segment Networks:
Towards Good Practices for Deep Action Recognition.
European Conference on Computer Vision, 20-36.
Xie, S., Girshick, R. B., Dollár, P., Tu, Z., He, K., 2017.
Aggregated Residual Transformations for Deep Neural
Networks. IEEE Conference on Computer Vision and
Pattern Recognition, 5987-5995.
Xie, S., Sun, C., Huang, J., Tu, Z., Murphy, K., 2018.
Rethinking Spatiotemporal Feature Learning for Video
Understanding. European Conference on Computer
Vision, 318-335.
Yan, S., Xiong, Y., Lin, D., 2018. Spatial Temporal Graph
Convolutional Networks for Skeleton-Based Action
Recognition. AAAI Conference on Artificial
Intelligence, 7444-7452.