We found that the architectures of different 2D
human pose estimation models offer different
advantages for 3D human pose estimation in
complex sports scenarios: multi-stage cascading with
intermediate supervision (MSPN and RSN),
stage-by-stage linking of high-level features in the
deep layers of the network (MSPN), fusion of local and
global features (MSPN and RSN), and a densely
connected structure with branches and diverse
convolutional layers (RSN). Based on these findings,
we conclude that the choice of 2D pose estimation
method and its network architecture significantly
affects the performance of 3D pose estimation in
complex sports scenarios, and that different models
and architectures suit different application scenarios.
These findings suggest strategies for improving
3D pose estimation models and offer insights into
the future development of robust and efficient
3D human pose estimation algorithms for complex
real-world sports scenarios.
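The multi-stage cascading with intermediate supervision noted above for MSPN and RSN can be illustrated with a minimal loss sketch. This is not the MSPN or RSN training code. For illustration only, it assumes that every stage outputs one heatmap per joint as a flattened list of floats, and that each stage is scored with a mean-squared error against its own target. The function names `mse` and `intermediate_supervision_loss` are hypothetical.

```python
# Minimal sketch of intermediate supervision in a multi-stage pose network.
# Assumptions (illustrative, not taken from MSPN/RSN): each stage predicts one
# heatmap per joint, heatmaps are flattened lists of floats, and each stage
# is penalized with a mean-squared error against its own per-stage target.
from typing import List

Heatmap = List[float]


def mse(pred: Heatmap, target: Heatmap) -> float:
    """Mean-squared error between two flattened heatmaps of equal length."""
    if len(pred) != len(target):
        raise ValueError("heatmaps must have the same size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def intermediate_supervision_loss(
    stage_preds: List[List[Heatmap]],
    stage_targets: List[List[Heatmap]],
) -> float:
    """Sum per-stage losses so every stage, not only the last, gets a signal."""
    if len(stage_preds) != len(stage_targets):
        raise ValueError("need one target set per stage")
    total = 0.0
    for preds, targets in zip(stage_preds, stage_targets):
        # Average the heatmap error over the joints of this stage.
        stage_loss = sum(mse(p, t) for p, t in zip(preds, targets)) / len(preds)
        total += stage_loss
    return total


if __name__ == "__main__":
    # Two stages, one joint, two-pixel heatmaps (toy values).
    preds = [[[0.0, 1.0]], [[0.5, 0.5]]]
    targets = [[[0.0, 1.0]], [[0.0, 1.0]]]
    print(intermediate_supervision_loss(preds, targets))  # 0.25
```

The sketch takes a separate target list for each stage, so it could also use targets that differ from stage to stage, for example coarser targets for early stages. Whether and how a given model does this should be checked against its own paper.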
ACKNOWLEDGEMENTS
This work was supported by JST SPRING, Japan
Grant Number JPMJSP2106.
REFERENCES
Liu, W., Bao, Q., Sun, Y., & Mei, T. (2022). Recent
advances of monocular 2D and 3D human pose
estimation: A deep learning perspective. ACM
Computing Surveys, 55(4), 1-41.
Xiao, B., Wu, H., & Wei, Y. (2018). Simple baselines for
human pose estimation and tracking. In Proceedings of
the European conference on computer vision (ECCV)
(pp. 466-481).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN:
Towards real-time object detection with region
proposal networks. Advances in neural information
processing systems, 28.
Sun, K., Xiao, B., Liu, D., & Wang, J. (2019). Deep high-
resolution representation learning for human pose
estimation. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition
(pp. 5693-5703).
Cai, Y., Wang, Z., Luo, Z., Yin, B., Du, A., Wang, H., ... &
Sun, J. (2020). Learning delicate local representations
for multi-person pose estimation. In Computer Vision–
ECCV 2020: 16th European Conference, Glasgow, UK,
August 23–28, 2020, Proceedings, Part III (pp. 455-
472). Springer International Publishing.
Li, W., Wang, Z., Yin, B., Peng, Q., Du, Y., Xiao, T., ... &
Sun, J. (2019). Rethinking on multi-stage networks for
human pose estimation. arXiv preprint
arXiv:1901.00148.
Bryan, M. A., Rowhani-Rahbar, A., Comstock, R. D., &
Rivara, F. (2016). Sports- and recreation-related
concussions in US youth. Pediatrics, 138(1).
Giza, C. C., & Hovda, D. A. (2001). The neurometabolic
cascade of concussion. Journal of athletic training,
36(3), 228.
Courtney, A., & Courtney, M. (2015). The complexity of
biomechanics causing primary blast-induced traumatic
brain injury: a review of potential mechanisms.
Frontiers in neurology, 6, 221.
McKee, A. C., Stein, T. D., Nowinski, C. J., Stern, R. A.,
Daneshvar, D. H., Alvarez, V. E., ... & Cantu, R. C.
(2013). The spectrum of disease in chronic traumatic
encephalopathy. Brain, 136(1), 43-64.
Ji, S., Zhao, W., Ford, J. C., Beckwith, J. G., Bolander, R.
P., Greenwald, R. M., ... & McAllister, T. W. (2015).
Group-wise evaluation and comparison of white matter
fiber strain and maximum principal strain in sports-
related concussion. Journal of neurotrauma, 32(7),
441-454.
Camarillo, D. B., Shull, P. B., Mattson, J., Shultz, R., &
Garza, D. (2013). An instrumented mouthguard for
measuring linear and angular head impact kinematics in
American football. Annals of biomedical engineering,
41, 1939-1949.
Madhukar, A., & Ostoja-Starzewski, M. (2019). Finite
element methods in human head impact simulations: a
review. Annals of biomedical engineering, 47(9),
1832-1854.
Cortes, N., Lincoln, A. E., Myer, G. D., Hepburn, L.,
Higgins, M., Putukian, M., & Caswell, S. V. (2017).
Video analysis verification of head impact events
measured by wearable sensors. The American journal
of sports medicine, 45(10), 2379-2387.
Wu, L. C., Nangia, V., Bui, K., Hammoor, B., Kurt, M.,
Hernandez, F., ... & Camarillo, D. B. (2016). In vivo
evaluation of wearable head impact sensors. Annals of
biomedical engineering, 44, 1234-1245.
King, D., Hume, P. A., Brughelli, M., & Gissane, C. (2015).
Instrumented mouthguard acceleration analyses for
head impacts in amateur rugby union players over a
season of matches. The American journal of sports
medicine, 43(3), 614-624.
Li, W., Liu, H., Ding, R., Liu, M., Wang, P., & Yang, W.
(2022). Exploiting temporal contexts with strided
transformer for 3D human pose estimation. IEEE
Transactions on Multimedia, 25, 1282-1293.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016).
You only look once: Unified, real-time object detection.
In Proceedings of the IEEE conference on computer
vision and pattern recognition (pp. 779-788).