
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications