ber of cameras. In IEEE Conference on Computer Vi-
sion and Pattern Recognition (CVPR).
Elhayek, A., de Aguiar, E., Jain, A., Tompson, J.,
Pishchulin, L., Andriluka, M., Bregler, C., Schiele,
B., and Theobalt, C. (2015b). Efficient convnet-based
marker-less motion capture in general scenes with a
low number of cameras. In 2015 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 3810–3818.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. In
Advances in neural information processing systems,
pages 2672–2680.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep
residual learning for image recognition. CoRR,
abs/1512.03385.
Hwang, D.-H., Aso, K., Yuan, Y., Kitani, K., and Koike, H.
(2020). Monoeye: Multimodal human motion capture
system using a single ultra-wide fisheye camera. In
Proceedings of the 33rd Annual ACM Symposium on
User Interface Software and Technology, UIST ’20,
pages 98–111.
Ionescu, C., Papava, D., Olaru, V., and Sminchisescu, C.
(2014). Human3.6m: Large scale datasets and pre-
dictive methods for 3d human sensing in natural envi-
ronments. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 36:1325–1339.
Joo, H., Simon, T., Li, X., Liu, H., Tan, L., Gui, L., Baner-
jee, S., Godisart, T., Nabbe, B. C., Matthews, I. A.,
Kanade, T., Nobuhara, S., and Sheikh, Y. (2016).
Panoptic studio: A massively multiview system for
social interaction capture. CoRR, abs/1612.03153.
Kanazawa, A., Black, M. J., Jacobs, D. W., and Malik,
J. (2018). End-to-end recovery of human shape and
pose. In Computer Vision and Pattern Recognition
(CVPR).
Kolotouros, N., Pavlakos, G., Black, M. J., and Dani-
ilidis, K. (2019). Learning to reconstruct 3d human
pose and shape via model-fitting in the loop. CoRR,
abs/1909.12828.
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., and
Black, M. J. (2015). SMPL: A skinned multi-person
linear model. ACM Trans. Graphics (Proc. SIG-
GRAPH Asia), 34(6):248:1–248:16.
Loper, M. M., Mahmood, N., and Black, M. J. (2014).
MoSh: Motion and shape capture from sparse mark-
ers. ACM Transactions on Graphics (Proc. SIG-
GRAPH Asia), 33(6):220:1–220:13.
Mehta, D., Rhodin, H., Casas, D., Fua, P., Sotnychenko, O.,
Xu, W., and Theobalt, C. (2017a). Monocular 3d hu-
man pose estimation in the wild using improved cnn
supervision. In 3D Vision (3DV), 2017 Fifth Interna-
tional Conference on. IEEE.
Mehta, D., Sridhar, S., Sotnychenko, O., Rhodin, H.,
Shafiei, M., Seidel, H., Xu, W., Casas, D., and
Theobalt, C. (2017b). Vnect: Real-time 3d human
pose estimation with a single RGB camera. CoRR,
abs/1705.01583.
Mixamo (2022). Get animated. https://www.mixamo.com/.
Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Os-
man, A. A. A., Tzionas, D., and Black, M. J. (2019).
Expressive body capture: 3D hands, face, and body
from a single image. In Proceedings IEEE Conf. on
Computer Vision and Pattern Recognition (CVPR),
pages 10975–10985.
Pavlakos, G., Zhou, X., and Daniilidis, K. (2018). Ordi-
nal depth supervision for 3d human pose estimation.
CoRR, abs/1805.04095.
Pavlakos, G., Zhou, X., Derpanis, K. G., and Daniilidis,
K. (2016). Coarse-to-fine volumetric prediction for
single-image 3d human pose. CoRR, abs/1611.07828.
Pavlakos, G., Zhou, X., Derpanis, K. G., and Daniilidis, K.
(2017). Harvesting multiple views for marker-less 3d
human pose annotations. CoRR, abs/1704.04793.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image seg-
mentation. In International Conference on Medical
image computing and computer-assisted intervention,
pages 234–241. Springer.
Shiratori, T., Park, H. S., Sigal, L., Sheikh, Y., and Hodgins,
J. K. (2011). Motion capture from body-mounted
cameras. ACM Trans. Graph., 30(4).
Tekin, B., Katircioglu, I., Salzmann, M., Lepetit, V.,
and Fua, P. (2016). Structured prediction of 3d
human pose with deep neural networks. CoRR,
abs/1605.05180.
von Marcard, T., Rosenhahn, B., Black, M., and Pons-
Moll, G. (2017). Sparse inertial poser: Automatic
3d human pose estimation from sparse imus. Com-
puter Graphics Forum 36(2), Proceedings of the 38th
Annual Conference of the European Association for
Computer Graphics (Eurographics).
Xu, W., Chatterjee, A., Zollhoefer, M., Rhodin, H., Fua,
P., Seidel, H.-P., and Theobalt, C. (2019). Mo²Cap²:
Real-time mobile 3d motion capture with a cap-
mounted fisheye camera. IEEE Transactions on Vi-
sualization and Computer Graphics, pages 1–1.
Zhou, X., Huang, Q., Sun, X., Xue, X., and Wei, Y. (2017).
Weakly-supervised transfer for 3d human pose esti-
mation in the wild. CoRR, abs/1704.02447.
3D Human Body Reconstruction from Head-Mounted Omnidirectional Camera and Light Sources