
tation for human reconstruction. Advances in Neural
Information Processing Systems, 35:31130–31144.
Floater, M. S. (2003). Mean value coordinates. Computer
Aided Geometric Design, 20(1):19–27.
Güler, R. A., Neverova, N., and Kokkinos, I. (2018). Dense-
pose: Dense human pose estimation in the wild. In
IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages
7297–7306.
Guo, C., Jiang, T., Chen, X., Song, J., and Hilliges, O.
(2023). Vid2avatar: 3d avatar reconstruction from
videos in the wild via self-supervised scene decom-
position. In IEEE/CVF Conf. Comput. Vis. Pattern
Recog.
Guo, J., Li, J., Narain, R., and Park, H. S. (2021). In-
verse simulation: Reconstructing dynamic geometry
of clothed humans via optimal control. In IEEE/CVF
Conf. Comput. Vis. Pattern Recog.
Hilton, A. and Starck, J. (2004). Multiple view recon-
struction of people. In Proceedings. 2nd International
Symposium on 3D Data Processing, Visualization and
Transmission, 2004. 3DPVT 2004., pages 357–364.
IEEE.
Jafarian, Y. and Park, H. S. (2021). Learning high fidelity
depths of dressed humans by watching social media
dance videos. In IEEE/CVF Conf. Comput. Vis. Pat-
tern Recog., pages 12753–12762.
Jiang, W., Yi, K. M., Samei, G., Tuzel, O., and Ranjan, A.
(2022). Neuman: Neural human radiance field from
a single video. In European Conference on Computer
Vision, pages 402–418. Springer.
Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews,
I., Kanade, T., Nobuhara, S., and Sheikh, Y. (2015).
Panoptic studio: A massively multiview system for
social motion capture. In ICCV, pages 3334–3342.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Lin, S., Yang, L., Saleemi, I., and Sengupta, S. (2021).
Robust high-resolution video matting with temporal
guidance.
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., and
Black, M. J. (2015). SMPL: A skinned multi-person
linear model. ACM Trans. Graphics (Proc. SIG-
GRAPH Asia), 34(6):248:1–248:16.
Ma, Q., Yang, J., Ranjan, A., Pujades, S., Pons-Moll, G.,
Tang, S., and Black, M. J. (2020). Learning to dress
3d people in generative clothing. In IEEE/CVF Conf.
Comput. Vis. Pattern Recog., pages 6469–6478.
Moakher, M. (2002). Means and averaging in the group of
rotations. SIAM J. Matrix Anal., 24(1):1–16.
Narain, R., Samii, A., and O'Brien, J. F. (2012). Adaptive
anisotropic remeshing for cloth simulation. ACM
Transactions on Graphics (TOG), 31(6):1–10.
Newcombe, R. A., Fox, D., and Seitz, S. M. (2015). Dy-
namicfusion: Reconstruction and tracking of non-
rigid scenes in real-time. In CVPR, pages 343–352.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., et al. (2019). PyTorch: An imperative style,
high-performance deep learning library. Advances in
Neural Information Processing Systems, 32.
Ravi, N., Reizenstein, J., Novotny, D., Gordon, T., Lo, W.-
Y., Johnson, J., and Gkioxari, G. (2020). Accelerating
3d deep learning with PyTorch3D. arXiv preprint
arXiv:2007.08501.
Rong, Y., Shiratori, T., and Joo, H. (2021). Frankmocap:
A monocular 3d whole-body pose estimation system
via regression and integration. In IEEE/CVF Conf.
Comput. Vis. Pattern Recog., pages 1749–1759.
Santesteban, I., Thuerey, N., Otaduy, M. A., and Casas, D.
(2021). Self-supervised collision handling via genera-
tive 3d garment models for virtual try-on. In
IEEE/CVF Conf. Comput. Vis. Pattern Recog., vol-
ume 2, page 3.
Teed, Z. and Deng, J. (2020). Raft: Recurrent all-pairs
field transforms for optical flow. In Computer Vision–
ECCV 2020: 16th European Conference, Glasgow,
UK, August 23–28, 2020, Proceedings, Part II 16,
pages 402–419. Springer.
Varol, G., Romero, J., Martin, X., Mahmood, N., Black,
M. J., Laptev, I., and Schmid, C. (2017). Learning
from synthetic humans. In CVPR, pages 109–117.
Wang, H., O’Brien, J. F., and Ramamoorthi, R. (2011).
Data-driven elastic models for cloth: modeling and
measurement. ACM Transactions on Graphics (TOG),
30(4):1–12.
Wang, L., Zhang, J., Liu, X., Zhao, F., Zhang, Y., Zhang, Y.,
Wu, M., Yu, J., and Xu, L. (2022). Fourier plenoctrees
for dynamic radiance field rendering in real-time. In
IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages
13524–13534.
Wang, Z., Simoncelli, E. P., and Bovik, A. C. (2003). Mul-
tiscale structural similarity for image quality assess-
ment. In The Thirty-Seventh Asilomar Conference on
Signals, Systems & Computers, 2003, volume 2, pages
1398–1402. IEEE.
Weng, C.-Y., Curless, B., Srinivasan, P. P., Barron, J. T.,
and Kemelmacher-Shlizerman, I. (2022). Human-
nerf: Free-viewpoint rendering of moving people
from monocular video. In IEEE/CVF Conf. Comput.
Vis. Pattern Recog., pages 16210–16220.
Xiu, Y., Yang, J., Cao, X., Tzionas, D., and Black, M. J.
(2023). ECON: Explicit Clothed humans Optimized
via Normal integration. In IEEE/CVF Conf. Comput.
Vis. Pattern Recog.
Zablotskaia, P., Siarohin, A., Zhao, B., and Sigal, L.
(2019). Dwnet: Dense warp-based network for
pose-guided human video generation. arXiv preprint
arXiv:1910.09139.
Zhao, F., Yang, W., Zhang, J., Lin, P., Zhang, Y., Yu, J., and
Xu, L. (2022). Humannerf: Efficiently generated hu-
man radiance field from sparse inputs. In IEEE/CVF
Conf. Comput. Vis. Pattern Recog., pages 7743–7753.
Learning 3D Human UV with Loose Clothing from Monocular Video