
space around an SMPL body mesh, resulting in a surface-aligned representation that can be animated through skeletal joint parameters. These joint parameters serve as additional input to the NeRF, enabling pose-dependent appearance in the rendered output. Moreover, our representation includes the 2D UV coordinates on the mesh texture map and the distance between the query point and the mesh, facilitating efficient learning despite mapping ambiguities and random visual variations.
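As an illustration only (not part of the published implementation), the following Python/PyTorch sketch shows how such surface-aligned features could be assembled for a batch of query points; the function name, tensor shapes, and the nearest-vertex shortcut are our own assumptions.

    import torch

    def surface_aligned_features(query_xyz, mesh_verts, vert_uvs, pose_params):
        # query_xyz:   (N, 3) points sampled along the camera rays
        # mesh_verts:  (V, 3) vertices of the posed SMPL mesh
        # vert_uvs:    (V, 2) per-vertex UV coordinates on the SMPL texture map
        # pose_params: (P,)   skeletal joint parameters (e.g. axis-angle per joint)

        # Nearest-vertex approximation of the closest surface point; projecting onto
        # the nearest triangle with barycentric UV interpolation would be more exact.
        dists = torch.cdist(query_xyz, mesh_verts)         # (N, V) pairwise distances
        dist, idx = dists.min(dim=1)                       # (N,) distance to mesh, (N,) nearest vertex

        uv = vert_uvs[idx]                                 # (N, 2) UV of the nearest surface point
        pose = pose_params.expand(query_xyz.shape[0], -1)  # (N, P) broadcast joint parameters

        # Per-point conditioning vector fed to the NeRF MLP together with the view direction.
        return torch.cat([uv, dist.unsqueeze(-1), pose], dim=-1)

In this sketch, the UV coordinates and the point-to-mesh distance anchor each sample to the body surface, while the broadcast joint parameters supply the pose conditioning described above.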
Through extensive experiments, we have demonstrated the effectiveness of our approach in generating high-quality renderings for novel-view and novel-pose synthesis. The results showcase the capability of our framework to produce visually appealing and controllable 3D human models with realistic, pose-dependent renderings, even from a very limited set of training frames. In the future, we plan to train the model on a more diverse set of poses to enable even more accurate reconstructions.
Although our model produces high-fidelity renderings, several aspects still leave room for improvement. Most notably, as in similar methods, the model struggles to render individual fingers, as visible in Figure 1. We currently use the standard SMPL model, which does not capture hand geometry well; we expect results to improve with hand-aware body models such as SMPL-H, as well as with custom model fitting for the finger parameters.
ACKNOWLEDGEMENTS
This work has partly been funded by the German Federal Ministry of Education and Research (Voluprof, grant no. 16SV8705) and the German Federal Ministry for Economic Affairs and Climate Action (ToHyVe, grant no. 01MT22002A).
REFERENCES
Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., and Sheikh, Y. A. (2019). OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Chen, A., Xu, Z., Geiger, A., Yu, J., and Su, H. (2022). TensoRF: Tensorial radiance fields. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXII, pages 333–350. Springer.
Chen, D., Worchel, M., Feldmann, I., Schreer, O., and Eisert, P. (2021). Accurate human body reconstruction for volumetric video. In Proc. Int. Conf. on 3D Immersion (IC3D).
Du, Y., Zhang, Y., Yu, H.-X., Tenenbaum, J. B., and Wu, J. (2021). Neural radiance flow for 4D view synthesis and video processing. In Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV).
Fang, Q., Shuai, Q., Dong, J., Bao, H., and Zhou, X. (2021). Reconstructing 3D human pose by watching humans in the mirror. In CVPR.
Fechteler, P., Hilsmann, A., and Eisert, P. (2019). Markerless multiview motion capture with 3D shape model adaptation. Computer Graphics Forum, 38(6):91–109.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Hilsmann, A., Fechteler, P., Morgenstern, W., Paier, W., Feldmann, I., Schreer, O., and Eisert, P. (2020). Going beyond free viewpoint: Creating animatable volumetric video of human performances. IET Computer Vision, Special Issue on Computer Vision for the Creative Industries, 14(6):350–358.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Liu, L., Habermann, M., Rudnev, V., Sarkar, K., Gu, J., and Theobalt, C. (2021). Neural Actor: Neural free-view synthesis of human actors with pose control. ACM Trans. Graph. (Proc. ACM SIGGRAPH Asia).
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., and Black, M. (2015). SMPL: A skinned multi-person linear model. ACM Transactions on Graphics, 34(6).
Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., and Ng, R. (2021). NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106.
Müller, T., Evans, A., Schied, C., and Keller, A. (2022). Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 41(4):1–15.
Noguchi, A., Sun, X., Lin, S., and Harada, T. (2021). Neural articulated radiance field. In Proc. International Conference on Computer Vision (ICCV).
Park, K., Sinha, U., Barron, J. T., Bouaziz, S., Goldman, D. B., Seitz, S. M., and Martin-Brualla, R. (2021a). Nerfies: Deformable neural radiance fields. In Proc. International Conference on Computer Vision (ICCV).
Park, K., Sinha, U., Hedman, P., Barron, J. T., Bouaziz, S., Goldman, D. B., Martin-Brualla, R., and Seitz, S. M. (2021b). HyperNeRF: A higher-dimensional representation for topologically varying neural radiance fields. ACM Transactions on Graphics, 40(6).
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32.
Peng, S., Dong, J., Wang, Q., Zhang, S., Shuai, Q., Zhou,
X., and Bao, H. (2021a). Animatable neural radiance