
reality and robotics. An open-source dataset of various emotional expressions was developed, and a processing pipeline was implemented to analyze skeletal and 3D body representations. A generative model based on Variational Autoencoders (VAEs), in particular VPoser, was used to generate new 3D poses that retain emotional nuances. Future work includes integrating these poses into NPC animation pipelines, extending the dataset for better visualization, and evaluating the impact on user experience in real-world applications.
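To make the pose-generation step concrete, the following PyTorch sketch illustrates a VPoser-style workflow consistent with the summary above: a reference emotional pose is embedded into a low-dimensional latent space, perturbed with small Gaussian noise, and decoded into new body poses that stay close to the original expression. This is a minimal illustration, not the paper's actual pipeline; the TinyPosePrior module, its 32-dimensional latent size, and the 21-joint axis-angle pose layout are hypothetical stand-ins for the pretrained VPoser model.

import torch
import torch.nn as nn

NUM_JOINTS = 21   # body joints as axis-angle rotations (assumption)
LATENT_DIM = 32   # latent size, matching VPoser's default (assumption)

class TinyPosePrior(nn.Module):
    """Untrained stand-in for a VPoser-style pose VAE (illustration only)."""
    def __init__(self):
        super().__init__()
        d = NUM_JOINTS * 3  # flattened axis-angle pose vector
        self.enc = nn.Sequential(nn.Linear(d, 256), nn.LeakyReLU(),
                                 nn.Linear(256, 2 * LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.LeakyReLU(),
                                 nn.Linear(256, d))

    def encode(self, pose_body):
        # Predict mean and log-variance of the approximate posterior.
        mu, logvar = self.enc(pose_body.flatten(1)).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, torch.exp(0.5 * logvar))

    def decode(self, z):
        # Map latent codes back to per-joint axis-angle rotations.
        return self.dec(z).view(-1, NUM_JOINTS, 3)

prior = TinyPosePrior().eval()

# A reference pose carrying an emotional expression (zeros here; in practice,
# an SMPL-X body pose produced by the fitting pipeline).
reference_pose = torch.zeros(1, NUM_JOINTS, 3)

with torch.no_grad():
    # 1) Embed the emotional reference pose in the latent space.
    z_ref = prior.encode(reference_pose).mean
    # 2) Perturb the latent code slightly to obtain nearby poses that
    #    should preserve the overall emotional character.
    z_new = z_ref + 0.2 * torch.randn(8, LATENT_DIM)
    variations = prior.decode(z_new)                      # (8, 21, 3)
    # 3) Unconditional samples from the prior, for comparison.
    random_poses = prior.decode(torch.randn(8, LATENT_DIM))

print(variations.shape, random_poses.shape)

In practice, TinyPosePrior would presumably be replaced by the pretrained VPoser decoder, and the decoded axis-angle rotations would be fed to the SMPL-X body model to produce posed 3D meshes.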
ACKNOWLEDGEMENTS
This research is partially supported by the project "Romanian Hub for Artificial Intelligence - HRIA", Smart Growth, Digitization and Financial Instruments Program, 2021-2027, MySMIS no. 334906, and a grant of the Ministry of Research, Innovation and Digitization, CNCS/CCCDI-UEFISCDI, project no. PN-IV-P8-8.1-PRE-HE-ORG-2023-0081, within PNCDI IV.