
rics. We have also demonstrated that including the proposed AdaIN layers improves the performance of other pose-centric methods, such as Action2Motion (Guo et al., 2020) and the approach of Wen et al. (2021), regardless of the underlying architecture. A key strength of our solution is therefore the versatility of the AdaIN layers and their potential for inclusion in other pose-centric motion synthesis architectures.
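As a concrete illustration of this versatility, the sketch below shows one way a conditional AdaIN layer for pose features can be realized in PyTorch, pairing torch.nn.InstanceNorm1d with affine parameters predicted from a conditioning vector; the class and argument names (AdaIN1d, cond_dim) are illustrative assumptions, not our exact implementation.

import torch
import torch.nn as nn

class AdaIN1d(nn.Module):
    # Minimal conditional AdaIN sketch: normalize per-channel pose
    # features, then rescale with a scale/shift predicted from a
    # conditioning vector (e.g. an action or style embedding).
    def __init__(self, num_features: int, cond_dim: int):
        super().__init__()
        # affine=False: the affine parameters come from the condition
        self.norm = nn.InstanceNorm1d(num_features, affine=False)
        self.affine = nn.Linear(cond_dim, 2 * num_features)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames) pose features; cond: (batch, cond_dim)
        gamma, beta = self.affine(cond).chunk(2, dim=1)
        return self.norm(x) * (1 + gamma.unsqueeze(-1)) + beta.unsqueeze(-1)

# Example: 8 sequences, 64 feature channels, 30 frames, 16-dim condition.
layer = AdaIN1d(num_features=64, cond_dim=16)
out = layer(torch.randn(8, 64, 30), torch.randn(8, 16))  # (8, 64, 30)

Because such a layer only assumes channel-first sequence features and a per-sequence conditioning vector, it can be dropped into most pose-centric generators without further architectural changes.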
In future work, we plan to further investigate the potential of AdaIN in pose-centric motion synthesis, focusing in particular on guided methods that allow virtual characters to interact with other elements in computer games.
REFERENCES
Cai, H., Bai, C., Tai, Y.-W., and Tang, C.-K. (2017). Deep
video generation, prediction and completion of human
action sequences. ECCV.
Dabral, R., Mughal, M. H., Golyanik, V., and Theobalt, C. (2023). MoFusion: A framework for denoising-diffusion-based motion synthesis. CVPR.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial networks. NeurIPS.
Guo, C., Zuo, X., Wang, S., Zou, S., Sun, Q., Deng, A., Gong, M., and Cheng, L. (2020). Action2Motion: Conditioned generation of 3D human motions. Proceedings of the 28th ACM International Conference on Multimedia.
Hassan, M., Ceylan, D., Villegas, R., Saito, J., Yang, J.,
Zhou, Y., and Black, M. (2021). Stochastic scene-
aware motion prediction. ICCV.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS.
Huang, X. and Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. ICCV.
Kania, K., Kowalski, M., and Trzciński, T. (2021). TrajeVAE: Controllable human motion generation from trajectories. arXiv preprint arXiv:2104.00351.
Li, P., Aberman, K., Zhang, Z., Hanocka, R., and Sorkine-Hornung, O. (2022). GANimator: Neural motion synthesis from a single sequence. ACM Transactions on Graphics.
Luo, Y., Soeseno, J. H., Chen, T. P., and Chen, W. (2020). CARL: Controllable agent with reinforcement learning for quadruped locomotion. CoRR, abs/2005.03288.
Mao, Y., Liu, X., Zhou, W., Lu, Z., and Li, H. (2024).
Learning generalizable human motion generator with
reinforcement learning.
Mourot, L., Hoyet, L., Le Clerc, F., Schnitzler, F., and Hel-
lier, P. (2021). A survey on deep learning for skeleton-
based human animation. Computer Graphics Forum,
41(1):122–157.
Petrovich, M., Black, M., and Varol, G. (2021). Action-conditioned 3D human motion synthesis with transformer VAE. ICCV.
PyTorch (2024). torch.nn.InstanceNorm1d. Accessed: 2024-06-27.
Raab, S., Leibovitch, I., Li, P., Popa, T., Aberman, K., and Sorkine-Hornung, O. (2024). MoDi: Unconditional motion synthesis from diverse data. CVPR.
Razghandi, M., Zhou, H., Erol-Kantarci, M., and Turgut, D.
(2022). Variational autoencoder generative adversarial
network for synthetic data generation in smart home.
Ruder, M., Dosovitskiy, A., and Brox, T. (2016). Artistic
Style Transfer for Videos, page 26–36. Springer Inter-
national Publishing.
Shlizerman, E., Dery, L. M., Schoen, H., and Kemelmacher-Shlizerman, I. (2017). Audio to body dynamics. CVPR.
Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., and Bermano, A. H. (2022). Human motion diffusion model. ICLR.
Tulyakov, S., Liu, M.-Y., Yang, X., and Kautz, J. (2017). MoCoGAN: Decomposing motion and content for video generation. CVPR.
Ulyanov, D., Vedaldi, A., and Lempitsky, V. (2017). Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. CVPR.
Carnegie Mellon University (2000). CMU Graphics Lab Motion Capture Database. https://mocap.cs.cmu.edu/. Accessed: 2024-09-15.
Wang, J., Wen, C., Fu, Y., Lin, H., Zou, T., Xue, X., and
Zhang, Y. (2020). Neural pose transfer by spatially
adaptive instance normalization. CVPR.
Wen, Y.-H., Yang, Z., Fu, H., Gao, L., Sun, Y., and Liu, Y.-J. (2021). Autoregressive stylized motion synthesis with generative flow. CVPR.
Xu, L., Song, Z., Wang, D., Su, J., Fang, Z., Ding, C., Gan, W., Yan, Y., Jin, X., Yang, X., Zeng, W., and Wu, W. (2022). ActFormer: A GAN-based transformer towards general action-conditioned 3D human motion generation. ICCV.
Zhang, M., Guo, X., Pan, L., Cai, Z., Hong, F., Li, H., Yang, L., and Liu, Z. (2023). ReMoDiffuse: Retrieval-augmented motion diffusion model. ICCV.