motion capture files in BVH format and classify them according to labels defined by a simplified ontology composed of 9 action tags.
Our approach used a separate data set of carefully edited mocap files to train the network to recognize each action. These data sets were adapted from the freely available CMU Motion Capture Library. After training, the network was tested on a different set of files that had not been used during training. For assessment, each of these files was manually annotated with the expected label.
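To make the training setup concrete, the following is a minimal sketch of such a sequence classifier, assuming a Keras LSTM and joint-rotation features already extracted from the BVH files; the framework choice, layer sizes, sequence length, and feature count are illustrative assumptions, not the exact configuration of the system described here.

    # Minimal sketch (illustrative, not the paper's exact architecture):
    # an LSTM that maps a fixed-length mocap clip to one of 9 action tags.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    NUM_TAGS = 9        # action tags in the simplified ontology
    SEQ_LEN = 120       # frames per clip (assumed; clips padded or cut)
    NUM_FEATURES = 60   # joint-rotation channels per frame (skeleton-dependent)

    model = Sequential([
        LSTM(128, input_shape=(SEQ_LEN, NUM_FEATURES)),
        Dense(NUM_TAGS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # x_train: (num_clips, SEQ_LEN, NUM_FEATURES) mocap feature array
    # y_train: (num_clips,) integer labels in [0, NUM_TAGS)
    # model.fit(x_train, y_train, epochs=50, validation_split=0.1)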
Comparing the results obtained by the classification software against the expected, manually annotated tags, the system achieved, in several of the tests, an accuracy better than 95%, which suggests that the original hypothesis has been satisfied.
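Continuing the sketch above, the assessment step amounts to comparing the predicted tag for each held-out clip with its manually annotated label and reporting the fraction of matches; the variable names below are illustrative.

    # Evaluation sketch: accuracy of predicted tags vs. manual annotations.
    # x_test, y_test are the held-out clips and their annotated labels.
    pred = np.argmax(model.predict(x_test), axis=1)   # predicted action tags
    accuracy = np.mean(pred == y_test)                # fraction of correct tags
    print(f"classification accuracy: {accuracy:.2%}")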
In future work, we expect to improve training by adding other actions to the ontology, for instance affective body postures, and/or by considering other kinds of media that might be of interest in a production pipeline, such as textures and sounds.
The ultimate goal would be to design an extendable, modular content annotator capable of annotating different types of media, based on a general-purpose ontology.
Another possible application that might benefit from this automatic motion capture action recognition technology is the authoring of character animations for retargeting crowd behaviors to different scenarios. In theory, such an AI system could help to understand each character's movements in a given situation, adapt the animations to new target scenarios, and thus facilitate the authoring of crowd simulations.
ACKNOWLEDGMENT
This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number 15/RP/2776 and in part by the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No 780470.
REFERENCES
Brownlee, J. (2018). Long Short-Term Memory Networks
with Python - Develop Sequence Prediction Models
With Deep Learning. Machine Learning Mastery.
[eBook].
Bütepage, J., Black, M. J., Kragic, D., and Kjellström, H. (2017). Deep representation learning for human motion prediction and classification. CoRR, abs/1702.07486. http://arxiv.org/abs/1702.07486.
CMU Graphics Lab (2018). CMU graphics lab motion capture database. http://mocap.cs.cmu.edu/.
Delbridge, M. (2015). Motion Capture in Performance - An Introduction. Palgrave Macmillan UK, first edition.
Du, Y., Wong, Y., Liu, Y., Han, F., Gui, Y., Wang, Z., Kankanhalli, M., and Geng, W. (2016). Marker-less 3D human motion capture with monocular image sequence and height-maps. In European Conference on Computer Vision, pages 20–36. Springer.
Fabbri, M., Lanzi, F., Calderara, S., Palazzi, A., Vezzani, R., and Cucchiara, R. (2018). Learning to detect and track visible and occluded body joints in a virtual world. CoRR, abs/1803.08319. http://arxiv.org/abs/1803.08319.
Gupta, A., Martinez, J., Little, J. J., and Woodham, R. J. (2014). 3D pose from motion for cross-view action recognition via non-linear circulant temporal encoding. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 2601–2608.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780. D.O.I.: 10.1162/neco.1997.9.8.1735, https://doi.org/10.1162/neco.1997.9.8.1735.
Kleinsmith, A., Bianchi-Berthouze, N., and Steed, A.
(2011). Automatic recognition of non-acted affective
postures. Trans. Sys. Man Cyber. Part B, 41(4):1027–
1038. D.O.I.: 10.1109/TSMCB.2010.2103557.
Martinez, J., Black, M. J., and Romero, J. (2017). On human motion prediction using recurrent neural networks. CoRR, abs/1705.02445. http://arxiv.org/abs/1705.02445.
Menache, A. (2011). Understanding Motion Capture for Computer Animation. Morgan Kaufmann, second edition.
SAUCE Project (2018). Smart asset re-use in creative environments - SAUCE. http://www.sauceproject.eu.
Tekin, B., Rozantsev, A., Lepetit, V., and Fua, P. (2016). Direct prediction of 3D body poses from motion compensated sequences. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 991–1000.
Toshev, A. and Szegedy, C. (2014). DeepPose: Human pose estimation via deep neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).