termine the important features of the expert motions
that are then used to evaluate the performance of a
new player.
Several authors have worked on this automatic
extraction of the relevant features of motions. It is
indeed a prerequisite in other domains such as motion
recognition or motion retrieval, in which these features
are used both 1) to group sets of motions into
categories of actions and 2) to differentiate these groups
of actions. For the first case, some authors have pro-
posed to identify common geometrical patterns of the
motions: by partitioning the 3D space with Cartesian
patches (Wang et al., 2012) or angular ones (Xia et al.,
2012), by simplifying the joints trajectories with lin-
ear regressions (Barnachon et al., 2013) or by using
pentagonal areas to represent the postures (Sakurai
et al., 2014). Some authors also worked on the
relation between the position of a joint and a plane
defined by three other joints to give a semantic and
intuitive evaluation of the performed motion (Röder,
2006; Müller et al., 2005; Müller and Röder, 2006).
Finally, several authors tried to define morphology-
independent features to manage the morphology vari-
ability by normalizing the posture representation and
by extension the motion (Sie et al., 2014; Kulpa et al.,
2005; Shin et al., 2001). The goal of these studies
was to identify the similarity of motions, while ours
is to evaluate the difference between a motion and
the reference ones performed by experts. The motions
are thus assumed to be similar, and our objective is
to quantify the small differences between them rather
than to ignore them. For the second case,
some authors have computed the variance (Ofli et al.,
2012) or the entropy (Pazhoumand-Dar et al., 2015)
of each joint to discriminate the most informative
features characterizing the motion. The problem with
such approaches is that they lose some of the temporal
information of the motion.
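
The loss of temporal information can be illustrated with a minimal sketch. It is not taken from the cited works: the function names, the histogram binning and the two toy trajectories are assumptions chosen for illustration only. The sketch computes the variance and a histogram-based entropy of a single joint coordinate and shows that both statistics are unchanged when the same samples are played in reverse order.

```python
import math
from collections import Counter
from statistics import pvariance


def joint_variance(samples):
    """Population variance of one joint coordinate over time."""
    return pvariance(samples)


def joint_entropy(samples, bins=4):
    """Shannon entropy (in bits) of a fixed-width histogram of the samples.

    The number of bins is an arbitrary choice made for this illustration.
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant signals
    counts = Counter(min(int((s - lo) / width), bins - 1) for s in samples)
    n = len(samples)
    # Sorting the counts makes the floating-point sum independent of order.
    return -sum(c / n * math.log2(c / n) for c in sorted(counts.values()))


# Hypothetical 1D trajectories of one joint: same values, opposite timing.
raise_then_hold = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0, 1.0]
hold_then_lower = list(reversed(raise_then_hold))

for name, traj in [("raise_then_hold", raise_then_hold),
                   ("hold_then_lower", hold_then_lower)]:
    print(name, joint_variance(traj), joint_entropy(traj))
```

Both trajectories yield the same variance and the same entropy although the joint moves at different times, which is why such order-independent statistics cannot assess the timing of a motion on their own.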
This temporal information is nevertheless essential
to evaluate motions, especially sports motions. The
timing of a movement is of course important in dance,
but it matters for all kinds of motions, since the
synchronization of the limbs and the sequencing of
body motions are key factors of good technique and
thus of good performance. Temporal information is
therefore essential at the global level but above all
at the joint level, where it captures the relative
timing of the different body parts of the player.
Maes et al. proposed
to evaluate and train the basics of dance steps (Maes
et al., 2012). Since they considered that the dance
steps were very rhythmic, they based their analysis on
the music tempo of the dance. This case is however
very specific and only manages the synchronization
of the motion with an external and global tempo. To
take local synchronization into account, the temporal-
ity must be evaluated even when motions have differ-
ent lengths, different speeds and/or different rhythms.
To this end, some authors proposed to use Hidden
Markov Models (HMM) or Hidden Conditional Random
Fields (HCRF) to encode time series as piecewise
stationary processes (Zhong and Ghosh, 2002; Kahol
et al., 2004; Sorel et al., 2013; Wang et al., 2006). In
our context, the time-varying features are trajectories
and are modeled as a state automaton in which each
state stands for a range of possible observation val-
ues of the feature while the transitions between states
can model time. The feature observation values and
the transitions between states are driven by
probabilities, which makes HMMs very robust to
spatiotemporal variations. However, this approach gathers
similar postures into the same state, and temporality
is only handled between states. A single state can
therefore cover a large part of the motion, for
instance when a joint barely moves for some period.
To generically evaluate the synchrony of two
motions, we need a more accurate method such as
Dynamic Time Warping (DTW) (Sakoe and Chiba, 1978).
Originally created for speech processing, DTW has
become a well-established method to account for
temporal variations in the comparison of related time
series. Many studies have tried to improve the
efficiency of the DTW algorithm in recent years,
depending on the application context (Keogh and Pazzani,
2001; Zhou and De La Torre Frade, 2009; Zhou and
de la Torre, 2015; Heloir et al., 2006; Gong et al.,
2014). In motion retrieval, DTW has been used by
several authors to align a motion with a set of
features in order to determine the movement performed.
Sakurai et al., for instance, evaluated a motion
captured with the Microsoft Kinect by using pentagonal
areas defined by the body end-effectors (Sakurai et al.,
2014). Pham et al. tried to compare surgery motions
by aligning trajectories of 3D sensors (Pham et al.,
2010). The problem with these studies is that the
motion is simplified to handle temporality, so the
joint information is not preserved. In our approach,
we want to take temporal and spatial information into
account concurrently.
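
As a reminder of how DTW aligns two time series of different lengths, the following minimal sketch implements the classic dynamic-programming formulation with a Euclidean local cost. It is only an illustration of the standard algorithm, not the method proposed in this paper: the function name `dtw`, the tuple-based frame representation and the toy one-dimensional trajectories are assumptions made for the example.

```python
import math


def dtw(seq_a, seq_b):
    """Classic DTW between two sequences of frames of equal dimension.

    Each frame is a tuple of floats. Returns (total_cost, path), where
    path is a list of (i, j) index pairs aligning seq_a[i] with seq_b[j].
    """
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] is the minimal cumulative cost of aligning the first i
    # frames of seq_a with the first j frames of seq_b.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean local cost
            acc[i][j] = d + min(acc[i - 1][j],      # seq_b[j-1] repeated
                                acc[i][j - 1],      # seq_a[i-1] repeated
                                acc[i - 1][j - 1])  # one-to-one step
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    path.reverse()
    return acc[n][m], path


# Hypothetical 1D joint trajectories (one coordinate per frame).
expert = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
slow_player = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]
off_player = [(0.0,), (1.0,), (3.0,), (1.0,), (0.0,)]

cost_slow, path_slow = dtw(expert, slow_player)
cost_off, path_off = dtw(expert, off_player)
print(cost_slow, path_slow)
print(cost_off, path_off)
```

In this toy example the slower player reaches the same positions as the expert, so the spatial cost is zero and the timing difference appears only in the warping path, whereas the third trajectory has the same timing but a nonzero spatial cost. The cumulative cost and the path thus carry the spatial and the temporal information respectively.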
In this paper, we propose an efficient, automatic
and morphology-independent method based on DTW to
compare a motion performed by a player to a database
of experts’ motions, in order to evaluate the relevant
spatial and temporal information of the motion
concurrently.