
equipment, practiced the dance. The system gave them instant feedback. It is important to note that only the student’s gesture was displayed on the screen, not the expert’s. In addition, the pedagogical feedback was displayed through highlighted body parts depending on the executed gesture. This was not the case in Oagaz et al.’s work on table tennis, where the student’s and expert’s gestures were shown simultaneously while the evaluation was running (Oagaz et al., 2022). Learners could observe and imitate the gesture performed by the expert’s 3D avatar, while their own gesture was evaluated by comparing the tilt of different body joints (elbow, wrist, knees, etc.) displayed on a second 3D avatar.
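To fix ideas, this kind of joint-based comparison can be sketched as follows. This is a minimal illustration assuming a shared skeleton format with hypothetical joint names; it is not the evaluation actually used by the cited systems.

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Angle (in degrees) at `joint`, formed by the segments
    joint->parent and joint->child of a skeleton."""
    u = np.asarray(parent, dtype=float) - np.asarray(joint, dtype=float)
    v = np.asarray(child, dtype=float) - np.asarray(joint, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def flag_deviating_joints(learner, expert, triplets, tolerance_deg=15.0):
    """Compare two poses joint by joint. `learner` and `expert` map
    joint names to 3D positions; `triplets` lists (parent, joint, child)
    names, e.g. ('shoulder_r', 'elbow_r', 'wrist_r'). Returns the joints
    whose angular deviation exceeds `tolerance_deg`, e.g. to highlight
    them on the learner's 3D avatar."""
    flagged = {}
    for parent, joint, child in triplets:
        deviation = abs(
            joint_angle(learner[parent], learner[joint], learner[child])
            - joint_angle(expert[parent], expert[joint], expert[child]))
        if deviation > tolerance_deg:
            flagged[joint] = deviation
    return flagged
```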
A captured gesture replayed in a VLE may origi-
nate from a teacher or an expert (Esmaeili et al., 2017;
Nawahdah and Inoue, 2013), a learner (Zhao, 2022)
or both (Liu et al., 2020; Chen et al., 2019; Oagaz
et al., 2022). Depending on whether the expert’s or the learner’s gesture is replayed, the VLE design objectives and main functionalities may vary. Replaying the teacher’s motions often supports an imitation-based learning method. In this case, the 3D avatar can be (a) placed in front of the learner or (b) observed from any viewpoint by navigating in the VLE or moving the expert’s 3D avatar (Esmaeili et al., 2017; Nawahdah and Inoue, 2013; Wu et al., 2020). Replaying the student’s gesture is, in turn, often linked to an automatic evaluation process, with the feedback displayed on the student’s 3D avatar (Zhao, 2022). Finally, displaying both avatars allows combining observation and evaluation (Liu et al., 2020; Oagaz
et al., 2022). Replaying a captured motion in a VLE can also mean that player-type controls (play, pause, speed reduction, etc.) are available to the learner. However, not all works precisely describe whether such interactions are available (Esmaeili et al., 2017). Nevertheless, among the works that do indicate the available functionalities, one can note play and pause options (Oagaz et al., 2022; Rho et al., 2020) and the possibility of replaying the gesture from the beginning (Chen et al., 2019; Rho et al., 2020). More advanced options such as fast-forward, rewind, or speed control are rarer (Liu et al., 2020).
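Such player-type controls amount to a small playback state machine over the captured frames. The sketch below illustrates the generic idea under the assumption of fixed-rate frame recording; none of the cited works publish their actual implementation.

```python
class MotionPlayer:
    """Minimal playback controller over a captured motion, i.e. a list
    of skeleton frames recorded at a fixed rate (a sketch of the generic
    idea, not any cited system's actual implementation)."""

    def __init__(self, frames, fps=30.0):
        self.frames = frames
        self.fps = fps
        self.time = 0.0        # playback position in seconds
        self.speed = 1.0       # 0.5 = slow motion, 2.0 = fast-forward
        self.playing = False

    def play(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def restart(self):
        # "Replay from the beginning" option
        self.time = 0.0

    def set_speed(self, speed):
        # Speed control; a negative value rewinds
        self.speed = speed

    def update(self, dt):
        """Advance playback by `dt` seconds of wall-clock time and
        return the frame to display."""
        if self.playing:
            self.time = max(0.0, self.time + dt * self.speed)
        index = min(int(self.time * self.fps), len(self.frames) - 1)
        return self.frames[index]
```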
With the possibility of displaying a 3D avatar demonstrating the gesture, the question of how the avatar should be observed emerges, and it has been answered at different levels. The first is to give the learner a single static, fixed viewpoint (Chen et al., 2019). A second method allows users to move freely in the VLE or around the expert’s 3D avatar and to observe the replayed gesture from any angle. This lets the student visualize and acquire more information from the 3D avatar than a single fixed point does (Liu et al., 2020). However, students may not know the most appropriate viewpoint, if one exists.
Therefore, in order to guide the learner more effectively, the VLE can provide specific, predefined viewpoints for better observation and understanding of the gesture. Esmaeili et al. (2017) implemented floor squares at locations defined by the expert, from which the learner can observe specific parts of the gesture more effectively.
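Such predefined observation points can be represented very simply, for instance as positions paired with a point of interest on the avatar. The structure below is a hypothetical illustration, as Esmaeili et al. (2017) do not detail their data model.

```python
from dataclasses import dataclass

@dataclass
class Viewpoint:
    """A predefined observation point placed by the expert
    (hypothetical representation)."""
    position: tuple[float, float, float]  # e.g. a floor-square location
    look_at: tuple[float, float, float]   # point of interest on the avatar
    label: str                            # e.g. "right-elbow close-up"

def nearest_viewpoint(viewpoints, learner_position):
    """Suggest the predefined viewpoint closest to the learner."""
    return min(viewpoints,
               key=lambda v: sum((a - b) ** 2
                                 for a, b in zip(v.position, learner_position)))
```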
Defining appropriate viewpoints can be tedious. Depending on the complexity of the gesture, a large number of viewpoints may need to be defined. In addition, the number and location of these points may differ from one gesture to another. Some gestures may require more points than others, with different positions and orientations, particularly in a VLE using a VR headset. Moreover, the definition of an appropriate observation point can differ between experts, who can only place the points empirically using the VLE. Consequently, this raises the question of the automatic generation of viewpoints, especially if one wants to expand the scope of a VLE to include other gestures. Nawahdah and Inoue (2013) proposed a system where the learner was static and the position and orientation of the expert’s 3D avatar around the learner changed based on the gesture being made at each moment, for example, depending on the arm used for the task. Based on a survey and experiments coupled with the expert’s captured gesture, their work achieved an ideal placement of the 3D avatar, according to the hand used by the expert during the demonstration and its position, in order to enhance learning. However, to our knowledge, no past work covers the automatic generation of observation points around the expert’s 3D avatar in a VLE.
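Nawahdah and Inoue’s results nevertheless suggest simple placement rules of the following kind. The sketch below is purely illustrative: it offsets the avatar toward the side of the active hand in the horizontal plane, and its distance and angle are assumptions rather than that study’s measured optimum.

```python
import numpy as np

def place_avatar(learner_pos, learner_forward, active_hand,
                 distance=1.5, side_angle_deg=45.0):
    """Illustrative placement rule in the spirit of Nawahdah and Inoue
    (2013): keep the learner static and place the expert's 3D avatar in
    front of them, offset toward the side of the hand currently used so
    the demonstrating hand stays visible. Works in the horizontal plane,
    with x pointing to the learner's right and y forward; the distance
    and angle are assumptions, not that study's measured optimum."""
    sign = -1.0 if active_hand == "right" else 1.0   # clockwise = right side
    theta = np.radians(sign * side_angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    offset = rot @ (np.asarray(learner_forward, dtype=float) * distance)
    position = np.asarray(learner_pos, dtype=float) + offset
    facing = -offset / np.linalg.norm(offset)        # avatar faces the learner
    return position, facing

# Example: a learner at the origin facing +y, demonstrating with the right hand
pos, facing = place_avatar((0.0, 0.0), (0.0, 1.0), "right")
```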
In the context of this study, three aspects are overlooked. The first relates to the acquisition of information when observing a 3D avatar: few articles address the optimal configurations for better perceiving the information when observing a gesture. Next, the analysis of the surveyed works highlights the absence of a detailed and complete description of the architecture of the whole system, from the capture of the expert’s movement to the building of an appropriate and interactive VPR. Finally, the comparison between appropriate observation-based VLEs and other resources (books and videos, for example), in terms of perception of information linked to gesture-based skills, has not been sufficiently studied.
Based on the current state of the art, the presented work relies on the following research question:
• How to design Virtual Pedagogical Resources
dedicated to gesture learning from captured move-
ments, that maximize the learner’s perception of