successfully, videos presenting the characteristics of
that particular dinosaur are shown. This enables
learners to learn about the kind of dinosaur to which
the excavated fossils belong.
"BELONG" is intended to be installed in a museum.
Because a museum has limited funds and space, the
system requires a sub-system that can recognize body
movements at low cost and in a small space.
We therefore created a sub-system to meet these
requirements. The sensors used in this sub-system
and their functions are described below.
2.2.1 Kinect v2
Recognizing human body movements requires a sensor
that can capture both the movements and the position
of a person. Several kinds of sensors could be used
for this task, for example motion capture sensors and
three-dimensional range image sensors.
A motion capture sensor is expensive and requires a
large space for installation. First, the space in
which movements are to be recognized must be
surrounded with multiple cameras. Next, the space
must be calibrated. Then markers that reflect infrared
rays in the wavelength range from 3 [mm] to 200 [mm]
are attached to a person so that the person's body
movement and position can be recognized. Because of
this cost and space requirement, a motion capture
sensor is not suitable for implementation in a museum.
The Kinect v1 sensor, a three-dimensional range
image sensor, is inexpensive and can be set up by
installing a single unit. Moreover, a person's
position can be estimated with the depth sensor, and
a total of 20 skeletal points can be measured. This
information is used to calculate the trajectory of
the skeleton, and a system can be developed on the
basis of that trajectory.
(Example of a condition) A hand is judged to be
raised when HAND_RIGHT is above HEAD.
This is an example of a condition for recognizing the
simple body movement of raising a hand. Although
simple body movements such as this can easily be
recognized, it is difficult to recognize complex body
movements when the only available information is the
20 skeletal points. Moreover, deciding the conditions
for each movement by hand is time consuming. Thus,
the Kinect v1 sensor is not suitable for recognizing
the complicated body movements of learners, who are
the most important part of the immersive learning
support system "BELONG."
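The hand-raising condition above can be sketched in a few lines. This is only an illustrative sketch, not the implementation used in "BELONG": the dictionary layout, the function name is_right_hand_raised, the margin parameter, and the sample coordinates are all assumptions, and the y axis is assumed to point upward.

```python
from typing import Dict, Tuple

# A joint position as (x, y, z); the y axis is assumed to point upward.
Point3D = Tuple[float, float, float]


def is_right_hand_raised(skeleton: Dict[str, Point3D], margin: float = 0.0) -> bool:
    """Return True when HAND_RIGHT is higher than HEAD by more than `margin`."""
    hand_y = skeleton["HAND_RIGHT"][1]
    head_y = skeleton["HEAD"][1]
    return hand_y > head_y + margin


# Hypothetical sample frames (values invented for illustration).
raised = {"HEAD": (0.0, 0.60, 2.0), "HAND_RIGHT": (0.25, 0.85, 1.9)}
lowered = {"HEAD": (0.0, 0.60, 2.0), "HAND_RIGHT": (0.25, -0.10, 2.0)}

print(is_right_hand_raised(raised))
print(is_right_hand_raised(lowered))
```

A rule of this kind is easy to write for one simple movement, but every additional movement needs its own hand-tuned condition, which is the difficulty described above.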
The Kinect v2 sensor, also a three-dimensional range
image sensor, is as inexpensive as the Kinect v1 and
can likewise be set up by installing a single unit.
Moreover, the position of a person can be estimated
with the depth sensor, and a total of 25 skeletal
points can be measured (Figure 3). By recording this
skeletal information and applying machine learning, a
discriminator for gesture recognition can easily be
created with the Kinect v2 sensor (Shotton, J., 2011).
This discriminator can detect various body movements
as required. Hence, we used the Kinect v2 sensor to
enable the sub-system to recognize complicated body
movements, because it is inexpensive, saves space,
and can be implemented quickly.
[Figure 3 image: skeleton diagram labeled with the 25
skeletal points HEAD, NECK, SPINE_SHOULDER, SPINE_MID,
SPINE_BASE, SHOULDER_LEFT, ELBOW_LEFT, WRIST_LEFT,
HAND_LEFT, HAND_TIP_LEFT, THUMB_LEFT, SHOULDER_RIGHT,
ELBOW_RIGHT, WRIST_RIGHT, HAND_RIGHT, HAND_TIP_RIGHT,
THUMB_RIGHT, HIP_LEFT, KNEE_LEFT, ANKLE_LEFT,
FOOT_LEFT, HIP_RIGHT, KNEE_RIGHT, ANKLE_RIGHT,
FOOT_RIGHT]
Figure 3: Information about the skeleton.
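To make the skeletal information concrete, one frame can be thought of as 25 three-dimensional points, that is, 75 numbers. The sketch below flattens such a frame into a single list. The joint names are those in Figure 3; the ordering, the dictionary representation, the function name frame_to_vector, and the dummy coordinates are assumptions made for illustration and are not taken from the Kinect SDK.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]

# The 25 joint labels from Figure 3. The order is arbitrary (an assumption),
# not necessarily the order used by the Kinect SDK.
JOINT_NAMES: List[str] = [
    "HEAD", "NECK", "SPINE_SHOULDER", "SPINE_MID", "SPINE_BASE",
    "SHOULDER_LEFT", "ELBOW_LEFT", "WRIST_LEFT", "HAND_LEFT",
    "HAND_TIP_LEFT", "THUMB_LEFT",
    "SHOULDER_RIGHT", "ELBOW_RIGHT", "WRIST_RIGHT", "HAND_RIGHT",
    "HAND_TIP_RIGHT", "THUMB_RIGHT",
    "HIP_LEFT", "KNEE_LEFT", "ANKLE_LEFT", "FOOT_LEFT",
    "HIP_RIGHT", "KNEE_RIGHT", "ANKLE_RIGHT", "FOOT_RIGHT",
]


def frame_to_vector(frame: Dict[str, Point3D]) -> List[float]:
    """Flatten one skeleton frame into [x0, y0, z0, x1, y1, z1, ...]."""
    missing = [name for name in JOINT_NAMES if name not in frame]
    if missing:
        raise KeyError(f"frame is missing joints: {missing}")
    vector: List[float] = []
    for name in JOINT_NAMES:
        vector.extend(frame[name])
    return vector


# Hypothetical frame: every joint is given a dummy position.
dummy_frame = {name: (0.0, float(i), 2.0) for i, name in enumerate(JOINT_NAMES)}
vector = frame_to_vector(dummy_frame)
print(len(JOINT_NAMES), len(vector))  # 25 75
```

A fixed-length vector per frame is a convenient input format for a machine-learning discriminator, since every recorded frame then has the same shape.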
2.2.2 Gesture Recognition
The Kinect v2 sensor described above comes with two
tools, Kinect Studio and Visual Gesture Builder,
which are used to recognize complicated body
movements. Kinect Studio records the position and
motion information of a person's body as acquired
with the Kinect v2 sensor. Visual Gesture Builder
creates a discriminator for recognizing a gesture by
applying machine learning to this position and motion
information.
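The idea of learning a discriminator from recorded, labeled data, rather than writing each condition by hand, can be illustrated with a deliberately tiny example. The sketch below is not the algorithm used by Visual Gesture Builder; it is a stand-in that learns a single threshold (a decision stump) on one hypothetical feature, the height of HAND_RIGHT relative to HEAD. The function names and all sample values are assumptions.

```python
from typing import List, Tuple

# One labeled sample: (feature value, True if the gesture is being performed).
Sample = Tuple[float, bool]


def train_stump(samples: List[Sample]) -> float:
    """Return the threshold that misclassifies the fewest training samples.

    A sample is predicted positive when its feature value exceeds the threshold.
    """
    if not samples:
        raise ValueError("at least one training sample is required")
    values = sorted({value for value, _ in samples})
    # Candidates: below all values, between neighbouring values, above all values.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def errors(threshold: float) -> int:
        return sum((value > threshold) != label for value, label in samples)

    return min(candidates, key=errors)


def predict(value: float, threshold: float) -> bool:
    """Apply the learned discriminator to a new feature value."""
    return value > threshold


# Hypothetical training data: feature = HAND_RIGHT.y - HEAD.y for each frame.
training = [(-0.7, False), (-0.4, False), (-0.1, False),
            (0.1, True), (0.3, True), (0.5, True)]

threshold = train_stump(training)
print(predict(0.2, threshold), predict(-0.3, threshold))
```

In this toy setting the learned threshold falls between the highest negative and the lowest positive example, so the rule is recovered from the labeled recordings instead of being specified manually.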
Kinect Studio functions as follows. Three-
dimensional information about the position and
motion of a person's body is acquired during
arbitrary body movement, as shown in Figure 4.
This three-dimensional information comprises the
coordinates of a total of 25 skeletal points shown in
Figure 3. When recording is started, these three-
dimensional coordinates are automatically recorded