2 RELATED WORK
Various works address the subject of supporting el-
derly people in their home environment. The assis-
tance concepts are closely related to the topic of AAL
(Ambient Assisted Living). Unobtrusive integration into the living environment is one of the most important requirements for AAL systems.
Clement et al. detected ADLs (Activities of Daily
Living) with the help of 'smart meters', which measure the energy consumption of household devices (Clement et al., 2013). A semi-Markov model was
trained in order to construct behaviour profiles of
persons and to draw conclusions about their state of
health. Kalfhues et al. analysed a person’s behaviour
by means of several sensors integrated in a flat, e. g.
motion detectors, contact sensors and pressure sen-
sors (Kalfhues et al., 2012). Link et al. employed
optical stereo sensors to discern emergencies, i. e.
falls and predefined emergency gestures (Link et al.,
2013). Chronological sequences of the height of the body centre and of the angle between the main body axis and the floor were analysed (see the sketch below).
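Such an analysis can be illustrated with a short sketch. The following Python code is not the implementation of (Link et al., 2013); it is a minimal illustration of how the height of the body centre and the angle between the main body axis and the floor could be derived from a segmented 3D point cloud, assuming a world frame whose z-axis is vertical.

import numpy as np

def body_features(points):
    # points: (N, 3) array of 3D points belonging to the person,
    # given in a world frame whose z-axis is vertical (assumption).
    centre = points.mean(axis=0)
    centre_height = centre[2]

    # Main body axis: direction of largest variance (first principal component).
    _, _, vt = np.linalg.svd(points - centre, full_matrices=False)
    axis = vt[0]  # unit vector

    # Angle between the main body axis and the floor plane z = 0:
    # roughly 90 degrees while standing, close to 0 degrees while lying.
    angle = np.degrees(np.arcsin(np.clip(abs(axis[2]), 0.0, 1.0)))
    return centre_height, angle

Tracking these two values over time yields chronological sequences of the kind analysed in that study.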
Belbachir et al., who also applied stereo sensors to detect falls (Belbachir et al., 2012), used a neural network-based approach to classify the fall event. The major advantage of optical
sensors is their easy integration into a flat. A consid-
erable amount of additional information can be ob-
tained by applying image processing algorithms, es-
pecially in connection with RGB-D sensors, which
deliver red, green and blue channel images as well
as depth information. Therefore, we decided to use
a stereo camera in our study. Although other sensors
that provide RGB-D data, such as the Kinect, could
also be installed in a flat, they exhibit characteristics that have proved disadvantageous in the field of AAL: Firstly, if the Kinect is mounted on the ceiling, its range and field of view do not cover the complete room; several Kinect sensors would have to be integrated at different places in a flat, which is hardly practicable. Secondly, the resolution is not sufficient for recognising objects that are far away from the sensor. Thirdly, when several Kinects are installed to achieve better coverage of a room, they tend to interfere with each other because of the active technique they use to determine depth information. Consequently, although the Kinect performs very well in a variety of applications, we considered this sensor unsuitable for AAL purposes.
The approaches listed above address either ADL detection or emergency scenarios. In the context of
assessing the health status of persons, several former
projects have focused especially on the analysis of
mobility. Scanaill et al. employed body-worn sensors
for mobility telemonitoring (Scanaill et al., 2006).
However, this type of sensor is unsuitable for demented persons, as this group tends to forget to put the sensors on or takes them off intentionally. In the work of Steen et al., another way of measuring mobility was presented (Steen et al., 2013). In initial field tests, several partici-
pants’ flats were equipped with laser scanners, motion
detectors and contact sensors. By means of these sen-
sors, the persons could be localised within their flats.
Apart from this, the transit times between the sensors as well as walking speeds were computed. These field tests provided evidence that the evaluation of such sensor data allows conclusions to be drawn about mobility.
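As a simple illustration of how such a mobility measure can be derived, the following Python sketch estimates a walking speed from the transit time between two sensor events; the sensor names, the distance and the timestamps are invented for this example and are not taken from (Steen et al., 2013).

from datetime import datetime

# Hypothetical event log (timestamp, sensor id) from two motion detectors.
events = [
    (datetime(2014, 3, 1, 9, 0, 12), "hall"),
    (datetime(2014, 3, 1, 9, 0, 18), "kitchen"),
]

# Assumed walking distance in metres between the two sensor positions.
distances = {("hall", "kitchen"): 4.2}

def walking_speed(events, distances):
    # Speed = distance between the sensors / transit time between the events.
    (t0, s0), (t1, s1) = events[-2], events[-1]
    d = distances.get((s0, s1)) or distances.get((s1, s0))
    dt = (t1 - t0).total_seconds()
    return d / dt if d and dt > 0 else None

print(walking_speed(events, distances))  # 0.7 m/s for this example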
In addition to a person’s location and movements, we think that the pose, i. e. standing, sitting and lying, also provides an indication of a person’s mobility. We therefore introduce a pose estimation
algorithm, which detects the pose of a person within
the area observed by a single stereo camera.
There is a variety of pose estimation algorithms
that use optical sensors. They differ, for exam-
ple, with respect to such parameters as camera type
(mono, stereo), inclusion of temporal information and
utilisation of explicit human models. Ning et al.
discerned the human pose using a single monocu-
lar image (Ning et al., 2008). By modifying a bag-
of-words approach, they were able to increase the
discriminative power of features. They also intro-
duced a selective and invariant local descriptor, which
does not require background subtraction. The poses
walking, boxing and jogging could be classified af-
ter supervised learning. Agarwal et al. determined
the pose from monocular silhouettes by regression
(Agarwal and Triggs, 2006) and thus needed nei-
ther a body model nor labelled body parts. In addition to spatial configurations of body parts, Ferrari et al. also considered temporal information in their study (Ferrari et al., 2008). Haritaoglu et
al. employed an overhead stereo camera in order to
recognise the ’pick’ movement of customers while shopping (Haritaoglu et al., 2002). In this study, a three-dimensional silhouette was computed by back-projecting image points to their corresponding world points using depth information and calibration parameters. Persons were localised at regions with significant peaks in the occupancy map, and the pose was determined by calculating shape features instead of using an explicit model (see the sketch below).
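The general idea of this kind of depth-based localisation can be summarised in a short sketch. The following Python code is not the implementation of (Haritaoglu et al., 2002); it is a minimal illustration that assumes a pinhole camera model with known intrinsic parameters (fx, fy, cx, cy), a depth image registered to the camera and a roughly downward-looking sensor.

import numpy as np

def back_project(depth, fx, fy, cx, cy):
    # Back-project a depth image (in metres) into 3D camera coordinates.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack((x, y, depth), axis=-1)  # shape (h, w, 3)

def occupancy_map(points, cell=0.05, extent=3.0):
    # Project the 3D points onto the plane parallel to the floor
    # (here the camera's x-y plane, assuming an overhead sensor)
    # and count the points that fall into each grid cell.
    pts = points.reshape(-1, 3)
    pts = pts[pts[:, 2] > 0]  # discard pixels without valid depth
    n = int(2 * extent / cell)
    xi = ((pts[:, 0] + extent) / cell).astype(int)
    yi = ((pts[:, 1] + extent) / cell).astype(int)
    valid = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
    grid = np.zeros((n, n))
    np.add.at(grid, (yi[valid], xi[valid]), 1)
    return grid

# A person candidate corresponds to a pronounced peak in the occupancy map:
# grid = occupancy_map(back_project(depth, fx, fy, cx, cy))
# y0, x0 = np.unravel_index(np.argmax(grid), grid.shape)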
Other approaches applied the Kinect sensor. Their results showed that the Kinect, when it is suitable for the particular application, leads to results of high quality. Ye et al. estimated the pose from a single depth map of the Kinect
(Ye et al., 2011). They then compared this map with
mesh models from a database. In a first step, a simi-