Fall Detection using Ceiling-mounted 3D Depth Camera
Michal Kepski² and Bogdan Kwolek¹
¹AGH University of Science and Technology, 30 Mickiewicza Av., 30-059 Krakow, Poland
²University of Rzeszow, 16c Rejtana Av., 35-959 Rzeszów, Poland
Keywords:
Video Surveillance and Event Detection, Event and Human Activity Recognition.
Abstract:
This paper proposes an algorithm for fall detection using a ceiling-mounted 3D depth camera. The lying pose
is separated from common daily activities by a k-NN classifier, which was trained on features expressing the
head-floor distance, the person's area and the shape's major length to width ratio. In order to distinguish between
intentional lying postures and accidental falls the algorithm also employs motion between static postures. The
experimental validation of the algorithm was conducted on realistic depth image sequences of daily activities and
simulated falls. It was evaluated on more than 45000 depth images and gave 0% error. To reduce the processing
overhead an accelerometer was used to indicate the potential impact of the person and to trigger the analysis of
the depth images.
1 INTRODUCTION
The aim of behavior recognition is an automated anal-
ysis (or interpretation) of ongoing events and their
context from the observations. Behavior understand-
ing aims at analyzing and recognizing motion patterns
in order to produce high-level description of actions.
Currently, human behavior understanding is becom-
ing one of the most active and extensive research top-
ics of artificial intelligence and cognitive sciences.
The strong interest is driven by a broad spectrum of applications in several areas such as visual surveillance, human-machine interaction and augmented reality. However, the problem is very difficult because of the wide range of activities that can occur in any given context and the considerable variability within particular activities, which take place at different time scales. Despite its difficulty, considerable progress has been made in developing methods and models to improve the understanding of human behavior. Current research concentrates more on natural settings and moves from generic techniques to specific scenarios and applications (Pantic et al., 2006; Weinland et al., 2011).
One aspect of human behavior understanding is
recognition and monitoring of activities of daily liv-
ing (ADLs). Several methods were proposed to dis-
tinguish between activities of daily living and falls
(Noury et al., 2008). Falls are a major health risk
and a significant obstacle to independent living of the
seniors (Marshall et al., 2005) and therefore signifi-
cant effort has been devoted to ensuring user-friendly
assistive devices (Mubashir et al., 2013). However,
despite many efforts made to obtain reliable and un-
obtrusive fall detection, current technology does not
meet the seniors' needs. One of the main reasons for the non-acceptance of the currently available technology by the elderly is that the existing devices generate too many false alarms. This means that some daily activities are erroneously reported as falls, which in turn leads to considerable frustration of the seniors. Additionally, the existing devices do not adequately preserve privacy and unobtrusiveness.
Most of the currently available techniques for fall
detection are based on wearable sensors. Accelerom-
eters or both accelerometers and gyroscopes are the
most frequently used sensors in wearable devices for
fall monitoring (Noury et al., 2007). However, on
the basis of inertial sensors it is hard to separate real
falls from fall-like activities (Bourke et al., 2007).
The reason is that the characteristic motion patterns of a fall also occur in many ADLs. For instance, squatting or crouching also exhibits a rapid downward motion, and in consequence devices built only on inertial sensors frequently trigger false alarms for such simple actions. Thus, a lot of research has been devoted to detecting falls using various sensors. Mubashir et al. (Mubashir et al., 2013) conducted a survey of the methods used in existing fall detection systems. A single CCD camera (Rougier et al., 2006),
multiple cameras (Aghajan et al., 2008), specialized
omni-directional ones (Miaou et al., 2006) and stereo-
pair cameras (Jansen and Deklerck, 2006) were inves-
tigated in vision systems for fall detection. However,
the proposed solutions require time for installation,
camera calibration and are not cheap.
Recently, Kinect sensor was used in prototype sys-
tems for fall detection (Kepski and Kwolek, 2012;
Mastorakis and Makris, 2012). It is the world’s first
low-cost device that combines an RGB camera and a
depth sensor. Unlike 2D cameras, it allows 3D track-
ing of the body movements. Thus, if only depth im-
ages are used it preserves the person’s privacy. Be-
cause depth images are extracted with the support of
an active light source, they are largely independent
of external light conditions. Thanks to the use of the
infrared light the Kinect is capable of extracting the
depth images in dark rooms.
In this work we demonstrate how to achieve re-
liable fall detection with low computational cost and
very low level of false positive alarms. A body-worn
tri-axial accelerometer is utilized to indicate a po-
tential fall (impact shock). A fall hypothesis is then
verified using a ceiling-mounted RGBD camera. We
show that the use of such a camera leads to many ad-
vantages. For ceiling-mounted RGBD cameras the fully automatic procedure for extraction of the ground plane (Kepski and Kwolek, 2013) can be further simplified, as the ground can be determined easily on the basis of the farthest subset of the point cloud. Moreover, we show that the distance to the ground of the topmost points of the subject undergoing monitoring (the head-floor distance) allows us to ignore a lot of false positive alarms, which would be generated if only the accelerometer were used. We show how to extract informative attributes of the human movement, which allow us to achieve a very high fall detection ratio with a very small ratio of false positives. To preserve the privacy of the user the method utilizes only depth images acquired by the Kinect device.
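As an illustration of this idea, a minimal Python sketch is given below; it is not the authors' implementation. It estimates the floor level from the farthest valid depth values seen by the ceiling-mounted sensor (a percentile is used here as a simple stand-in for the farthest subset of the point cloud) and derives the head-floor distance from the person's silhouette. The function names and the percentile parameter are illustrative assumptions.

```python
import numpy as np

def estimate_floor_depth(depth_mm, percentile=99.0):
    """Estimate the floor level as the farthest valid depth values
    seen by the ceiling-mounted sensor (depth in millimetres, 0 = no data).
    The percentile is an illustrative choice, not a value from the paper."""
    valid = depth_mm[depth_mm > 0]
    return np.percentile(valid, percentile)

def head_floor_distance(depth_mm, person_mask, floor_depth_mm):
    """Head-floor distance: floor depth minus the smallest depth
    (topmost point) inside the person's silhouette."""
    person_depths = depth_mm[person_mask & (depth_mm > 0)]
    if person_depths.size == 0:
        return None
    return floor_depth_mm - person_depths.min()
```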
2 FALL DETECTION USING
SINGLE INERTIAL SENSOR
A lot of different approaches have been investigated
to achieve reliable fall detection using inertial sen-
sors (Bourke et al., 2007). Usually, a single body-
worn sensor (tri-axial accelerometer or tri-axial gyro-
scope, or both embedded in an inertial measurement
unit) is used to indicate a person's fall. The most common method consists in using a tri-axial accelerometer. At any point in time, the output of the accelerometer is a linear combination of two components, namely the acceleration component due to gravity and the acceleration component due to bodily motion. The first approach to fall detection using accelerometry was proposed by Williams et al. (Williams et al., 1998). Accelerometer-based algorithms simply raise an alarm when a certain threshold value of the acceleration is reached. In practice, there are many problems inherently connected with this kind of algorithm, including a lack of adaptability.
After an accidental fall, the individual's body is usually in a different orientation than before the fall, and in consequence the resting acceleration in the three axes differs from the acceleration in the pose before the fall. In (Chen et al., 2005) an accelerometer-based algorithm relying on the change in body orientation was proposed. If the root sum vector of the three squared accelerometer outputs exceeds a certain threshold, it signals a potential fall. After detecting the impact, the orientation change is calculated over one second before the first impact and two seconds after the last impact using the dot product of the acceleration vectors. The angle change threshold can be set based on empirical data, as suggested by the authors. However, despite many efforts to improve this algorithm, it does not provide sufficient discrimination between real-world falls and ADLs.
In this work we assume that a fall took place if the signal upper peak value (UPV) from the accelerometer is greater than 2.5g. We also investigated the usefulness of the orientation change for different thresholds. However, the potential of this attribute seems to be small, particularly in actions consisting in lowering the body closer to the ground by bending the knees, for instance while taking an object from or putting it on the floor. A review of the relevant literature demonstrates that for a single inertial device the most valuable information can be obtained if the device is attached near the centre of mass of the subject. Therefore, the accelerometer was attached near the spine on the lower back using an elastic belt around the waist.
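A minimal sketch of this accelerometer-based trigger is shown below; the 2.5g threshold follows the text, while the function names and the assumption of acceleration samples expressed in m/s² are ours.

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]

def upper_peak_value(acc_xyz):
    """Upper peak value (UPV): maximum magnitude of the tri-axial
    acceleration vector over a window of samples (N x 3 array, m/s^2)."""
    magnitudes = np.linalg.norm(acc_xyz, axis=1)
    return magnitudes.max()

def potential_fall(acc_window, threshold_g=2.5):
    """Signal a potential fall (impact) when the UPV exceeds 2.5 g."""
    return upper_peak_value(acc_window) > threshold_g * G
```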
Compared to vision-based motion analysis sys-
tems, wearable sensors offer several advantages, par-
ticularly in terms of cost, ease of use and, most impor-
tantly, portability. Currently available smartphones
serve not only as communication and computing de-
vices, but they also come with a rich set of embedded
sensors, such as an accelerometer, gyroscope and dig-
ital compass. Therefore, they were used in prototype
systems for fall detection. However, despite many advantages, inertial sensor-based technology does not meet the seniors' needs, because some activities of daily living are erroneously reported as falls.
FallDetectionusingCeiling-mounted3DDepthCamera
641
3 FALL DETECTION USING
CEILING-MOUNTED RGBD
CAMERA
Various types of cameras were used in vision sys-
tems for fall detection (Rougier et al., 2006; Agha-
jan et al., 2008; Miaou et al., 2006; Jansen and Dek-
lerck, 2006). However, the video technology poses a
major problem of acceptance by seniors as it requires
the placement of the cameras in private living quar-
ters, and especially in the bedroom and the bathroom,
with consequent concerns about privacy. The exist-
ing systems require time for installation, camera cal-
ibration and are not cheap. Recently, the Kinect sensor was demonstrated to be very useful in fall detection (Kepski and Kwolek, 2012; Mastorakis and Makris, 2012). In (Kepski and Kwolek, 2013) we demonstrated an automatic method for fall detection using the Kinect sensor. The method utilizes only depth images acquired by a single device, which is placed at a height of about 1 m above the floor. It determines the parameters of the ground plane equation and then calculates the distance from the person's centroid to the ground. In contrast, in this work we employ a ceiling-mounted Kinect sensor. In the subsequent subsections we demonstrate that such a placement of the sensor has advantages and can lead to a simplification of the algorithms devoted to distinguishing accidental falls from ADLs.
3.1 Person Detection in Depth Images
Depth is a very useful cue to attain reliable person detection, since humans may not have consistent color and texture but have to occupy an integrated region in space. Kinect combines structured light with two classic computer vision techniques, namely depth from focus and depth from stereo. It is equipped with an infrared laser-based IR emitter, an infrared camera and an RGB camera. The IR camera and the IR projector compose a stereo pair with a baseline of approximately 75 mm. A known pattern of dots is projected by the IR laser emitter. These dots are captured by the IR camera and compared to the known pattern. Since there is a distance between the laser and the sensor, the images correspond to different camera positions, which in turn allows stereo triangulation to be used to calculate the depth of each dot. The field of view is 57° horizontally and 43° vertically, the minimum measurement range is about 0.6 m, whereas the maximum range is between 4 and 5 m. The sensor captures the depth and color images simultaneously at a frame rate of about 30 fps. The RGB stream has a resolution of 640×480 pixels with 8 bits per channel, whereas the depth stream has a resolution of 640×480 pixels with 11-bit depth.
The software called NITE from PrimeSense of-
fers skeleton tracking on the basis of RGBD images.
However, this software is targeted at supporting human-computer interaction, and not at detecting a person's fall. In particular, it was developed to extract and track persons in front of the Kinect. Therefore, we employ a person detection method (Kepski and Kwolek, 2013), which extracts the individual from images acquired by a ceiling-mounted Kinect at a low computational cost. Another rationale for using such a method is that it can delineate the person in real time on the PandaBoard (Kepski and Kwolek, 2013), which is a low-power, low-cost single-board computer development platform for mobile applications.
The person was delineated on the basis of a scene
reference image, which was extracted in advance and
then updated on-line. In the depth reference image each pixel assumes the median value of several pixel values from the earlier images. In the setup phase we collect a number of depth images, and for each pixel we assemble a list of pixel values from the former images, which is then sorted in order to extract the median. Given the sorted lists of pixels, the depth reference image can be updated quickly by removing the oldest pixels, updating the sorted lists with the pixels from the current depth image and then extracting the median value. We found that for typical human motions good results can be obtained using 13 depth images (Kepski and Kwolek, 2013). For a Kinect acquiring images at 25 Hz we take every fifteenth image.
In the detection mode the foreground objects are
extracted through differencing the current depth im-
age from such a depth reference map. Afterwards,
the person is delineated through extracting the largest
connected component in the thresholded difference
between the current depth map and the reference map.
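The following sketch illustrates the described steps, i.e. the per-pixel median reference image and the extraction of the largest connected component, using NumPy and SciPy. It is a simplified stand-in for the method of (Kepski and Kwolek, 2013); the difference threshold is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage

def depth_reference(depth_buffer):
    """Per-pixel median of the buffered depth frames (e.g. 13 frames,
    taking every fifteenth image) forms the scene reference image."""
    return np.median(np.stack(depth_buffer, axis=0), axis=0)

def extract_person(depth, reference, diff_thresh_mm=100):
    """Threshold the difference to the reference map and keep the
    largest connected component as the person's silhouette."""
    diff = np.abs(depth.astype(np.int32) - reference.astype(np.int32))
    foreground = diff > diff_thresh_mm
    labels, n = ndimage.label(foreground)
    if n == 0:
        return np.zeros_like(foreground)
    sizes = ndimage.sum(foreground, labels, np.arange(1, n + 1))
    return labels == (np.argmax(sizes) + 1)
```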
Figure 1 illustrates the extraction of the person using the updated depth reference image of the scene. The first and second rows depict example color images and their corresponding depth images, the third row shows the difference images between the depth images and the depth reference images, and the last row shows the depth reference images. As we can observe, if the layout of the scene changes, for instance due to shifting a chair to another location, see Fig. 1 b,f, the depth difference images temporarily contain not only the person but also the shifted object, see Fig. 1 j,k. In the subsequent frames, on the basis of the refreshed depth reference image, which now contains the moved chair, see Fig. 1 n, the algorithm extracts only the person undergoing monitoring, see Fig. 1 l.
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
642
a) b) c) d)
e) f)
g)
h)
i)
j)
k) l)
m) n)
Figure 1: Extraction of the person on the basis of the updated depth reference image. Input images a-d), corresponding depth
images e-h), difference between depth images and depth-reference images i-l), updated on-line depth-reference images m-n).
3.2 Lying Pose Recognition
The recognition of the lying pose has been achieved using a classifier trained on features representing the extracted person in the depth images. A data-set consisting of images with normal activities like walking, taking or putting an object from the floor, bending right or left to lift an object, sitting, tying laces, crouching down and lying has been composed in order to train classifiers responsible for checking whether a person is lying on the floor. Thirty volunteers under the age of 28 participated in the preparation of the data-set. In total, 100 images representing typical activities of daily living were selected and then utilized to extract the features.
In most vision-based algorithms for lying pose
recognition the ratio of height to width of the rect-
angle surrounding the subject is utilized. In contrast,
in our algorithm we employ the ratio of major length
to major width, which is calculated on the basis of
the binary image I representing the person undergoing
monitoring. The major length and width (eigenval-
ues) were calculated in the following manner (Horn,
1986):
l = 0.707 \sqrt{(a + c) + \sqrt{b^2 + (a - c)^2}}
w = 0.707 \sqrt{(a + c) - \sqrt{b^2 + (a - c)^2}}    (1)

where

a = \frac{M_{20}}{M_{00}} - x_c^2, \quad b = 2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right), \quad c = \frac{M_{02}}{M_{00}} - y_c^2,

M_{00} = \sum_x \sum_y I(x,y), \quad M_{11} = \sum_x \sum_y x y\, I(x,y),
M_{20} = \sum_x \sum_y x^2 I(x,y), \quad M_{02} = \sum_x \sum_y y^2 I(x,y),

whereas x_c, y_c were computed as follows:

x_c = \frac{\sum_x \sum_y x\, I(x,y)}{\sum_x \sum_y I(x,y)}, \quad y_c = \frac{\sum_x \sum_y y\, I(x,y)}{\sum_x \sum_y I(x,y)}.
In total, we utilized three features:
- H/H_max: the ratio of the head-floor distance to the height of the person,
- area: the ratio of the person's area in the image to the area at an assumed distance to the camera,
- l/w: the ratio of the major length to the major width, calculated on the basis of (1).
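For illustration, a direct transcription of Eq. (1) for a binary silhouette I is sketched below in Python; the variable names mirror the moments defined above, while the helper name itself is ours.

```python
import numpy as np

def major_length_width(I):
    """Major length l and width w of the blob in binary image I, Eq. (1)."""
    ys, xs = np.nonzero(I)
    m00 = float(len(xs))                          # M00: number of foreground pixels
    if m00 == 0:
        return 0.0, 0.0
    xc, yc = xs.mean(), ys.mean()                 # centroid (x_c, y_c)
    a = (xs ** 2).sum() / m00 - xc ** 2           # M20/M00 - x_c^2
    b = 2.0 * ((xs * ys).sum() / m00 - xc * yc)   # 2(M11/M00 - x_c*y_c)
    c = (ys ** 2).sum() / m00 - yc ** 2           # M02/M00 - y_c^2
    root = np.sqrt(b ** 2 + (a - c) ** 2)
    l = 0.707 * np.sqrt((a + c) + root)
    w = 0.707 * np.sqrt(max(0.0, (a + c) - root))  # guard against rounding
    return l, w
```

The l/w feature is then simply the ratio of the two returned values.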
Figure 2 depicts a scatter plot matrix for the employed attributes, in which a collection of scatter plots is organized in a two-dimensional matrix to simultaneously provide correlation information among the attributes. In a single scatter plot two attributes are projected along the x-y axes of the Cartesian coordinate system. As we can observe in the discussed plots, the overlaps in the attribute space are not too significant. We also considered other attributes, for instance the filling ratio of the rectangle making up the person's bounding box. The worth of the considered features
FallDetectionusingCeiling-mounted3DDepthCamera
643
Figure 2: Multivariate classification scatter plot for features used in lying pose recognition.
was evaluated on the basis of the information gain
(Cover and Thomas, 1992), which measures the de-
pendence between the feature and the class label. In assessing the discrimination power of the considered features and selecting the most discriminative ones we used the InfoGainAttributeEval function from Weka (Cover and Thomas, 2005). The features se-
lected in such a way were then utilized to train classi-
fiers responsible for distinguishing between daily ac-
tivities and accidental falls.
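For illustration, such a lying-pose classifier can be trained as in the sketch below. Scikit-learn's k-NN is used here as one possible implementation; the file names, the number of neighbours and the 10-fold cross-validation setup (cf. Section 4) are illustrative assumptions rather than the exact configuration used in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X: rows of [H/H_max, area ratio, l/w]; y: 1 = lying pose, 0 = other ADL.
# "features.csv" and "labels.csv" are hypothetical files holding the data-set.
X = np.loadtxt("features.csv", delimiter=",")
y = np.loadtxt("labels.csv", delimiter=",")

knn = KNeighborsClassifier(n_neighbors=5)            # k chosen for illustration
scores = cross_val_score(knn, X, y, cv=10)           # 10-fold cross-validation
print("mean accuracy: %.3f" % scores.mean())
knn.fit(X, y)                                        # final lying-pose classifier
```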
3.3 Dynamic Transitions for Fall
Detection
In the previous subsection we demonstrated how to detect a fall on the basis of lying pose recognition, through analysis of the content of a single depth image. However, a human fall is a dynamic process, which occurs in a relatively short time. The relevant literature suggests that a fall incident takes approximately 0.4 s to 0.8 s. During a fall there is an abrupt change of the head-floor distance, accompanied by a change from a vertical orientation to a horizontal one. The distance of the person's centroid to the floor also changes significantly and rapidly during an accidental fall. Thus, depth image sequences can be used to extract features expressing the motion patterns of falls. Using depth image sequences we can characterize the motion between static postures, in particular between the pose before the fall and the lying pose. This motion information allows us to determine whether a transition of the body posture or orientation is intentional or not.
In the depth image sequences with ADLs as well as person falls we analyzed the feature ratios H/H_max, area and l/w, and particularly their sudden changes that arise during falling, e.g. from standing to lying. To reduce the false positive alarm ratio of a system relying only on features extracted from a single depth image, we introduced a feature reflecting the change of H/H_max over time. That means that aside from the static postures our algorithm also employs information from dynamic transitions, i.e. motions between static postures. In particular, this allows us to distinguish between intentional lying postures and accidental falls. Our experimental results show that the ratio H(t)/H(t-1), with H(t) determined at the moment of the impact and H(t-1) determined one second before the fall, is very useful for distinguishing a fall from many common ADLs. Figure 3 depicts the change over time of the discussed features for typical activities, consisting of walking, crouching down, kneeling down, sitting, bending, standing and falling. A peak in the walking phase that is seen in the plot of H/H_max is due to a raised hand. As we can observe, for a ceiling-mounted camera the area ratio changes considerably in the case of a fall. In the case of a person's fall, the peak value of H(t)/H(t-1) is far below one.
The ratio H(t)/H(t-1) can be determined using only vision-based techniques, i.e. through analysis of pairs of depth images. The Kinect microphones can be used to support the estimation of the moment of the person's impact. Inertial sensors are particularly attractive for such a task because they are currently embedded in many smart devices. With the help of such fall indicators the value H(t) can be computed at a low computational cost.
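A minimal sketch of this dynamic-transition test is given below; the threshold value is an illustrative assumption, as the text only states that the ratio is far below one for falls.

```python
def dynamic_fall_transition(h_at_impact, h_one_second_before, ratio_thresh=0.4):
    """Return True when the head-floor distance dropped abruptly,
    i.e. H(t)/H(t-1) is far below one; intentional lying lowers H slowly."""
    if h_one_second_before <= 0:
        return False
    return (h_at_impact / h_one_second_before) < ratio_thresh
```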
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
644
Figure 3: Curves of feature change during ADLs and falls. W-walking, C-crouching down, K-kneeling down, S-sitting, B-bending, s-standing, F-falling.
4 EXPERIMENTAL RESULTS
At the beginning of the experimental validation of the proposed algorithm we assessed the usefulness of an accelerometer as an indicator of a potential fall. The actors performed common daily activities consisting of walking, taking or putting an object from the floor, bending right or left to lift an object, sitting, tying laces, crouching down and lying. The accelerometer was worn near the spine on the lower back using an elastic belt around the waist. The motion data were acquired by a wearable smart device (x-IMU or Sony PlayStation Move) containing accelerometer and gyroscope sensors. Data from the device were transmitted wirelessly via Bluetooth and received by a laptop computer. Figure 4 depicts the histogram of the UPV values for the activities carried out during a half-hour experiment. As we can see, values of 2.5-3g were exceeded several times. This means that within half an hour of a typical person's activity a considerable number of false alarms would be generated if the fall detection were carried out only on the basis of the accelerometer. In particular, we noticed that all fall-like activities were indicated properly. Thus, the accelerometer can be used as a reliable indicator of the person's impact. In consequence, the computational overhead can be reduced significantly, as the depth image analysis needs to be performed only when a potential fall has been signaled at low cost. The accelerometer also proved useful for the update of the depth reference image: for a person at rest no update of the depth reference image is needed. Overall, the accelerometer proved useful in activity summarization. However, a typical accelerometer can be inconvenient to wear during sleep, which in turn results in the inability to monitor the critical phase of getting up from the bed.
Figure 4: Histogram of UPV for typical ADLs registered during the half-hour experiment.
The algorithm for lying pose recognition has been
evaluated in 10-fold cross-validation using features
discussed in Section 3.2. We trained k-NN, SVM,
KStar and multilayer perceptron classifiers responsi-
ble for checking whether the person is lying on the
floor. All falls were distinguished correctly. We also
trained and evaluated our fall detection algorithm on
images acquired by Kinect with Nyko zoom range re-
duction lens. Figure 5 illustrates images of the same scene acquired by the Kinect with and without the Nyko lens.
Figure 5: The same scene seen by the Kinect (left) and by the Kinect with the Nyko zoom range reduction lens (right). The upper row shows color images, whereas the bottom row shows the corresponding depth images.
As we can notice, owing to the Nyko lens the monitored area is far larger. In particular, using the Kinect sensor mounted at a height of about 2.5 m it is possible to cover
FallDetectionusingCeiling-mounted3DDepthCamera
645
the floor of the whole room. Five young volunteers participated in the evaluation of the algorithm for fall detection through analysis of the lying pose. For the testing of the classifiers we selected one hundred images with typical human actions, of which half depicted persons lying on the floor. It is worth noting that in the discussed experiment all lying poses were distinguished properly from daily activities.
Figure 6 illustrates the change of H/H_max over time for an intentional fall and intentional lying on the floor. As we can observe, the time needed to reach the value at which a considerable part of the body is in contact with the floor is clearly different. The discussed plot justifies the usefulness of the H(t)/H(t-1) feature in distinguishing intentional lying on the floor from accidental falling.
Figure 6: Curves of H/H_max change over time for an intentional fall and lying on the floor.
The lying pose detector was evaluated by five students. The experiments demonstrated that the algorithm has a very high detection ratio, slightly below 100%. However, the students found a couple of poses that were similar to lying poses and were recognized as such. The use of the inertial sensor as a fall indicator considerably reduces the possibility of false alarms of the system based on lying pose detection, at the cost of obtrusiveness.
In the next stage of the experiments we combined the lying pose detector with a detector using the proposed temporal feature. By combining the discussed detectors we achieved reliable fall detection with a very small false alarm ratio. As expected, the students found a small number of daily actions, which in some circumstances can lead to a false alarm, mainly due to imperfect detection of the moment of the body impact on the basis of vision techniques alone.
Bearing in mind that a fall detection system should have a negligible false alarm ratio as well as low power consumption, in the extended evaluation of the system we employed an inertial sensor to sense the impact, and in particular to determine more precisely the moment at which it takes place. In response to a potential fall signaled by the inertial sensor, the lying pose classifier is executed to verify whether the depth image contains a person in a lying pose. If so, the classifier using the temporal feature is executed to check whether this was a dynamic action. In this case, the value H(t) is calculated at the moment of the impact, i.e. at the time at which the acceleration exceeds the value of 2.5g.
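Putting the pieces together, the decision cascade described above can be summarised by the following sketch; the helper callables correspond to the earlier sketches and all names are illustrative.

```python
def detect_fall(acc_window, depth_at_impact,
                h_at_impact, h_one_second_before,
                impact_detector, lying_pose_classifier,
                feature_extractor, ratio_thresh=0.4):
    """Decision cascade: accelerometer impact -> lying pose -> dynamic transition.
    impact_detector, lying_pose_classifier and feature_extractor are callables /
    models corresponding to the earlier sketches (names are illustrative)."""
    if not impact_detector(acc_window):              # no UPV > 2.5 g, nothing to verify
        return False
    features = feature_extractor(depth_at_impact)    # [H/H_max, area, l/w]
    if lying_pose_classifier.predict([features])[0] != 1:
        return False                                 # person is not lying on the floor
    # intentional lying changes H slowly; a fall drives H(t)/H(t-1) far below one
    return (h_at_impact / max(h_one_second_before, 1e-6)) < ratio_thresh
```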
The depth camera-based system for fall detection has been tested using image sequences with daily activities and simulated falls performed by young volunteers. Intentional falls were performed in a room onto a carpet with a thickness of about 2 cm. A comprehensive evaluation showed that the system has high accuracy of fall detection and a very low level (as low as 0%) of false alarms. In a half-hour experiment comprising more than 45000 depth images with fall-like activities and simulated falls, all daily activities were correctly distinguished from falls. The image sequences with the corresponding acceleration data are available at: http://fenix.univ.rzeszow.pl/mkepski/ds/uf.html. The datasets contain 66 falls, of which half concerned persons falling off a chair. No false alarm was reported and all intentional falls were indicated appropriately. The classification was done by a k-NN classifier. The algorithm has also been tested in an office, where the simulated falls were performed onto crash mats. In a few-minutes-long video with walking, sitting/standing (executed 20 times), crouching (executed 10 times), taking or putting an object from the floor (repeated 10 times) and 20 intentional falls, all falls were recognized correctly using only depth images.
The experimental results are very promising. In order to be accepted by seniors, a system for fall detection should be unobtrusive and cheap, and in particular it should have an almost null false alarm ratio as well as preserve privacy. The proposed algorithm for fall detection was designed with regard to the factors mentioned above through careful selection of its components. In comparison to existing systems (Mubashir et al., 2013), it has a superior false alarm ratio with an almost perfect fall detection ratio, and meets the requirements that a system should have to be accepted by seniors. Moreover, due to the low computational demands, the power consumption is acceptable. Given our previous implementation of a fall detection system on the PandaBoard as well as the computational demands of the current algorithm, it will be possible to implement the algorithm on the new PandaBoard and to execute it in real time. The advantages will be low power consumption and easy setup of the system.
The depth images were acquired by the Kinect sensor using OpenNI. The system was implemented in C/C++ and runs at 30 fps on a 2.4 GHz i7 (4 cores,
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
646
Hyper-Threading) notebook, powered by Windows.
For the ceiling-mounted Kinect at a height of 2.6 m above the floor the covered area is about 5.5 m². With the Nyko lens the area covered by the camera is about 15.2 m². The most computationally demanding operation is the extraction of the depth reference image of the scene. For images of size 640×480 the computation time needed for the extraction of the depth reference image is about 9 milliseconds.
5 CONCLUSIONS
In this work we demonstrated an approach for fall detection using a ceiling-mounted Kinect. The lying pose is separated from common daily activities by a classifier trained on features expressing the head-floor distance, the person's area and the shape's major length to width ratio. To distinguish between intentional lying postures and accidental falls the algorithm also employs motion between static postures. The experimental validation of the algorithm, conducted on realistic depth image sequences of daily activities and simulated falls, shows that the algorithm allows reliable fall detection with a low false positive ratio. On more than 45000 depth images the algorithm gave 0% error. To reduce the processing overhead an accelerometer was used to indicate the potential impact of the person and to start the analysis of the depth images. The use of the accelerometer as an indicator of a potential fall simplifies the computation of the motion feature and increases its reliability. Owing to the use of only depth images, the system preserves the privacy of the user and works in poor lighting conditions.
ACKNOWLEDGEMENTS
This work has been supported by the National Science
Centre (NCN) within the project N N516 483240.
REFERENCES
Aghajan, H., Wu, C., and Kleihorst, R. (2008). Distributed
vision networks for human pose analysis. In Mandic,
D., Golz, M., Kuh, A., Obradovic, D., and Tanaka, T.,
editors, Signal Processing Techniques for Knowledge
Extraction and Information Fusion, pages 181–200.
Springer US.
Bourke, A., O’Brien, J., and Lyons, G. (2007). Evaluation
of a threshold-based tri-axial accelerometer fall detec-
tion algorithm. Gait & Posture, 26(2):194–199.
Chen, J., Kwong, K., Chang, D., Luk, J., and Bajcsy, R.
(2005). Wearable sensors for reliable fall detection. In
Proc. of IEEE Int. Conf. on Engineering in Medicine
and Biology Society (EMBS), pages 3551–3554.
Cover, T. M. and Thomas, J. A. (1992). Elements of Infor-
mation Theory. Wiley.
Cover, T. M. and Thomas, J. A. (2005). Data Mining: Prac-
tical machine learning tools and techniques. Morgan
Kaufmann, San Francisco, 2nd edition.
Horn, B. (1986). Robot Vision. The MIT Press, Cambridge,
MA.
Jansen, B. and Deklerck, R. (2006). Context aware inactiv-
ity recognition for visual fall detection. In Proc. IEEE
Pervasive Health Conf. and Workshops, pages 1–4.
Kepski, M. and Kwolek, B. (2012). Fall detection on embedded platform using Kinect and wireless accelerometer. In 13th Int. Conf. on Computers Helping People with Special Needs, LNCS, vol. 7383, pages II:407–414. Springer-Verlag.
Kepski, M. and Kwolek, B. (2013). Unobtrusive fall detec-
tion at home using kinect sensor. In Computer Anal-
ysis of Images and Patterns, volume 8047 of LNCS,
pages I:457–464. Springer Berlin Heidelberg.
Marshall, S. W., Runyan, C. W., Yang, J., Coyne-Beasley,
T., Waller, A. E., Johnson, R. M., and Perkis, D.
(2005). Prevalence of selected risk and protective fac-
tors for falls in the home. American J. of Preventive
Medicine, 8(1):95–101.
Mastorakis, G. and Makris, D. (2012). Fall detection system using Kinect's infrared sensor. J. of Real-Time Image Processing, pages 1–12.
Miaou, S.-G., Sung, P.-H., and Huang, C.-Y. (2006). A
customized human fall detection system using omni-
camera images and personal information. Distributed
Diagnosis and Home Healthcare, pages 39–42.
Mubashir, M., Shao, L., and Seed, L. (2013). A survey on fall detection: Principles and approaches. Neurocomputing, 100:144–152. Special issue: Behaviours in video.
Noury, N., Fleury, A., Rumeau, P., Bourke, A., Ólaighin, G., Rialle, V., and Lundy, J. (2007). Fall detection - principles and methods. In Int. Conf. of the IEEE Eng. in Medicine and Biology Society, pages 1663–1666.
Noury, N., Rumeau, P., Bourke, A., Ólaighin, G., and Lundy, J. (2008). A proposal for the classification and evaluation of fall detectors. IRBM, 29(6):340–349.
Pantic, M., Pentland, A., Nijholt, A., and Huang, T. (2006).
Human computing and machine understanding of hu-
man behavior: a survey. In Proc. of the 8th Int. Conf.
on Multimodal Interfaces, pages 239–248.
Rougier, C., Meunier, J., St-Arnaud, A., and Rousseau, J.
(2006). Monocular 3D head tracking to detect falls of
elderly people. In Int. Conf. of the IEEE Engineering
in Medicine and Biology Society, pages 6384–6387.
Weinland, D., Ronfard, R., and Boyer, E. (2011). A sur-
vey of vision-based methods for action representation,
segmentation and recognition. Comput. Vis. Image
Underst., 115:224–241.
Williams, G., Doughty, K., Cameron, K., and Bradley, D.
(1998). A smart fall and activity monitor for telecare
applications. In IEEE Int. Conf. on Engineering in
Medicine and Biology Society, pages 1151–1154.
FallDetectionusingCeiling-mounted3DDepthCamera
647