Table 1: Performance results for the five detectable events.
Event            Precision   Recall   F1-measure
falling down       85.7%      100%       96.1%
sitting down      100%        100%      100%
standing up       100%        100%      100%
entering home      83.2%      71.4%      76.8%
exiting home       80.0%      66.6%      72.7%
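The text does not define the F1-measure. Assuming the standard definition, it is the harmonic mean of precision P and recall R. The "exiting home" row works out as follows:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \cdot 0.800 \cdot 0.666}{0.800 + 0.666}
    = \frac{1.0656}{1.466}
    \approx 0.727
```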
5.2 Evaluation and Results
For evaluation, the processing results are compared against manually annotated ground truth. To account for annotation errors and detection uncertainty, we allow a temporal tolerance window of ∆f = 30 frames when matching ground-truth events to detection results.
The final results of our event detection method are shown in Table 1. The event “falling down” is recognized with 100% recall and a few false positives (85.7% precision). Some false positives are acceptable because safety applications prioritize a high recall rate. In our experiments, false positives occurred when people bent down to help up a person who had fallen. Our algorithm detects the “sitting down” and “standing up” events with perfect precision and recall. The “entering home” and “exiting home” events are harder to detect, especially because people in our dataset often enter or exit the scene in groups of two or three.
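The matching step described above can be sketched as follows. Only the tolerance ∆f = 30 frames comes from the text. The text does not say how detections are paired with ground-truth events. The following are therefore assumptions for illustration:

* the greedy nearest-neighbour rule,
* the representation of each event as a single frame index,
* the names `match_events` and `Scores`.

The counts are turned into precision, recall and F1-measure in the same way as in Table 1.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Event-level counts and the derived metrics (hypothetical helper)."""
    tp: int  # ground-truth events matched by a detection
    fp: int  # detections left unmatched
    fn: int  # ground-truth events left unmatched

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0


def match_events(ground_truth: list[int], detections: list[int],
                 delta_f: int = 30) -> Scores:
    """Match ground-truth frames to detection frames of one event type.

    Assumed rule: each ground-truth event takes the nearest still-unused
    detection within +/- delta_f frames; each detection is used at most once.
    """
    unused = sorted(detections)
    tp = 0
    for gt in sorted(ground_truth):
        candidates = [d for d in unused if abs(d - gt) <= delta_f]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - gt))
            unused.remove(best)  # a detection may confirm only one event
            tp += 1
    return Scores(tp=tp, fp=len(unused), fn=len(ground_truth) - tp)


if __name__ == "__main__":
    # Toy frame indices, not data from the paper.
    s = match_events([120, 480, 1500], [135, 470, 800, 1600])
    print(f"P={s.precision:.3f} R={s.recall:.3f} F1={s.f1:.3f}")
    # P=0.500 R=0.667 F1=0.571
```

Greedy matching in ground-truth order is simple. However, it is not guaranteed to maximize the number of matches when tolerance windows of neighbouring events overlap. An optimal one-to-one assignment, such as the Hungarian algorithm, could be substituted without changing the metric definitions.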
6 CONCLUSIONS
In this paper, we have shown how data from multiple heterogeneous image sensors can be efficiently combined to detect a number of events, with application to surveillance in a smart home environment. We have shown that a 3D voxel occupancy grid is beneficial for fusing multiple heterogeneous data sources. Furthermore, we demonstrated simultaneous tracking and event detection using an extended multi-object Viterbi tracking framework. We applied our method to the multi-camera, multi-modal Prometheus smart home database. In this application, our algorithm detects falling people as well as people sitting down, standing up, and entering or exiting the home. On this database, it reached 100% recall for falls and perfect precision and recall for sitting down and standing up. This shows that the proposed setup can indeed be used for assistance systems for the elderly.
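For reference, the following sketch shows textbook Viterbi decoding for a single hidden Markov chain. It is not the extended multi-object tracker summarized above. The following are hypothetical toy values chosen only to illustrate temporal smoothing:

* the two states ("upright", "lying"),
* the quantized height observations ("high", "low"),
* all probabilities.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for `observations` (log domain)."""
    log = math.log
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + log(trans_p[prev][s])
                          + log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace back from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: posture states observed through a quantized height.
STATES = ("upright", "lying")
START = {"upright": 0.9, "lying": 0.1}
TRANS = {"upright": {"upright": 0.95, "lying": 0.05},
         "lying": {"upright": 0.1, "lying": 0.9}}
EMIT = {"upright": {"high": 0.9, "low": 0.1},
        "lying": {"high": 0.2, "low": 0.8}}

if __name__ == "__main__":
    obs = ["high", "high", "low", "high", "high", "low", "low", "low"]
    print(viterbi(obs, STATES, START, TRANS, EMIT))
    # ['upright', 'upright', 'upright', 'upright', 'upright', 'lying', 'lying', 'lying']
```

The isolated "low" observation in the third frame is decoded as "upright", i.e. treated as noise. The sustained run of "low" observations is decoded as a transition to "lying". This kind of temporal smoothing is what makes Viterbi decoding attractive for suppressing single-frame detection errors.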
REFERENCES
Boykov, Y., Veksler, O., and Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Trans. Pat. Analysis and Machine Intelligence, 23(11):1222–1239.
Cheung, G. K., Baker, S., Simon, C., and Kanade, T. (2003). Visual hull alignment and refinement across time: A 3D reconstruction algorithm combining shape-from-silhouette with stereo. Comp. Vis. Pat. Rec.
Cipolla, R. and Blake, A. (1992). Surface shape from the deformation of apparent contours. Int. J. Comput. Vision, 9(2):83–112.
Diraco, G., Leone, A., and Siciliano, P. (2010). An active vision system for fall detection and posture recognition in elderly healthcare. In Design Automation & Test, pages 1536–1541.
Fleuret, F., Berclaz, J., Lengagne, R., and Fua, P. (2007). Multi-camera people tracking with a probabilistic occupancy map. IEEE Trans. Pat. Analysis and Machine Intelligence.
Forney, G. D. (1973). The Viterbi algorithm. Proceedings of the IEEE, 61(3):268–278.
Foroughi, H., Rezvanian, A., and Paziraee, A. (2008). Robust fall detection using human shape and multi-class support vector machine. In Proc. Indian Conf. on Computer Vision, Graphics & Image Processing, pages 413–420.
Kolmogorov, V. and Zabih, R. (2002). What energy functions can be minimized via graph cuts? In Proc. European Conf. on Computer Vision, pages 65–81.
Martin, W. and Aggarwal, J. (1983). Volumetric descriptions of objects from multiple views. IEEE Trans. Pat. Analysis and Machine Intelligence, 5(2):150–158.
Ntalampiras, S., Arsić, D., Störmer, A., Ganchev, T., Potamitis, I., and Fakotakis, N. (2009). PROMETHEUS database: A multi-modal corpus for research on modeling and interpreting human behavior. In Proc. Int. Conf. on Digital Signal Processing.
Serra, J. (1983). Image Analysis and Mathematical Morphology. Academic Press, Inc., Orlando, FL, USA.
Shoaib, M., Elbrandt, T., Dragon, R., and Ostermann, J. (2010). Altcare: Safe living for elderly people. In 4th Int. ICST Conf. on Pervasive Computing Technologies for Healthcare 2010, volume 0.
Snow, D., Viola, P., and Zabih, R. (2000). Exact voxel occupancy with graph cuts. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 345–352.
Tsai, R. (1986). An efficient and accurate camera calibration technique for 3-D machine vision. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 364–374.
Zivkovic, Z. and van der Heijden, F. (2006). Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recogn. Lett., 27(7):773–780.
EVENT DETECTION IN A SMART HOME ENVIRONMENT USING VITERBI FILTERING AND GRAPH CUTS IN
A 3D VOXEL OCCUPANCY GRID