Stone and Skubic, 2011; Tang, 2011; Xia et al., 2011); however, no publication has yet addressed the topic directly.
This paper tries to fill this gap and to provide a starting point for further research. We begin by analyzing the characteristics of the Kinect depth camera (Section 2) and their impact on the background subtraction problem (Section 3). We then choose four background subtraction algorithms (Section 4) and adapt them to the domain of depth images (Section 5). Finally, we evaluate them on three different depth videos with ground truth segmentation (Sections 6 and 7).
2 KINECT DEPTH IMAGE CHARACTERISTICS
We begin this section with an overview of the distinctive characteristics of the depth images provided by the Kinect. These characteristics form the basis for analyzing the problems associated with foreground detection. The functional principles of the Kinect are not discussed in this paper (refer to (Khoshelham, 2011) instead).
Although the depth image has a resolution of 640×480 pixels, the effective resolution is much lower, because the depth calculation depends on small pixel clusters. The detection range is between 50 cm and about 5 m, with a field of view of approximately 58°. Depth is encoded using 11 bits for the depth value and 1 bit indicating an undefined value.
The most important property, however, is the use of distance information instead of color intensities, which makes the image independent of illumination, texture, and color. Direct sunlight can nevertheless outshine the projected pattern and turn many pixels undefined. Certain material properties can also prevent stable depth recognition, including high reflectivity, transparency, and dark colors.
The depth image contains different types of disturbances and noise. We classify the pixels according to these errors as follows:
• Stable: A fixed depth value with only a small noise level that increases quadratically with range (see Khoshelham, 2011).
• Undefined: A special value meaning that no depth information is available. This is typical of object shadows, direct sunlight, and objects closer than the minimum range of 50 cm.
• Uncertain: Switching randomly between the undefined and the stable state. This is often the case at the boundaries of undefined regions and for reflections, transparencies, very dark objects, and fine-structured objects (e.g., hair).
• Alternating: Switching between two different
stable values.
Occasionally, pixels exhibit both “uncertain” and “alternating” characteristics, i.e., they switch between two different stable values and the undefined state.
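To make the pixel classes concrete, the following Python sketch labels a single pixel from a short history of depth samples. It is an illustration under stated assumptions, not the procedure used in this paper: undefined samples are represented as `None`, depth values are in millimetres, and the noise of a stable pixel is modelled as σ(z) = c·z² (z in metres), following the quadratic growth with range noted for stable pixels. The coefficient c (`noise_coeff`), the multiplier `n_sigma`, and the floor `min_tol_mm` are hypothetical values chosen only for illustration. Two samples are treated as different stable values when they differ by more than this range-dependent tolerance.

```python
from typing import Optional, Sequence


def count_modes(values_mm: Sequence[float], noise_coeff: float,
                n_sigma: float, min_tol_mm: float) -> int:
    """Count groups of depth values separated by gaps larger than the noise tolerance."""
    ordered = sorted(values_mm)
    modes = 1
    for prev, cur in zip(ordered, ordered[1:]):
        z_m = prev / 1000.0                    # range in metres
        sigma_mm = noise_coeff * z_m ** 2      # assumed quadratic noise model
        tolerance = max(min_tol_mm, n_sigma * sigma_mm)
        if cur - prev > tolerance:             # gap too large to be noise
            modes += 1
    return modes


def classify_pixel(samples: Sequence[Optional[float]],
                   noise_coeff: float = 1.5,   # hypothetical, mm per m^2
                   n_sigma: float = 3.0,       # hypothetical
                   min_tol_mm: float = 2.0) -> str:  # hypothetical
    """Label a pixel's sample history with one of the classes from Section 2."""
    defined = [s for s in samples if s is not None]
    if not defined:
        return "undefined"
    has_undefined = len(defined) < len(samples)
    modes = count_modes(defined, noise_coeff, n_sigma, min_tol_mm)
    if modes == 1:
        return "uncertain" if has_undefined else "stable"
    # Two or more stable values, possibly mixed with undefined samples.
    return "uncertain+alternating" if has_undefined else "alternating"


if __name__ == "__main__":
    print(classify_pixel([1000, 1002, 1001, 999]))  # stable
    print(classify_pixel([1000, None, 1001]))       # uncertain
    print(classify_pixel([1000, 1500, 1000, 1500])) # alternating
    print(classify_pixel([1000, None, 1500]))       # uncertain+alternating
```

Because the tolerance grows with z², the same 50 mm jump counts as alternation at 1 m but as ordinary noise at 4 m with these illustrative parameters. The sketch works on one pixel at a time, so the cluster-wise occurrence of these effects is not modelled.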
Note also that alternation and uncertainty usually affect clusters of pixels rather than individual pixels; therefore, object contours may differ substantially from frame to frame.
3 FOREGROUND DETECTION CHALLENGES
In the following, we summarize the challenges faced by background subtraction algorithms that operate on depth images. The list is based on the more detailed summary in (Toyama et al., 1999). We include only the challenges relevant to depth images and have modified the descriptions to better reflect the characteristics of depth images provided by the Kinect sensor.
Moved Objects: The method should be able to adapt to changes in the background, such as a moved chair or a closed door.
Time of Day: Direct sunlight can outshine the infrared patterns used for depth estimation, resulting in undefined pixels in the corresponding regions. If the illumination changes, the state of the pixels in the affected regions may also change (to stable or undefined), which produces pixels of the class “uncertain” (see Section 2). This is similar to the moved object problem.
Dynamic Background: This problem, originally referred to as waving trees in (Toyama et al., 1999), can be caused by any constantly moving background object, e.g., a slowly pivoting fan.
Bootstrapping: In some environments, it is necessary to learn a background model in the presence of foreground objects.
Foreground Aperture: When a homogeneous background object moves, changes in its inner part might not be detected by a frame-to-frame difference algorithm. This is especially true for depth images, because they contain no color or texture.
Shadows and Uncertainty: The system has to cope with undefined and uncertain pixels (see Section 2) both in the fore- and background. Additionally, foreground objects often cast shadows,
VISAPP 2012 - International Conference on Computer Vision Theory and Applications