dure are currently conducted using hard thresholds,
which we plan to make adaptive in the future. We also
intend to make our tracking algorithm more robust to
occlusions and noise by exploiting shape information
from all previous time steps. One way to achieve this
would be to build dynamic shape models (Cremers, 2006).
We provided a quantitative evaluation of the
method using human-annotated ground truth. Obtaining
ground truth for video, however, is a very tedious
procedure, which limits the scale of such an evaluation.
Since no implementation of a comparable algorithm
performing joint segmentation and tracking in depth
space is available, we compared our method to a
standard color-video segmentation algorithm
(Grundmann et al., 2010). Our method outperformed
color-video segmentation on the videos analyzed.
However, this comparison may not be entirely fair,
since our method uses a different feature, namely
depth rather than color.
Currently, the method needs ∼1.92 seconds to
process one frame of 430 × 282 pixels in Matlab
on an Intel 3.3 GHz processor. With an efficient C/C++
implementation of the method, we expect to achieve
real-time performance, which is one of our next goals.
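For orientation, the gap to real time can be quantified from the reported timing. The calculation below assumes a target rate of 30 frames per second, a common depth-sensor frame rate; this target is an assumption, not a figure reported in this work.

```latex
f_{\text{current}} = \frac{1}{1.92\,\text{s}} \approx 0.52\ \text{frames/s},
\qquad
S = \frac{f_{\text{target}}}{f_{\text{current}}}
  = 30\ \text{frames/s} \times 1.92\,\text{s} = 57.6
```

Under this assumed target, a speedup of roughly 58× over the current Matlab prototype would be required.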
