tively, to our conservative association, tracklets and
optimal association). Stauffer (Stauffer, 2003) refers
to the latter as track stitching. Both Stauffer and Nevatia
(Huang et al., 2008) refer to the track segments as
tracklets.
An inherent limitation of global optimization is easy to
understand: in a realistic scenario, the whole video is not
available for global analysis. Instead, a continuous video
stream is received over time, and tracking results should be
produced with as little delay as possible. To date, this
promising class of methods has not made the leap from
global analysis of a single video segment to the analysis
of continuous video streams. Our work intends to bridge
this gap through the method outlined in Section 4.
This paper is organized as follows. Section 2 de-
scribes the tracking framework based on a probabilis-
tic formulation of the problem, which is then solved
by the Hungarian Algorithm. Section 3 showcases
the appearance descriptor of our choice, the Region
Covariance Matrix (RCM). Section 4 presents the ex-
tension of the Dynamic Hungarian Algorithm to deal
with a sliding window, enabling its use in continuous,
streamed video, as opposed to small video segments
as has been the case in previous work. Finally, Sec-
tions 5 and 6 show, respectively, the experimental re-
sults and our conclusions.
2 TRACKING METHODOLOGY
Our tracking method starts with a stripped-down im-
plementation of the hierarchical tracker proposed by
Nevatia et al. (Huang et al., 2008). Their work fol-
lows the recent trend of computing association scores
between all pairs of detections and using the Hungarian
algorithm to create a matching between them¹, thus
obtaining a set of tracks in a way that optimizes the
association scores. They computed the associations
progressively, through a hierarchy of low,
middle and high-level association schemes; the basic
framework for our tracker is adapted from theirs, and
will be described in this section.
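
To make the idea of a matching concrete, the following Python sketch picks, for a tiny score matrix, the assignment that maximizes the total association score, and then chains the resulting links into tracks. It is only an illustration under stated assumptions: the exhaustive search over permutations stands in for the Hungarian algorithm (it reaches the same optimum on small inputs but takes factorial time), and the names `best_matching` and `links_to_tracks`, the detection labels and the score values are all hypothetical, not part of the original tracker.

```python
from itertools import permutations


def best_matching(scores):
    """Return, for each row i, the column assigned to it so the summed score is maximal.

    Exhaustive search: a stand-in for the Hungarian algorithm on tiny inputs.
    scores[i][j] is a (hypothetical) association score for linking
    detection i in one frame to detection j in the next frame.
    """
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm)


def links_to_tracks(successor):
    """Chain 'which detection comes next' links into tracks.

    successor maps a detection id to the id of the detection that follows it.
    Assumes the links contain no cycles.
    """
    has_predecessor = set(successor.values())
    all_ids = set(successor) | has_predecessor
    # A track starts at any detection that nothing links into.
    starts = sorted(d for d in all_ids if d not in has_predecessor)
    tracks = []
    for start in starts:
        track = [start]
        while track[-1] in successor:
            track.append(successor[track[-1]])
        tracks.append(track)
    return tracks


# Made-up scores between 3 detections in frame t and 3 in frame t+1.
scores = [
    [0.9, 0.1, 0.2],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.3],
]
assignment = best_matching(scores)

# Detections are labelled (frame, index); the extra t+1 -> t+2 link is invented
# only to show a track spanning more than two frames.
successor = {("t", i): ("t+1", j) for i, j in enumerate(assignment)}
successor[("t+1", 0)] = ("t+2", 0)
tracks = links_to_tracks(successor)
```

Here `assignment[i]` plays the role of the matching for detection `i`, and each list in `tracks` is one chain of linked detections.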
2.1 Conservative Association
Recall that the main objective is to associate (match)
each detection to another one, optimizing some association
criterion. In a typical scene, a good number of associations
are straightforward to compute. For example, a person
walking down a corridor alone without any occlusion will
yield a set of detections with high association scores, and
no other detections should have equally high scores towards
those detections. In such cases the matching is unambiguous
and easily computed by a process we call conservative
association. This turns out to be an efficient optimization,
since it relieves the Hungarian algorithm of these matches
(the algorithm's running time is $O(n^3)$, with $n$ the
number of detections).

¹ In the tracking context, a matching indicates, for each
detection, which one comes next.
2.1.1 Conservative Strategy
We denote by $r_i$ a detection response, which may contain
characteristics such as position, frame index, and
appearance properties. Instead of arbitrary scores, it
makes sense to maximize association probabilities, so
these will be used throughout the text. The aim of
this first take on matching is to accept matches that
have a high association probability (higher than an
arbitrary threshold $\theta_1$), but only if there is no other
conflicting match; that is, all other matches involving
these two detections have lower probabilities (by at
least $\theta_2$). This is defined in (1), where $P_{link}(r_i \mid r_j)$ is
the association probability between detections $r_i$ and $r_j$.
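\[
\begin{aligned}
& P_{link}(r_i \mid r_j) > \theta_1, \\
& \min\bigl[\, P_{link}(r_i \mid r_j) - P_{link}(r_k \mid r_j),\;
              P_{link}(r_i \mid r_j) - P_{link}(r_i \mid r_k) \,\bigr] > \theta_2,
  \quad \forall r_k \in R - \{r_i, r_j\}
\end{aligned}
\tag{1}
\]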
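
The two conditions in (1) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the link probabilities are already available in a square table `P` with `P[i][j]` holding P_link(r_i | r_j), and the function name `conservative_links`, the example table and the threshold values are hypothetical.

```python
def conservative_links(P, theta1, theta2):
    """Return the pairs (i, j) that satisfy both conditions of Eq. (1).

    P[i][j] is assumed to hold P_link(r_i | r_j) for detections 0..n-1.
    """
    n = len(P)
    links = []
    for i in range(n):
        for j in range(n):
            # Skip self-links (an assumption of this sketch) and weak matches.
            if i == j or P[i][j] <= theta1:
                continue
            others = [k for k in range(n) if k != i and k != j]
            # Smallest margin over every competing match that shares r_j
            # (P[k][j]) or r_i (P[i][k]); no competitors means no conflict.
            margin = min(
                (min(P[i][j] - P[k][j], P[i][j] - P[i][k]) for k in others),
                default=float("inf"),
            )
            if margin > theta2:
                links.append((i, j))
    return links


# Made-up example: detections 0 and 1 are in one frame, 2 and 3 in the next.
P = [
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.5, 0.6],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
strict = conservative_links(P, theta1=0.5, theta2=0.3)
loose = conservative_links(P, theta1=0.5, theta2=0.05)
```

With the stricter margin only the pair (0, 2) is accepted, because detection 1 is almost as likely to continue into detection 2 as into detection 3; that ambiguous case would be left for the later, optimal association stage.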
2.1.2 Association Probabilities
The association probabilities can be computed
through (2), which is simply the joint probability
of three probabilities of identity, called affinities.
$A_\delta(r_i \mid r_j)$, $\delta \in \{p, s, a\}$, are the position, size and
appearance affinities (described in the next paragraph), and
$t_k$ is the frame index of the occurrence of detection $r_k$.
Note that the only way for an association probability
to be non-zero is for the second detection to appear
exactly one frame after the first. This is part of the
conservative strategy, as occlusions (i.e., frame gaps
between detections) are not resolved at this stage.
\[
P_{link}(r_i \mid r_j) =
\begin{cases}
A_p(r_i \mid r_j)\, A_s(r_i \mid r_j)\, A_a(r_i \mid r_j), & \text{if } t_j - t_i = 1 \\
0, & \text{otherwise}
\end{cases}
\tag{2}
\]
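
As a small illustration of (2), the sketch below combines three affinity values into a link probability. It assumes the affinities have already been computed elsewhere and are passed in as plain numbers; the function name `link_probability` and the example values are hypothetical.

```python
def link_probability(a_pos, a_size, a_app, t_i, t_j):
    """Eq. (2): product of the position, size and appearance affinities
    when r_j occurs exactly one frame after r_i, and 0 otherwise."""
    if t_j - t_i == 1:
        return a_pos * a_size * a_app
    return 0.0


# Consecutive frames: the three affinities are multiplied.
p_consecutive = link_probability(0.9, 0.8, 0.5, t_i=10, t_j=11)

# A frame gap (e.g. an occlusion) or a reversed order yields zero at this stage.
p_gap = link_probability(0.9, 0.8, 0.5, t_i=10, t_j=12)
p_reversed = link_probability(0.9, 0.8, 0.5, t_i=11, t_j=10)
```

Because the result is a product, a single low affinity is enough to pull the link probability below the threshold used by the conservative strategy.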
The position difference between two detections is
modeled through a two-dimensional Gaussian distribution,
so the position affinity can be obtained from
the positions of two detections, $p_i$ and $p_j$, as G(p_i −
VISAPP 2010 - International Conference on Computer Vision Theory and Applications