each target either at the prediction stage (Sanchez-
Matilla et al., 2016) or by post-processing the filter
outputs (Baisa and Wallace, 2017).
More recently, a new filter based on stochastic
populations has been developed around the concept of
partially-distinguishable populations; it is termed the
Distinguishable and Independent Stochastic Populations
(DISP) filter (Delande et al., 2016). This filter
can handle an unknown and time-varying number of
targets in the scene with target birth, death, missed
detections and false alarms; however, it has a high
computational complexity. A low-complexity filter
called the Hypothesized and Independent Stochastic
Population (HISP) filter (Houssineau and Clark, 2016)
has been derived from the DISP filter under some
intuitive approximations and was adapted for space
situational awareness in (Delande et al., 2017). The
HISP filter has linear complexity in both the number
of hypotheses and the number of observations, similar
to the PHD filter; however, unlike the PHD filter,
it can preserve distinct tracks for detected targets.
In this work, we propose an online multi-target
visual tracker using a tracking-by-detection approach
for real-time applications. Accordingly, we make the
following three contributions. First, we apply the HISP
filter to tracking multiple targets in video sequences
acquired under varying environmental conditions and
target density. Second, we alleviate the problem of
two or more targets sharing an identical label by taking
into account the weight propagated with each confirmed
hypothesis. Finally, we conduct extensive experiments
on the Multiple Object Tracking 2016 (MOT16) benchmark
dataset using the public detections provided in
the benchmark's test set.
The paper is organized as follows. Section 2
describes the HISP filter in the video tracking context
in detail. Section 3 presents the applications and the
determination of some important variable values.
The experimental results are analyzed and compared
in section 4. The main conclusions and suggestions
for future work are summarized in section 5.
2 THE HISP FILTER
The HISP filter is a principled approximation of the
DISP filter for practical applications, especially for
filtering in scenarios involving a large number of
targets with moderately ambiguous data association. It
combines the advantages of engineering solutions like
MHT and point-process-based approaches like the PHD
filter. It propagates track identities through time,
similarly to MHT; however, it overcomes drawbacks of
MHT, such as its strong reliance on heuristics for the
appearance and disappearance of targets and a lack of
adaptivity, by modelling all sources of uncertainty in
a unified probabilistic framework. Moreover, it has
linear complexity in the number of hypotheses and
in the number of observations, whereas the MHT filter
has a complexity that is exponential in time and cubic
in the number of targets.
Let the time be indexed by the set $T \doteq \mathbb{N}$. For
any $t \in T$, the target state space of interest and the
observation space of interest are given by
$\mathbf{X}_t^{\bullet} \subseteq \mathbb{R}^{d}$ and
$\mathbf{Z}_t^{\bullet} \subseteq \mathbb{R}^{d'}$, respectively.
They are augmented with the empty state $\psi$, which describes
the state of targets outside of the scene of interest, and the
empty observation $\phi$, which describes missed detections,
respectively, to form the (full) target state space
$\mathbf{X}_t = \mathbf{X}_t^{\bullet} \cup \{\psi\}$
and the (full) observation space
$\mathbf{Z}_t = \mathbf{Z}_t^{\bullet} \cup \{\phi\}$. The
set of collected observations is represented by
$\bar{Z}_t = Z_t \cup \{\phi\}$, where $Z_t$ denotes the set of
detected observations.
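
The augmentation of the detected observations with the empty observation can be sketched in a few lines. This is only an illustration, assuming each detection is stored as a tuple of floats; the names `PHI`, `Detection` and `collected_observations` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple, Union


class _EmptyObservation:
    """Sentinel standing in for the empty observation (phi), i.e. a missed detection."""

    def __repr__(self) -> str:
        return "PHI"


PHI = _EmptyObservation()  # hypothetical name for the empty observation

Detection = Tuple[float, ...]  # a point in the observation space of interest
Observation = Union[Detection, _EmptyObservation]


def collected_observations(detections: List[Detection]) -> List[Observation]:
    """Return the detections of one time step augmented with the empty observation."""
    return list(detections) + [PHI]


# Two detections (e.g. bounding-box centres) at a single time step.
z_bar = collected_observations([(10.0, 20.0), (35.5, 7.25)])
```

Using a dedicated sentinel rather than a numeric placeholder keeps the empty observation from being confused with a genuine point of the observation space.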
At any time $t \in T$, the HISP filter is based on the
following modelling assumptions: 1) a target produces at
most one observation (if it produces none, a missed
detection occurs), 2) an observation originates from
at most one target (if it originates from none, it is a
false alarm), 3) targets evolve independently of each
other, and 4) observations resulting from target
detections are produced independently of each other.
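
Assumptions 1) and 2) constrain what a valid target-to-observation association looks like. The following sketch, which is not the filter's data association step, only checks those two constraints for a given association and derives the missed detections and false alarms it implies. The function name `split_association` and the dictionary-based representation are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple


def split_association(
    assignment: Dict[Hashable, Optional[int]],
    num_observations: int,
) -> Tuple[Dict[Hashable, int], List[Hashable], List[int]]:
    """Split an association into detections, missed detections and false alarms.

    `assignment` maps each target id to one observation index, or to None when
    the target is undetected, so a target has at most one observation by
    construction (assumption 1). A ValueError is raised if one observation is
    claimed by two targets, which would violate assumption 2.
    """
    used: Set[int] = set()
    detected: Dict[Hashable, int] = {}
    missed: List[Hashable] = []
    for target, obs in assignment.items():
        if obs is None:
            missed.append(target)  # no observation: missed detection
            continue
        if not 0 <= obs < num_observations:
            raise ValueError(f"observation index {obs} out of range")
        if obs in used:
            raise ValueError(f"observation {obs} assigned to more than one target")
        used.add(obs)
        detected[target] = obs
    # Observations claimed by no target are treated as false alarms.
    false_alarms = [j for j in range(num_observations) if j not in used]
    return detected, missed, false_alarms


detected, missed, false_alarms = split_association(
    {"a": 1, "b": None, "c": 0}, num_observations=4
)
# detected == {"a": 1, "c": 0}, missed == ["b"], false_alarms == [2, 3]
```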
For tracking applications, targets are distinguished
by considering their observation histories. Let the
space $O_t$ be

$$O_t = \bar{Z}_0 \times \dots \times \bar{Z}_t, \tag{1}$$

so that $o_t \in O_t$ takes the form
$o_t = (\phi, \dots, \phi, z_{t_+}, \dots, z_{t_-}, \phi, \dots, \phi)$,
with $t_+$ and $t_-$ the times of appearance and disappearance
of the considered track in the scene of interest, and with
$z_t \in \bar{Z}_t$ for any $t_+ \le t \le t_-$. The observation
history $o_t$ can also be referred to as the observation path,
and the empty observation path $(\phi, \dots, \phi) \in O_t$ is
denoted by $\phi_t$.
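
An observation path can be represented as a plain tuple with one entry per time step. The sketch below is illustrative only: it uses `None` as a stand-in for the empty observation and, as a simplifying assumption, takes the first and last detection times as the appearance and disappearance times. The names `PHI`, `is_empty_path` and `appearance_and_disappearance` are hypothetical.

```python
from typing import Optional, Tuple

PHI = None  # hypothetical stand-in for the empty observation


def is_empty_path(path: Tuple) -> bool:
    """True if the path contains no detection, i.e. it is the empty observation path."""
    return all(z is PHI for z in path)


def appearance_and_disappearance(path: Tuple) -> Optional[Tuple[int, int]]:
    """Return (t_plus, t_minus) for a path, or None for the empty observation path.

    Assumption: t_plus and t_minus are taken as the first and last time steps
    holding a detection; missed detections in between are allowed.
    """
    detected_times = [t for t, z in enumerate(path) if z is not PHI]
    if not detected_times:
        return None
    return detected_times[0], detected_times[-1]


# A path over five time steps with a missed detection at t = 2.
path = (PHI, (1.0, 2.0), PHI, (1.5, 2.5), PHI)
span = appearance_and_disappearance(path)  # (1, 3)
```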
Each target is identified by some index $i$ in a set
$I$. A track $i$ associated with an observation path
containing at least one detection (i.e. $o_t^i \neq \phi_t$)
cannot have a multiplicity $n_i$ greater than one, since it
cannot represent more than one target; hence, the
previously-detected target represented by track $i$ is
distinguishable. However, a track $i$ associated with the
empty observation path $o_t^i = \phi_t$ represents a
sub-population of yet-to-be-detected (undetected) targets
that are indistinguishable from one another, and may have a
multiplicity $n_i$ greater than one. The tracks cover
all the possible combinations of non-empty observation
paths representing the previously-detected targets, and
one (or possibly several) track(s) representing
sub-population(s) of yet-to-be-detected targets.
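
The multiplicity rule for tracks can be captured by a small data structure. This is a minimal sketch under stated assumptions, not the paper's implementation: `None` stands in for the empty observation, and the class name `Track` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

PHI = None  # hypothetical stand-in for the empty observation


@dataclass(frozen=True)
class Track:
    """A track described by its observation path and its multiplicity."""

    path: Tuple            # observation path up to the current time
    multiplicity: int = 1  # number of targets the track represents

    def __post_init__(self) -> None:
        if self.multiplicity < 0:
            raise ValueError("multiplicity must be non-negative")
        # A path with at least one detection represents at most one target.
        if self.is_distinguishable and self.multiplicity > 1:
            raise ValueError("a previously-detected track cannot represent several targets")

    @property
    def is_distinguishable(self) -> bool:
        """True if the path holds at least one detection (non-empty path)."""
        return any(z is not PHI for z in self.path)


detected_track = Track(path=(PHI, (3.0, 4.0)), multiplicity=1)
undetected_tracks = Track(path=(PHI, PHI), multiplicity=5)
```

Only the track with the empty observation path may carry a multiplicity above one; attempting `Track(path=(PHI, (3.0, 4.0)), multiplicity=2)` raises a `ValueError`.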
VISAPP 2018 - International Conference on Computer Vision Theory and Applications