frames when no correspondence is found. The system can thus track object motion completely even when the object is sometimes not detected or is detected incorrectly. This prevents mobile object trajectories from being fragmented. However, the "waiting state" can cause an error when the corresponding mobile object leaves the scene definitively. Therefore, we propose a rule to decide the moment when a tracked object ends its life, which also avoids maintaining the "waiting state" for too long. A more reliable tracked object is kept longer in the "waiting state". In our work, the reliability of a tracked object is directly proportional to the number of times this object finds matched objects: the greater the number of matched objects, the greater the tracked object reliability. Let the Id of a frame be the order of this frame in the processed video sequence. A tracked object ends its life if:
F_l < F_c − min(N_r, T_2) (10)
where F_l is the latest frame Id in which this tracked object found a matched object (i.e. the frame Id before entering the "waiting state"), F_c is the current frame Id, N_r is the number of frames in which this tracked object was matched with a detected object, and T_2 is a parameter bounding the number of frames that the "waiting state" of a tracked object may last.
With this calculation method, a tracked object that has found a greater number of matched objects is kept in the "waiting state" for a longer time, but its "waiting state" time never exceeds T_2. The higher the value of T_2, the higher the probability of recovering lost objects, but a high value can also decrease the correctness of the fusion process.
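As an illustration, the end-of-life test of equation (10) can be sketched as follows (the function and variable names are ours, not from the paper):

```python
def tracked_object_ended(f_last, f_current, n_matched, t2):
    """Equation (10): the tracked object ends its life once the current
    frame Id is more than min(N_r, T_2) frames past the last frame Id
    in which it found a matched object.

    f_last    -- F_l, last frame Id with a matched object
    f_current -- F_c, current frame Id
    n_matched -- N_r, number of frames with a matched detected object
    t2        -- T_2, upper bound on the "waiting state" duration
    """
    # min(n_matched, t2) rewards reliable objects with a longer grace
    # period while capping how long the "waiting state" may last.
    return f_last < f_current - min(n_matched, t2)
```

For example, an object last matched at frame 100 with N_r = 15 survives through frame 115 when T_2 = 30, whereas a more reliable object with N_r = 50 survives through frame 130 (the cap T_2 applies).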
We also propose a set of rules to detect noisy trajectories. Noise usually appears when a wrong detection or a misclassification occurs (e.g. due to low image quality): a static object or some image regions can be detected as a mobile object. However, such noise usually appears in only a few frames or does not really move (it stays around a fixed position). We thus propose temporal and spatial filters to remove it. A trajectory is composed of objects over time, so it is unreliable if it does not contain enough objects or spends most of its life in the "waiting state". We therefore define a temporal threshold: when the "waiting state" time of a trajectory exceeds this threshold, the trajectory is considered as noise. Also, when a new trajectory appears, the system cannot immediately determine whether it is noise; the global tracker has enough information to filter it out only some frames after its appearance. Consequently, a trajectory that satisfies one of the following conditions is considered as noise:
T < T_3 (11)
(d_max < T_4) and (T ≥ T_3) (12)
(T_w / T ≥ T_5) and (T ≥ T_3) (13)
where T is the time length (number of frames) of the considered trajectory ("waiting state" time included); d_max is the maximum spatial length of this trajectory; T_w is the total "waiting state" time during the life of the considered trajectory; T_3, T_4 and T_5 are predefined thresholds. While T_4 is a spatial filter threshold, T_3 and T_5 can be considered as temporal filter thresholds to remove noisy trajectories. Condition (11) is only examined for the trajectories which end their life according to equation (10).
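Conditions (11)–(13) can be sketched compactly; the names and the boolean `ended` flag are our own illustration:

```python
def is_noise(t_total, t_wait, d_max, ended, t3, t4, t5):
    """Flag a trajectory as noise via conditions (11)-(13).

    t_total -- trajectory length T in frames ("waiting state" included)
    t_wait  -- total "waiting state" time T_w
    d_max   -- maximum spatial length of the trajectory
    ended   -- whether the trajectory ended per equation (10);
               condition (11) is only examined in that case
    t3, t4, t5 -- the predefined thresholds T_3, T_4, T_5
    """
    if ended and t_total < t3:                    # (11): too short-lived
        return True
    if t_total >= t3 and d_max < t4:              # (12): barely moves in space
        return True
    if t_total >= t3 and t_wait / t_total >= t5:  # (13): mostly "waiting"
        return True
    return False
```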
4 EXPERIMENTATION AND VALIDATION
We can classify tracker evaluation methods into two principal approaches: off-line evaluation using ground-truth data (C. J. Needham and R. D. Boyle, 2003) and on-line evaluation without ground-truth data (D. P. Chau et al., 2009b). In order to compare our tracker's performance with that of others, we use the tracking evaluation metrics defined in the ETISEO benchmarking project (A. T. Nghiem et al., 2007), which follows the first approach. The first tracking evaluation metric, M_1 ("tracking time"), measures the percentage of time during which a reference object (ground-truth data) is tracked. The second metric, M_2 ("object ID persistence"), computes throughout time how many tracked objects are associated with one reference object. The third metric, M_3 ("object ID confusion"), computes the number of reference object IDs per tracked object. These metrics must be used together to obtain a complete tracker evaluation. Therefore, we also define a tracking metric M as the average value of these three tracking metrics. All four metric values are defined in the interval [0, 1]. The higher the metric value, the better the tracking algorithm performance.
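The combined metric M is simply the arithmetic mean of the three ETISEO metrics; a minimal sketch (the function name and sample values are ours):

```python
def overall_tracking_metric(m1, m2, m3):
    """Average the three ETISEO tracking metrics into one score M.
    Each input, and therefore M, lies in [0, 1]; higher is better."""
    for m in (m1, m2, m3):
        assert 0.0 <= m <= 1.0, "metric values must lie in [0, 1]"
    return (m1 + m2 + m3) / 3.0
```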
In this experimentation, we use the people detection algorithm based on the HOG descriptor from the OpenCV library (http://opencv.willowgarage.com/wiki/). Therefore we focus the experimentation on sequences containing people movements. However, the principle of the proposed tracking algorithm does not depend on the tracked object type.
We have tested our tracker with five video sequences. The first three videos are selected from the ETISEO data in order to compare the proposed tracker's performance with that of other teams. The last two videos are extracted from different projects so that the proposed tracker can be tested with more
VISAPP 2011 - International Conference on Computer Vision Theory and Applications