To approximate a 2nd-order Markov chain on the
ambiguous tracklets, given the Hungarian
association results, we accept the connection of an
ambiguous tracklet only at the end whose link
probability is higher than that of the other end. The
connection at the other end is left for association in
later iterations, when there may be fewer ambiguities
(e.g. a detection update may have corrected the
detections or retrieved missed detections, the
appearance model may have been updated, or the
degenerate tracklet may have linked to another
tracklet and hence contain motion information).
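The acceptance rule above can be sketched as a filter applied to the links returned by the Hungarian matching. This is a minimal illustration, not the authors' implementation: the `Link` record, the function name `filter_ambiguous_links`, and the tie-breaking choice are all assumptions made for the example.

```python
from typing import List, NamedTuple, Set


class Link(NamedTuple):
    """Hypothetical record for one link proposed by the Hungarian matching."""
    src: str     # tracklet whose tail is connected
    dst: str     # tracklet whose head is connected
    prob: float  # link probability


def filter_ambiguous_links(links: List[Link], ambiguous: Set[str]) -> List[Link]:
    """Keep at most one end link for every ambiguous tracklet.

    Assumes the matching is one-to-one, so each tracklet has at most one
    incoming and one outgoing link.
    """
    accepted = list(links)
    for t in sorted(ambiguous):
        head = next((l for l in accepted if l.dst == t), None)  # link entering t
        tail = next((l for l in accepted if l.src == t), None)  # link leaving t
        if head is not None and tail is not None:
            # Drop the end with the lower probability (ties drop the tail link,
            # an arbitrary choice for this sketch); it is retried later.
            weaker = head if head.prob < tail.prob else tail
            accepted.remove(weaker)
    return accepted


proposed = [Link("A", "B", 0.9), Link("B", "C", 0.6)]
kept = filter_ambiguous_links(proposed, ambiguous={"B"})
```

Here tracklet `B` is ambiguous and linked at both ends, so only the stronger link `A -> B` is kept and `B -> C` is deferred to a later iteration.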
After the global Hungarian matching, new links
may have been established and we can go on
performing detection update and local tracklet
linking. The iterative process ends when no new
links can be found using the global Hungarian
association.
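The stopping rule can be sketched as a simple loop. The callables `global_match` and `refine` are hypothetical placeholders for the global Hungarian association and for the detection update plus local tracklet linking; the `max_iters` guard is an added assumption, not part of the described method.

```python
from typing import Callable, List, Tuple

Tracklets = List[list]


def iterate_association(
    tracklets: Tracklets,
    global_match: Callable[[Tracklets], Tuple[Tracklets, int]],
    refine: Callable[[Tracklets], Tracklets],
    max_iters: int = 100,
) -> Tuple[Tracklets, int]:
    """Alternate global matching and refinement until no new link is found.

    Returns the final tracklets and the number of rounds that produced links.
    """
    for rounds in range(max_iters):
        tracklets, n_new_links = global_match(tracklets)
        if n_new_links == 0:
            return tracklets, rounds  # converged: nothing left to link
        tracklets = refine(tracklets)  # detection update + local linking
    return tracklets, max_iters


# Toy stand-ins: each global round merges the first two tracklets.
def merge_first_two(ts: Tracklets) -> Tuple[Tracklets, int]:
    if len(ts) < 2:
        return ts, 0
    return [ts[0] + ts[1]] + ts[2:], 1


final, rounds = iterate_association([[1], [2], [3]], merge_first_two, lambda ts: ts)
```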
2.5 Recovery from Identity Switches
Identity switches may exist in the original tracklets;
they are usually caused by occlusions, where
accurate detection is difficult. Within the proposed
association framework, as the detection update
proceeds, the renewed detection may deviate farther
and farther from the original detection under the
guidance of the reliable temporal information.
When the deviation becomes very significant, i.e.
the intersection ratio between the updated detection
and the original detection is quite small or the
appearance affinity between them is not high
enough, we suspect that the tracklet is inconsistent.
In this situation, we break up the tracklet at that
point and look for a possibly better association for
the two resulting tracklets.
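The break-up test can be sketched as follows, assuming the per-frame intersection ratios and appearance affinities have already been computed. The threshold values `min_overlap` and `min_affinity` are hypothetical; the text only states that the ratio is "quite small" or the affinity "not high enough".

```python
from typing import List, Optional, Sequence, Tuple


def find_break_point(
    overlap: Sequence[float],
    affinity: Sequence[float],
    min_overlap: float = 0.3,   # hypothetical threshold
    min_affinity: float = 0.5,  # hypothetical threshold
) -> Optional[int]:
    """Return the first frame index where the updated detection has drifted
    too far from the original one, or None if the tracklet looks consistent."""
    for i, (o, a) in enumerate(zip(overlap, affinity)):
        if o < min_overlap or a < min_affinity:
            return i
    return None


def split_tracklet(frames: List, k: int) -> Tuple[List, List]:
    """Break a tracklet into the parts before and from index k."""
    return frames[:k], frames[k:]


overlap = [0.95, 0.82, 0.11]   # updated vs. original detection, per frame
affinity = [0.90, 0.90, 0.90]
k = find_break_point(overlap, affinity)
parts = split_tracklet(["f0", "f1", "f2"], k) if k is not None else None
```

Both resulting parts would then re-enter the association stage as independent tracklets.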
3 EXPERIMENTAL RESULTS
In this section, we demonstrate the performance of
our proposed approach on two public data sets,
namely the CAVIAR data set and the PETS 2009
data set, which have been widely used for evaluating
multi-target tracking methods.
In our experiments, parameters not specified
manually are learned from 90 ground truth
trajectories of a video we captured ourselves, in
which mutual occlusion happens frequently; these
parameters are set exactly the same for both tested
data sets.
To determine whether a target is being tracked,
we adopt the commonly used PASCAL criterion,
i.e. an intersection over union greater than 0.5, in
all the experiments.
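As an illustration of the PASCAL criterion, the sketch below computes the intersection over union of two axis-aligned boxes and applies the 0.5 threshold. The `(x1, y1, x2, y2)` box layout and the function names are assumptions made for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_tracked(gt: Box, pred: Box, threshold: float = 0.5) -> bool:
    """PASCAL criterion: the target counts as tracked if IoU exceeds the threshold."""
    return iou(gt, pred) > threshold
```

Note that the comparison is strict, so a box with an overlap of exactly 0.5 does not count as tracked.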
For quantitative evaluation of the proposed
approach, we follow the currently most widely
accepted protocol, the CLEAR MOT metrics
(Stiefelhagen, 2006): The Multi-Object Tracking
Accuracy (MOTA) combines three types of errors –
false positives (FP), missed targets (FN), and
identity switches (IDs) – and is normalized such that
the score of 100% corresponds to no errors (all three
error types are weighted equally in our evaluation);
The Multi-Object Tracking Precision (MOTP)
measures the alignment of the tracker output w.r.t.
the ground truth. We also report recall, precision,
false alarms per frame (Fa/F), the Mostly Lost
(ML), Partially Tracked (PT), and Mostly Tracked
(MT) scores, and the number of identity switches
(IDs) and fragmentations (Frag) of the produced
trajectories relative to the ground truth trajectories,
following Li, 2009.
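For reference, the count-based metrics can be sketched from accumulated error totals. This is an illustration under stated assumptions rather than the evaluation tool used in the experiments: the function names are hypothetical, and the 80% / 20% thresholds for MT and ML are the commonly used values, assumed here.

```python
def mota(fn: int, fp: int, id_switches: int, num_gt: int) -> float:
    """MOTA with equal weights: 1 minus the error total over all ground truth objects."""
    return 1.0 - (fn + fp + id_switches) / num_gt


def recall(tp: int, fn: int) -> float:
    """Fraction of ground truth detections that are correctly matched."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of tracker outputs that are correctly matched."""
    return tp / (tp + fp)


def false_alarms_per_frame(fp: int, num_frames: int) -> float:
    """Fa/F: average number of false positives per frame."""
    return fp / num_frames


def track_status(tracked_fraction: float) -> str:
    """Classify a ground truth trajectory by the fraction of it that is tracked.

    Thresholds (0.8 and 0.2) are assumed, not taken from this text.
    """
    if tracked_fraction > 0.8:
        return "MT"
    if tracked_fraction < 0.2:
        return "ML"
    return "PT"
```

For example, 10 misses, 5 false positives and 5 identity switches over 100 ground truth objects give a MOTA of about 80%.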
Two state-of-the-art tracklet-based data
association approaches, Kuo, 2011 and Yang, 2012,
are selected for comparison. In Kuo, 2011, a robust
appearance model is learned for each target
(PRIMPT), and in Yang, 2012, both appearance
models and motion patterns are learned
(NLMPRAM). For fair comparison, the detections,
ground truth and the evaluation tool are downloaded
from the homepage of the first author of Yang, 2012
(http://iris.usc.edu/people/yangbo/downloads.html).
3.1 Performance on the CAVIAR Data Set
The proposed approach requires additional
computational time to perform detection update and
missed detection recovery. To reduce the run time
on the CAVIAR data set, we sample 1 frame out of
every 10 frames from the video sequences for
tracking, i.e. the frame rate of the input to the
tracking approach is 2.5 frames/s.
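The subsampling amounts to keeping every tenth frame. A minimal sketch follows; the 25 frames/s source rate is an assumption inferred from the stated 2.5 frames/s input rate, and the function names are hypothetical.

```python
from typing import List, Sequence


def subsample(frames: Sequence, step: int = 10) -> List:
    """Keep 1 frame out of every `step` frames."""
    return list(frames[::step])


def effective_frame_rate(source_fps: float, step: int = 10) -> float:
    """Frame rate seen by the tracker after subsampling."""
    return source_fps / step


kept_frames = subsample(range(25))         # frame indices 0, 10, 20
input_rate = effective_frame_rate(25.0)    # assumed 25 frames/s source
```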
Twenty sequences of the CAVIAR data set are
evaluated, as in Kuo, 2011 and Yang, 2012,
and Table 1 compares the results. It can
be seen that our approach outperforms Kuo, 2011
and Yang, 2012 in terms of recall, precision, number
of mostly lost tracks and identity switches. However,
the number of fragmentations of our approach is
higher than that of both Kuo, 2011 and Yang, 2012.
Figure 6 (a) illustrates the tracking result of our
approach on the CAVIAR data set.