Mono-camera multi-target tracking is a widely
studied problem in the Computer Vision community;
our approach builds on several established results.
First, the interest of particle filtering algorithms
(CONDENSATION) for tracking, notably of multiple
targets, has been established since the initial work of
Isard and Blake in (Isard and Blake, 2001). Then,
since (Okuma et al., 2004), tracking-by-detection has
emerged, and in particular the temporal integration of
tracklets, whose robustness was proven by Kaucic
et al. in (Kaucic et al., 2005). Tracklet optimisation
has also been extended to two cameras with disjoint
fields of view by (Kuo et al., 2010). However, this
method does not work online, as the optimisation
is conducted over a temporal window.
In contrast to these approaches, our tracking
module is cast in a Markovian formalism.
Our approach is inspired by (Breitenstein et al., 2010)
and (Wojek et al., 2010). Like (Breitenstein et al.,
2010), it is based on distributed particle filters en-
hanced with a reidentification component derived from
a discrete identity variable that is also sampled; these
are termed mixed-state particle filters. Then, in the vein
of (Wojek et al., 2010), we perform a temporal integration
of tracklets, but here on the identities, and not per
camera but over the whole network.
3 TRACKING-BY-REIDENTIFICATION WITHIN A CAMERA
In this article, we propose an extension to NOFOV
networks of the tracking-by-detection algorithm pro-
posed by Breitenstein et al. in (Breitenstein et al.,
2010), introducing the notion of global identity that
we seek to retrieve for each target. We present in
this section our implementation of (Breitenstein et al.,
2010) and how the use of mixed-state particle filtering
for reidentification (Meden et al., 2011) extends that
approach.
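To make the mixed-state idea concrete, the following is a minimal sketch of one prediction step for such particles: each particle carries a continuous part (here a 2D position) and a discrete identity label that is itself resampled with small probability, letting the filter explore alternative reidentification hypotheses. The function name, dynamics, and parameters are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def propagate_mixed_state(particles, n_identities, jump_prob=0.05,
                          pos_noise=2.0, seed=0):
    """One prediction step for mixed-state particles (illustrative sketch).

    particles: {"xy": (n, 2) positions, "id": (n,) integer identity labels}.
    With probability jump_prob a particle's identity is resampled uniformly,
    so the discrete identity variable is sampled alongside the continuous state.
    """
    rng = np.random.default_rng(seed)
    # Continuous part: random-walk dynamics on the position.
    xy = particles["xy"] + rng.normal(0.0, pos_noise, particles["xy"].shape)
    # Discrete part: occasionally jump to a uniformly drawn identity.
    ids = particles["id"].copy()
    jump = rng.random(len(ids)) < jump_prob
    ids[jump] = rng.integers(0, n_identities, jump.sum())
    return {"xy": xy, "id": ids}
```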
3.1 Targets Description
3.1.1 Global Identities Learning
Any reidentification algorithm needs a first view of
each target before any reidentification is possible.
Here, we assume that such a database of first views
is acquired offline. To do so, we extract a collection
of key-frames from one of the cameras (e.g. positioned
in the entrance hall of the building to monitor), and we
use these as descriptions of our global identities. The
key-frames are chosen with K-means on tracking
sequences from the chosen camera, as detailed
in (Meden et al., 2011).
Thus, these key-frames encode the variability of the
identity during its first tracking. Figure 3 presents
the identity database used for the network of figure 2,
learned in camera 1.
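The key-frame selection step can be sketched as follows: cluster the appearance descriptors of one tracked sequence with K-means, then keep the real frame nearest to each cluster centre as a key-frame. This is a minimal illustration under our own assumptions (plain Lloyd iterations, Euclidean distance); (Meden et al., 2011) should be consulted for the actual procedure.

```python
import numpy as np

def select_keyframes(descriptors, k=5, iters=50, seed=0):
    """Pick k representative key-frames from a tracked sequence via K-means.

    descriptors: (n_frames, d) appearance vectors from one tracking run.
    Returns indices of the frames closest to each cluster centre.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, dtype=float)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centre.
        labels = np.argmin(((X[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        # Recompute centres as cluster means (keep the old centre if empty).
        centres = np.stack([X[labels == j].mean(0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
    # A key-frame is the real frame nearest to each final centre.
    return [int(np.argmin(((X - c) ** 2).sum(-1))) for c in centres]
```

The returned key-frames thus span the appearance variability of the identity during its first tracking.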
Figure 3: Key-frames of each identity for the NOFOVNetwork
sequence (extracted from camera 1).
3.1.2 Target Appearance Modelling
We use the same appearance model as depicted
in (Meden et al., 2011) to describe the targets and
their identities in the database: horizontal stripes of
color distributions, computed in the RGB space. The
similarity between two descriptors is the Bhattacharyya
distance between corresponding stripes, normal-
ized by a Gaussian kernel. This allows us to compute
similarities both to the appearance model of a tracker
and to the key-frames of an identity in the database,
respectively noted $w_{App}(\cdot)$ and $w_{Id}(\cdot)$.
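This appearance model can be sketched as below: per-stripe RGB colour histograms, compared stripe-by-stripe with the Bhattacharyya distance and mapped through a Gaussian kernel. Stripe count, bin count, and kernel bandwidth are our own illustrative assumptions, not values from the paper.

```python
import numpy as np

def stripe_descriptor(image, n_stripes=6, bins=8):
    """Per-stripe RGB colour histograms (rows split into horizontal bands)."""
    h = image.shape[0]
    stripes = []
    for i in range(n_stripes):
        band = image[i * h // n_stripes:(i + 1) * h // n_stripes]
        hist, _ = np.histogramdd(band.reshape(-1, 3),
                                 bins=(bins,) * 3, range=((0, 256),) * 3)
        hist = hist.ravel()
        stripes.append(hist / max(hist.sum(), 1e-12))  # normalise each stripe
    return np.stack(stripes)

def similarity(desc_a, desc_b, sigma=0.3):
    """Mean per-stripe Bhattacharyya distance under a Gaussian kernel."""
    bc = np.sum(np.sqrt(desc_a * desc_b), axis=1)   # Bhattacharyya coefficient
    d = np.sqrt(np.clip(1.0 - bc, 0.0, None))       # Bhattacharyya distance
    return float(np.mean(np.exp(-d ** 2 / (2 * sigma ** 2))))
```

The same `similarity` call serves both roles: against a tracker's appearance model ($w_{App}$) and against an identity's key-frames ($w_{Id}$).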
3.2 Detections Integration
3.2.1 Association to Detections
Our approach favors a tracking-by-detection strategy
via the classical HOG detector proposed by Dalal and
Triggs in (Dalal and Triggs, 2005). These detections
are integrated in the tracking process by a greedy as-
sociation stage. After that association, each tracker
has potentially received a detection which will be
used to update the particles. To do so, an associa-
tion matrix is built between trackers and detections.
The score of a detection d vs. tracker tr pair, given by
equation (1), involves:
• the distance between the tracker's particles and
the detection, evaluated under a Gaussian kernel
$p_{\mathcal{N}}(\cdot) \sim \mathcal{N}(\cdot, \sigma^2)$;
• the tracker's box area $A(tr)$ relative to the detec-
tion's, also evaluated under a Gaussian kernel;
• the tracker's appearance model evaluated on the
detection ($w_{App}(\cdot)$).
$$
S(d,tr) = \underbrace{\sum_{p \in tr}^{N} p_{\mathcal{N}}(d - p)}_{\text{Euclidean distance}} \times \underbrace{p_{\mathcal{N}}\!\left(\frac{|A(tr) - A(d)|}{A(tr)}\right)}_{\text{relative size}} \times \underbrace{w_{App}(d,tr)}_{\text{appearance model}} \qquad (1)
$$
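The score of equation (1) and the greedy association stage can be sketched as follows. `pair_score` follows the three factors of the equation; `greedy_associate` repeatedly takes the best remaining (detection, tracker) pair from the association matrix. The kernel bandwidths and the threshold are our own illustrative assumptions.

```python
import numpy as np

def pair_score(det_xy, det_area, particles_xy, tr_area, w_app,
               sigma_d=20.0, sigma_a=0.25):
    """Score of a detection/tracker pair, following the three terms of Eq. (1)."""
    gauss = lambda x, s: np.exp(-x ** 2 / (2 * s ** 2))
    # Sum of Gaussian-weighted distances between particles and the detection.
    dist_term = gauss(np.linalg.norm(particles_xy - det_xy, axis=1), sigma_d).sum()
    # Relative box-size agreement, also under a Gaussian kernel.
    size_term = gauss(abs(tr_area - det_area) / tr_area, sigma_a)
    return dist_term * size_term * w_app

def greedy_associate(score_matrix, threshold=0.0):
    """Greedily pick the best remaining (detection, tracker) pair each round."""
    S = np.array(score_matrix, dtype=float)
    pairs = []
    while S.size and np.max(S) > threshold:
        d, t = np.unravel_index(np.argmax(S), S.shape)
        pairs.append((int(d), int(t)))
        S[d, :] = -np.inf   # each detection is used at most once
        S[:, t] = -np.inf   # each tracker receives at most one detection
    return pairs
```

After this stage, each tracker has potentially received one detection with which to update its particles.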
TRACKING-BY-REIDENTIFICATION IN A NON-OVERLAPPING FIELDS OF VIEW CAMERAS NETWORK