tic process at time $t$ and by $y_t \in Y$ the measurement state. Non-linear Bayesian tracking consists in estimating the posterior filtering density function $p(x_t \mid y_{1:t})$ through a non-linear transition function $x_t = f_t(x_{t-1}, v_t)$ and a non-linear measurement function $y_t = h_t(x_t, w_t)$. Particle filters, also known as sequential Monte Carlo methods, approximate the posterior distribution by a weighted sum of Dirac masses centered on hypothetical realizations of the state $x_t$, also called particles. For more details about particle filter techniques, see (Doucet et al., 2001).
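As a reminder of what this approximation looks like, the weighted sum of Dirac masses is commonly written as below. The particle count $N$, the particle notation $x_t^{(i)}$ and the normalized weights $w_t^{(i)}$ are notation assumed here, not taken from the text. The weight recursion shown is that of the bootstrap filter, which assumes that particles are proposed from the transition prior.

```latex
\[
p(x_t \mid y_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)} \, \delta_{x_t^{(i)}}(x_t),
\qquad \sum_{i=1}^{N} w_t^{(i)} = 1,
\qquad w_t^{(i)} \propto w_{t-1}^{(i)} \, p\big(y_t \mid x_t^{(i)}\big).
\]
```

The last relation shows why the likelihood matters: it is the only term through which the observation $y_t$ acts on the particle weights.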
In this paper, we focus on the design of the likelihood density function induced by $h_t$, which is of prime interest in Bayesian estimation methods, and especially in a particle filter framework, since it weights the particle cloud. As we will see in the next section, using several features or sensors usually helps, but requires properly modeling these different pieces of information, which can be redundant or conflicting. In Section 4, we propose a new method for integrating several features in the likelihood process, by partitioning the space of the object feature, leading to a mixture of likelihood density functions.
3 STATE OF THE ART
Using several features, several sensors, or several appearance models are distinct concepts. In the particle filter framework, multi-sensor and multi-feature models often lead to similar implementations, and are therefore not always clearly distinguished in the literature. Here we use the term multi-modalities to denote these two types of data. We next describe two popular models, suitable in most cases, for dealing with several features or several sensors.
In the following, $x_t \in X$ denotes the hidden state of a stochastic process at time $t$, and $y_t = (y_t^1, \dots, y_t^R)$ is a vector of $R$ components, where $y_t^r$ stands for the $r$-th modality.
The first model consists in expressing the likelihood density as a mixture model, in which each component represents a modality: $p(y_t \mid x_t) = \sum_{r=1}^{R} \pi_t^r \, p(y_t^r \mid x_t)$, with $\{\pi_t^r\}_{r=1}^{R}$ the “relevance probability” of the modalities ($\sum_{r=1}^{R} \pi_t^r = 1$), i.e. the probability that the modality $r$ is the one which describes the state $x_t$. These probabilities are either fixed by hand (Xu and Li, 2005), or adaptive but with a fixed set of possible values (Hotta, 2006).
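The mixture form above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the numeric likelihood values and the hand-fixed relevance probabilities are all hypothetical, and the per-modality likelihoods $p(y_t^r \mid x_t)$ are supposed to be already computed for each particle.

```python
def mixture_likelihood(modality_likelihoods, relevance):
    """Evaluate p(y_t | x_t) = sum_r pi_t^r * p(y_t^r | x_t) for one particle.

    modality_likelihoods: list of p(y_t^r | x_t), one value per modality r.
    relevance: list of relevance probabilities pi_t^r (non-negative, sum to 1).
    """
    if len(modality_likelihoods) != len(relevance):
        raise ValueError("one relevance probability is needed per modality")
    if any(pi < 0.0 for pi in relevance) or abs(sum(relevance) - 1.0) > 1e-9:
        raise ValueError("relevance probabilities must be >= 0 and sum to 1")
    return sum(pi * lik for pi, lik in zip(relevance, modality_likelihoods))


def normalize(weights):
    """Rescale non-negative particle weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all particle weights are zero")
    return [w / total for w in weights]


# Hypothetical per-modality likelihoods for 3 particles and R = 2 modalities.
particle_likelihoods = [[0.8, 0.1], [0.4, 0.4], [0.05, 0.9]]
relevance = [0.7, 0.3]  # hand-fixed pi_t^r, illustrative values only

weights = normalize(
    [mixture_likelihood(lik, relevance) for lik in particle_likelihoods]
)
print(weights)
```

Note that the same `relevance` vector is applied to every particle, and that the constraint $\sum_r \pi_t^r = 1$ couples the values: raising one relevance probability forces the others down.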
The second model introduces confidence or reliability in a modality. It assumes conditional independence between the modalities given the state $x_t$. Confidence indices $\{\alpha_t^r\}_{r=1}^{R}$ are then added in an ad hoc way as exponents of the marginal likelihoods $p(y_t^r \mid x_t)$, $r = 1, \dots, R$: $p(y_t \mid x_t) = \prod_{r=1}^{R} p(y_t^r \mid x_t)^{\alpha_t^r}$, with $\alpha_t^r \in [0, 1]$. The values $\alpha_t^r$ represent the confidence in the modality $r$. Unlike the first model, the indices $\alpha_t^r$ are independent of each other, which facilitates their update. They can be defined using different features; see for example (Brasnett et al., 2007; Erdem et al., 2010).
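A minimal sketch of this exponent-weighted product is given below for a single particle. The function name and the numeric values are hypothetical. Working in the log domain is a choice made here, not something stated in the text; it is a common way to limit numerical underflow when several small likelihoods are multiplied.

```python
import math


def weighted_product_likelihood(modality_likelihoods, confidence):
    """Evaluate p(y_t | x_t) = prod_r p(y_t^r | x_t) ** alpha_t^r for one particle.

    modality_likelihoods: list of p(y_t^r | x_t), one value per modality r.
    confidence: list of confidence indices alpha_t^r, each in [0, 1].
    """
    if len(modality_likelihoods) != len(confidence):
        raise ValueError("one confidence index is needed per modality")
    if any(not 0.0 <= a <= 1.0 for a in confidence):
        raise ValueError("confidence indices must lie in [0, 1]")
    log_lik = 0.0
    for lik, alpha in zip(modality_likelihoods, confidence):
        if alpha == 0.0:
            continue  # a modality with zero confidence contributes a factor 1
        if lik <= 0.0:
            return 0.0  # a zero likelihood with alpha > 0 zeroes the product
        log_lik += alpha * math.log(lik)
    return math.exp(log_lik)


# Hypothetical values: full confidence in modality 2, half confidence in modality 1.
print(weighted_product_likelihood([0.25, 0.5], [0.5, 1.0]))
```

Setting $\alpha_t^r = 0$ removes modality $r$ from the product, while $\alpha_t^r = 1$ keeps its likelihood unchanged; since no sum-to-one constraint links the indices, each one can be changed without touching the others.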
When the appearance of the object evolves over time, for example because of luminosity or pose changes, tracking algorithms using a correlation criterion between a reference model and a candidate must update the reference model to stay robust. The implementation of a model with a changing appearance consists in progressively updating the reference model, as notably proposed in (Nummiaro et al., 2002; Muñoz-Salinas et al., 2008).
Here, instead of updating the reference model, we propose a different approach that explicitly models several components, which may be related to several appearances.
All methods described in this section aim at defining adaptive weights, whether for probability, confidence or model update. This adaptive feature gives the models more flexibility and robustness. However, the update is often difficult and therefore often performed in a heuristic way, by computing the values a posteriori according to a defined criterion. The strategy is therefore not directly included in the particle filter framework, and delivers a single parameter set for all the particles. Hence, errors can propagate and accumulate over time, thus permanently biasing or deteriorating the reference model. This may lead to unsuitable likelihoods and penalize the tracking task.
The model we propose defines the likelihood as a mixture of densities, in which each particle is associated with a different set of weights. Hence, this strategy does not suffer from the aforementioned problem. The originality lies in the fact that a weight is related to a decomposition of the state and the observation, and not to a feature or a sensor.
4 MULTIPLE MODEL LIKELIHOOD
We propose in this section to define a multiple model likelihood. We consider that an appearance is one possible representation of an object with respect to a given feature. This modeling is useful when the object appearance (color, shape, ...) changes over time. For example, in a 3D face tracking problem, one may define several components, which we call postures, for
which the probabilities are computed using the ori-