Figure 2: Sparse representation of object category car: (a) front view, (b) side view, (c) rear view. The parts used in the training stage are marked with green rectangles containing the part-ID.
presented in (Bachmann and Dang, 2008), the assumption can be made that the part locations are independent:
$$P_M(\Phi) = \prod_{i=1}^{n} P_M(\phi_i). \qquad (15)$$
Here, only the metric height above the estimated road plane has been used as structural information. Maximising $P_M(\Phi \mid Y)$ is particularly easy, as $P_M(Y \mid \Phi)P_M(\Phi)$ can be maximised independently for each $\phi_i$. For $n$ parts and $N$ possible locations in the image this can be done in $O(nN)$ time. A major drawback of this method is that it encodes only weak spatial information and is unable to accurately represent objects composed of various parts.
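The following is a minimal sketch of this independent-part inference, assuming the per-part likelihoods $P_M(Y \mid \phi_i)$ and priors $P_M(\phi_i)$ are already available as arrays over the $N$ candidate image locations; the array layout and function name are illustrative, not taken from the paper.

```python
import numpy as np

def independent_part_map(likelihoods, priors):
    """MAP location per part under the independence assumption of Eq. (15).

    likelihoods: (n, N) array with likelihoods[i, y] ~ P_M(Y | phi_i = y)
    priors:      (n, N) array with priors[i, y]      ~ P_M(phi_i = y)
    Returns one best location index per part; runs in O(nN).
    """
    posteriors = likelihoods * priors   # unnormalised P_M(phi_i | Y) per location
    return posteriors.argmax(axis=1)    # each part is maximised independently
```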
The most obvious approach to representing multi-part objects is to make no independence assumption on the locations of different parts. Though theoretically appealing, efficiently performing inference with such a spatial prior is not trivial.
A balance between the inadequate independence assumption and the strong but hard-to-implement full dependency between object parts is achieved by maintaining certain conditional independence assumptions. These assumptions can be elegantly represented using an MRF, where the location of part $\phi_i$ is independent of the values of all other parts $\phi_j$, $j \neq i$, conditioned on the values of the neighbours $G_i$ of $\phi_i$ in an undirected graph $G(\Phi, E)$. The structural prior is characterised by pairwise-only dependencies between parts.
Sparse Object Model. The spatial prior is modelled as a star-structured graph, with the locations of the object parts being conditioned on the location of a reference point $\phi_R$. Intuitively, $\phi_R$ can be interpreted as the centre of mass of the object. All object parts arranged around $\phi_R$ are independent of one another. A similar model is also used by, e.g., (Crandall and Huttenlocher, 2007; Fischler and Elschlager, 1973). Let $G = (\Phi, E)$ be a star graph with central node $\phi_R$. Graphical models with a star structure have a straightforward interpretation in terms of the conditional distribution
$$P_M(\Phi) = P(\phi_R) \prod_{i=1}^{n} P_M(\phi_i \mid \phi_R). \qquad (16)$$
Reference point $\phi_R$ acts as the anchor point for all neighbouring parts. The positions of all other parts in the model are evaluated relative to the position of this reference point. In this work we choose $\phi_R$ to be virtual, i.e. there exists no measurable quantity that indicates the existence of the reference point itself. We argue that this makes the model insensitive to partial object occlusion and, therefore, to the absence of the reference point. $P_M(\Phi)$ is modelled using a Mixture of Gaussians (MoG). The model parameter subset $M = (s, \cdot)$, with mean $\mu_{i,R}$ and covariance $\sigma_{i,R}$ stating the location of $\phi_i$ relative to the reference point $\phi_R$, has been determined in a training stage.
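As an illustration, the pairwise terms $P_M(\phi_i \mid \phi_R)$ of Equation (16) can be evaluated from the learned relative offsets. The sketch below uses a single Gaussian per part for brevity, whereas the model above uses a Mixture of Gaussians; the 2-D locations, function name and argument layout are assumptions made for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def structural_prior(parts, ref, mu, sigma, p_ref=1.0):
    """Evaluate P_M(Phi) = P(phi_R) * prod_i P_M(phi_i | phi_R) for a star model (Eq. 16).

    parts: (n, 2) array of part locations phi_i
    ref:   (2,)   location of the (virtual) reference point phi_R
    mu:    (n, 2) learned mean offsets mu_{i,R} of each part relative to phi_R
    sigma: (n, 2, 2) learned covariances sigma_{i,R}
    p_ref: prior P(phi_R), e.g. uniform over the image
    """
    p = p_ref
    for i in range(len(parts)):
        # each part depends only on its displacement from the reference point
        p *= multivariate_normal.pdf(parts[i] - ref, mean=mu[i], cov=sigma[i])
    return p
```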
An optimal object-part configuration (see Equation (13)) can be written in terms of observing an object at a particular spatial configuration $\Phi = (\phi_1, \ldots, \phi_n)$, given the observations $Y$ in the image. With the likelihood function of seeing object part $i$ at position $\phi_i$ (given by Equation (14)) and the structural prior in Equation (16), this can be formulated as
$$P_M(\Phi \mid Y) \propto P(\phi_R)\,\Gamma(\phi_R \mid Y), \qquad (17)$$
where the quality of the reference point $\phi_R$ relative to all parts $\phi_i$ within the object definition is written
$$\Gamma(\phi_R \mid Y) = \max_{\Phi} \prod_{i=1}^{n} P_M(\phi_i \mid \phi_R)\, P_M(Y \mid \phi_i). \qquad (18)$$
We are interested in finding the best configuration of all $n$ parts of the object model relative to $\phi_R$. To reduce computational cost, only points with a likelihood $P_M(Y \mid \phi_i) > T$ are processed further, where $T$ is the acceptance threshold for the object hypothesis to be true. This results in a number of candidates $m$ for each object part $i$. Since evaluating all possible configurations $\{\phi_i^j : i = 1, \ldots, n;\ j = 1, \ldots, m\}$ is computationally infeasible ($O(m^n)$) for growing $n$, we propose a greedy search algorithm to maximise $P_M(\Phi \mid Y)$, as outlined in Table 1.
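Table 1 is not reproduced here, but a minimal sketch of a greedy maximisation in this spirit is shown below. It assumes each part $i$ comes with its list of thresholded candidate locations ($P_M(Y \mid \phi_i) > T$) and with callables for the likelihood and structural prior; these names and the data layout are illustrative, and the actual procedure in Table 1 may differ in detail.

```python
def greedy_configuration(candidates, likelihood, prior, ref):
    """Greedily build a part configuration scoring Gamma(phi_R | Y) as in Eq. (18).

    candidates: dict part_id -> list of candidate locations with P_M(Y|phi_i) > T
    likelihood: callable (part_id, loc) -> P_M(Y | phi_i = loc)
    prior:      callable (part_id, loc, ref) -> P_M(phi_i = loc | phi_R = ref)
    ref:        assumed location of the reference point phi_R
    Instead of scoring all O(m^n) joint configurations, each part is fixed to
    its locally best candidate relative to phi_R, at O(n*m) cost.
    """
    gamma, config = 1.0, {}
    for part_id, locs in candidates.items():
        # pick the candidate maximising the per-part factor of Eq. (18)
        best = max(locs, key=lambda loc: prior(part_id, loc, ref) * likelihood(part_id, loc))
        config[part_id] = best
        gamma *= prior(part_id, best, ref) * likelihood(part_id, best)
    return config, gamma
```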
2.3 Context Information
The MRF presented above efficiently models local
image information consisting of low-level features
enriched by high-level category-specific information.
However, context information capturing the over-
all global consistency of the segmentation result has
been ignored so far. By introducing a set of seman-
tic categories into the segmentation process, it is now
possible to derive category-specific object character-
istics not only on a local, object-intrinsic level but