ona et al., 2009) to be one of the best approaches, it
raises many false alarms. This work has, however, recently
been extended in (Bayona et al., 2010), where false
alarms caused by moving objects are prevented by
building a mask of moving regions.
Effort has been made to achieve robustness
to occlusions. However, methods based on sub-
sampling, which were shown to perform best
(Bayona et al., 2009), rely on logical opera-
tions which cannot guarantee that the same object is
observed.
In recent years, multi-camera object localisation
has been studied extensively. In (Beynon et al., 2003) the
authors make a ground plane assumption and can thus
easily retrieve the world coordinates. A cost func-
tion based on colour, blob area and position is built to
measure the similarity of 2D observations to already
observed 3D world objects. They use a linear assign-
ment problem algorithm to perform an optimal asso-
ciation between observations and tracked objects. In
(Miezianko and Pokrajac, 2008) the authors also as-
sume that the 3D scene is planar. Once they have lo-
cated an object in a camera, it is projected onto the 2D
plane using a homography. The location of objects
are the local maxima of overlap in the orthoimage. In
(Utasi and Csaba, 2010) the authors define an energy
function based on geometric features depending on
the position and height of objects and which is max-
imal for the real configuration. The optimal configu-
ration is found using multiple birth and death dynam-
ics, an iterative stochastic optimisation process. In
(Fleuret et al., 2008) the authors discretise the ground
plane into a grid. A rectangle modelling a human sil-
houette is projected on cameras from each position
on the grid. This serves as an evidence of the occu-
pancy of the ground by a person. In (Khan and Shah,
2009) the authors introduce a planar homographic oc-
cupancy constraint which fuses foreground informa-
tion from multiple cameras. This constraint brings
robustness to occlusion and allows the localisation of
people on a reference plane.
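Several of the methods above map image points onto a common ground plane with a homography. As an illustrative sketch (the matrix H below is a made-up example, not a calibration from any of the cited works), a pixel is projected to plane coordinates as follows:

```python
import numpy as np

def project_to_plane(H, pixel):
    """Map an image pixel (u, v) to ground-plane coordinates via homography H."""
    u, v = pixel
    p = H @ np.array([u, v, 1.0])  # homogeneous coordinates
    return p[:2] / p[2]            # dehomogenise

# Hypothetical homography (would normally come from camera calibration).
H = np.array([[0.02, 0.00, -5.0],
              [0.00, 0.03, -4.0],
              [0.00, 0.00,  1.0]])

plane_xy = project_to_plane(H, (400, 300))  # ground-plane coordinates of the pixel
```

Local maxima of overlap between such projections from several cameras then yield object locations, as in (Miezianko and Pokrajac, 2008).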
Among these methods, some assume that the 3D
world is planar because they rely on homographies,
others because they must reduce the search space
of their optimisation process. We propose a di-
rect matching method which enables the computation
of the 3D positions and heights of stationary objects.
3 OBJECT DETECTION
Our stationary object detection algorithm can be di-
vided into three main steps. First, a background sub-
traction stage generates an image containing the age
of the re-identified foreground. This information is
then used to segment the visible stationary objects.
Finally, a binary mask is updated for each stationary
object.
3.1 Background Subtraction
We use the background subtraction algorithm from
(Guillot et al., 2010) and extend it by also building
a foreground model. The original image is tiled into
a regular square grid of 8 × 8 blocks on which over-
lapping descriptors are computed. The background
subtraction therefore generates an image whose pixels
correspond to the blocks of the original image.
To this end, a descriptor is computed at each
block. If it does not match the background model, it
is checked against the foreground model. If a match
is found in the foreground model, that component is
updated; otherwise a new foreground component is
created and its time of creation is recorded. The
foreground model at a given block is emptied when
background is observed. The output of the background
subtraction stage is thus an image whose pixels contain
0 where background is observed, or the age of the
foreground descriptor otherwise.
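This per-block update can be sketched as follows. The Euclidean descriptor distance and the fixed threshold are placeholders for illustration only; the actual descriptors and matching rule are those of (Guillot et al., 2010):

```python
import numpy as np

MATCH_THRESHOLD = 0.5  # hypothetical descriptor-distance threshold

def update_age_image(descriptors, bg_model, fg_model, frame_idx):
    """Simplified sketch of the per-block foreground ageing.

    descriptors, bg_model: dicts mapping block index -> feature vector.
    fg_model: dict mapping block index -> (feature vector, creation frame).
    Returns an age image: 0 for background blocks, foreground age otherwise.
    """
    age = np.zeros(len(descriptors), dtype=int)
    for b, d in descriptors.items():
        if np.linalg.norm(d - bg_model[b]) < MATCH_THRESHOLD:
            fg_model.pop(b, None)  # background observed: empty the foreground model
            continue
        if b in fg_model and np.linalg.norm(d - fg_model[b][0]) < MATCH_THRESHOLD:
            fg_model[b] = (d, fg_model[b][1])  # update existing component
        else:
            fg_model[b] = (d, frame_idx)       # new component: record creation time
        age[b] = frame_idx - fg_model[b][1]
    return age
```

A block that keeps matching its foreground component thus accumulates age frame after frame, while any return to background resets it.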
3.2 What We Want to Segment
Segmenting unknown stationary objects is a very dif-
ficult problem which we do not try to address in the
general case. For instance, if two objects appear at
the same time and are detected as a single blob in the
image, we do not try to separate them. What we want
is to give different labels to objects appearing
at different times, while giving a single label to an ob-
ject appearing under partial occlusion (e.g. a person
partially occludes a bag and then leaves). This is not an
easy task, since at the block level it is impossible to state
whether we are observing an object or an occluder.
To this end, we construct in Section 3.3 an energy func-
tion under the following assumption: blocks should
be grouped under the same label l when l is a compati-
ble label for all of these blocks.
3.3 Segmentation
Markov Random Fields are widely used in image seg-
mentation when the problem can be written as the
minimisation of an energy function. Let G = (V, E)
be a graph representing an image. Each vertex v ∈ V
corresponds to a pixel of the image, and each edge
e ∈ E ⊆ V × V corresponds to a neighbourhood rela-
tion. Let L be a set of labels. Each labelling x ∈ L^|V|
is assigned an energy, which we try to minimise. The
VISAPP 2012 - International Conference on Computer Vision Theory and Applications