jects that are not very fast, but might fail in other
conditions. The method proposed by Chen et al. (2001)
formulates the tracking problem as a bipartite graph
matching, solved by the Hungarian algorithm. It
recognizes an occlusion, but is able to preserve the
object identities only if the horizontal projection of
the detected blob shows a separate mode.
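As an illustration of tracking posed as a bipartite matching, the following minimal sketch assigns blobs to objects so that the total cost is minimal. It is not the implementation of Chen et al.: for brevity it enumerates all permutations instead of running the Hungarian algorithm (which reaches the same optimum in polynomial time), and the cost matrix values and the function name `match_blobs_to_objects` are assumptions made for the example.

```python
from itertools import permutations

def match_blobs_to_objects(cost):
    """Return (assignment, total_cost) minimising the summed cost.

    cost[i][j] is a hypothetical dissimilarity between object i and blob j.
    The matrix is assumed square; assignment[i] is the blob given to object i.
    Brute force is used here only because the example is tiny.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Hypothetical costs: rows are known objects, columns are detected blobs.
cost = [
    [1.0, 9.0, 8.0],
    [7.0, 2.0, 9.0],
    [8.0, 1.5, 3.0],
]
assignment, total = match_blobs_to_objects(cost)
```

With these values a greedy choice of the cheapest pairs first would give blob 1 to object 2 and force object 1 onto an expensive match, whereas the optimal matching keeps every object on its diagonal entry.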
In the second category, detection and tracking are
performed at once, usually on the basis of an object
model that is dynamically updated during the tracking.
These methods are computationally more expensive, and
often have problems with the initial definition of the
object models, which in some cases has to be provided
by hand. Comaniciu et al. (2000) propose the use of
Mean Shift, a fast, iterative algorithm for finding the
centroid of a probability distribution, to determine
the most probable position of the tracking target. It
requires a manual selection of the objects being
tracked in the initial frame, and deals only with
partial occlusions. Tao et al. (2002) have proposed a
method based on a layered representation of the scene,
which is created and updated using a probabilistic
framework. Their method is able to deal with
occlusions, but is extremely computationally
expensive. The method by Wu and Nevatia (2005) tracks
people in a crowded environment. However, it uses an
a priori model of a person, which cannot be extended
to other kinds of objects. Several recent methods
(Bazzani et al., 2010) have investigated the use of
Particle Filters, a tool based on the approximate
representation of a probability distribution using a
finite set of samples, for solving the tracking problem
in a Bayesian formulation. Particle Filters look very
promising, since they make a very general and flexible
framework tractable. However, their computational cost
is still too high for real-time applications,
especially with multiple occluding targets.
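To make the idea of representing a probability distribution by a finite set of samples concrete, the following is a minimal sketch of a generic one-dimensional bootstrap particle filter. It is not the method of any of the cited papers; the random-walk motion model, the Gaussian measurement likelihood, the parameter values and the function name `particle_filter_step` are all assumptions chosen for illustration.

```python
import math
import random

def particle_filter_step(particles, measurement, motion_std, meas_std, rng):
    """One predict/update/resample cycle of a 1-D bootstrap particle filter."""
    # Predict: move every sample with a random-walk motion model (assumed).
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Update: weight each sample by a Gaussian measurement likelihood (assumed).
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Estimate: the weighted mean of the samples approximates the posterior mean.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw a new, equally weighted sample set.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate

rng = random.Random(0)
# Initial belief: the target may be anywhere in [0, 100].
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
true_position = 40.0
for _ in range(20):
    true_position += 1.0  # the target moves one unit per frame
    particles, estimate = particle_filter_step(
        particles, true_position, motion_std=1.5, meas_std=2.0, rng=rng)
```

Every frame requires a pass over all the samples for each target, which gives an intuition of why the cost grows quickly when several interacting targets must be tracked.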
In this paper we propose a real-time tracking algo-
rithm belonging to the first category; it assumes that
an object detection based on background subtraction
generates its input data. The algorithm is robust with
respect to the errors generated by the object detection
(spurious or missing objects, split objects) and is able
to work with partial and total occlusions.
Most of the algorithms in the first category make
their tracking decisions by comparing the evidence at
the current frame with the objects known at the pre-
vious one; all the objects are dealt with in the same
way, ignoring their past history, which can give useful
hints on how they should be tracked: for instance, for
objects stable in the scene, information such as their
appearance should be considered more reliable.
To exploit this idea, the algorithm adopts an object
model based on a set of scenarios, in order to deal
with objects differently depending on their recent
history; the scenarios are implemented by Finite State
Automata, each describing the different states of an
object and the conditions triggering the transition to
a different state. The state is used both to influence
which processing steps are performed on each object,
and to choose the most appropriate value for some of
the parameters involved in the processing.
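The role of the state can be sketched with a small table-driven automaton. The states, events and parameter values below are hypothetical placeholders chosen for the example; they are not the scenarios actually defined by the algorithm (see subsection 2.1).

```python
from enum import Enum

class State(Enum):
    # Hypothetical states, for illustration only.
    NEW = "new"
    STABLE = "stable"
    LOST = "lost"
    DELETED = "deleted"

# (current state, event) -> next state; unlisted pairs leave the state unchanged.
TRANSITIONS = {
    (State.NEW, "confirmed"): State.STABLE,
    (State.NEW, "missed"): State.DELETED,
    (State.STABLE, "missed"): State.LOST,
    (State.LOST, "matched"): State.STABLE,
    (State.LOST, "timeout"): State.DELETED,
}

# A state-dependent parameter: how much the appearance is trusted (assumed values).
APPEARANCE_WEIGHT = {
    State.NEW: 0.2,
    State.STABLE: 0.8,
    State.LOST: 0.5,
    State.DELETED: 0.0,
}

def next_state(state, event):
    """Apply one event to the automaton."""
    return TRANSITIONS.get((state, event), state)

def run(events, state=State.NEW):
    """Feed a sequence of events to an object that starts in `state`."""
    for event in events:
        state = next_state(state, event)
    return state

final_state = run(["confirmed", "missed", "matched"])
weight = APPEARANCE_WEIGHT[final_state]
```

Keeping the transitions in a table makes it easy to add a scenario without touching the processing code, which only reads the current state and the parameters associated with it.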
2 THE PROPOSED METHOD
Before starting the description of the algorithm, we
need to introduce some terminology and notation.
A blob is a connected set of foreground pixels pro-
duced by a detection algorithm, which usually finds
the foreground pixels by comparing the frame with a
background model; then the foreground pixels are fil-
tered to remove noise and other artifacts (e.g. shad-
ows); finally, the foreground pixels are partitioned
into connected components, which are the blobs. The
tracking algorithm receives as input the set of blobs
detected at each frame. We assume that the detection
phase uses a dynamic background model dealing with
lighting changes; noise reduction, shadow removal and
small blob removal are also carried out. See
(Conte et al., 2010) for details.
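The definition of a blob can be illustrated by the following minimal sketch, which thresholds the difference between a frame and a static background, groups the foreground pixels into connected components and discards small components. It is a simplification and not the detection algorithm of (Conte et al., 2010): the static background, the 4-connectivity, the threshold, the minimum area and the function name `extract_blobs` are assumptions, and shadow removal is omitted.

```python
from collections import deque

def extract_blobs(frame, background, threshold=30, min_area=2):
    """Return the blobs of a grey-level frame as lists of (row, col) pixels."""
    rows, cols = len(frame), len(frame[0])
    # Foreground mask: pixels that differ enough from the background model.
    foreground = [[abs(frame[r][c] - background[r][c]) > threshold
                   for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not foreground[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected foreground neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and foreground[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small-blob removal: tiny components are treated as noise.
            if len(blob) >= min_area:
                blobs.append(blob)
    return blobs

background = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in background]
for y, x in ((0, 0), (0, 1), (1, 0), (1, 1)):
    frame[y][x] = 200  # a 2x2 foreground region
frame[3][4] = 200      # an isolated noisy pixel
blobs = extract_blobs(frame, background)
```

In this example the isolated pixel is discarded by the area filter, so only the 2x2 region is passed on to the tracking stage.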
An object is any real-world entity the system is in-
terested in tracking. Each object has an object model,
containing such information as the object class (e.g.
a person or a vehicle), state (see subsection 2.1),
size, position, trajectory and appearance (see subsec-
tion 2.4). A group object corresponds to multiple real-
world entities tracked together; if a group is formed
during the tracking (i.e. it does not enter the scene as
a group), its object model maintains a reference to the
models of the individual objects of the group.
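One possible way to organize the information listed above is sketched below. The field names, the types and the representation of the appearance as a list of numbers are assumptions made for the example, not the data structure used by the algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ObjectModel:
    object_id: int
    object_class: str                      # e.g. "person" or "vehicle"
    state: str                             # current state of the automaton
    size: Tuple[int, int]                  # width and height in pixels
    position: Tuple[float, float]          # current position in the image
    trajectory: List[Tuple[float, float]] = field(default_factory=list)
    appearance: Optional[List[float]] = None   # hypothetical descriptor
    is_group: bool = False
    # Filled only for groups formed during the tracking; a group that
    # enters the scene as a group has no member models to refer to.
    members: List["ObjectModel"] = field(default_factory=list)

    def move_to(self, position):
        """Record the old position in the trajectory and update the current one."""
        self.trajectory.append(self.position)
        self.position = position

alice = ObjectModel(1, "person", "stable", (20, 60), (100.0, 50.0))
bob = ObjectModel(2, "person", "stable", (22, 58), (110.0, 52.0))
group = ObjectModel(3, "group", "new", (45, 62), (105.0, 51.0),
                    is_group=True, members=[alice, bob])
group.move_to((108.0, 51.0))
```

Because the group keeps references to the individual models rather than copies, the identities of its members remain available if the group later splits.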
The task of the tracking algorithm is to associate
each blob with the right object, in such a way as to
preserve the identity of real-world objects across the
video sequence; in the process the algorithm must
also create new object models or update the existing
ones as necessary.
In real cases, the detection phase produces some
common errors:
• Spurious Blobs, i.e. blobs not corresponding
to any object; they can be caused by lighting
changes, movements of the camera or of the back-
ground, and other transient changes that the detec-
tion algorithm was not able to filter out;
• Ghost Blobs, i.e. blobs appearing where an object
previously considered part of the background has
moved away (e.g. a parked car that starts moving);