tures. Several interest point detectors can be used.
Moravec's interest operator (Moravec, 1979), the Harris
corner detector (Harris and Stephens, 1988), the Kanade-
Lucas-Tomasi (KLT) detector (Shi and Tomasi, 1994)
and the Scale-Invariant Feature Transform (SIFT) detector
(Lowe, 2004) are often mentioned in the literature.
In the experiments presented here, however, the SIFT
detector has been used: according to the survey by
Mikolajczyk and Schmid (Mikolajczyk and Schmid, 2003),
it outperforms most point detectors and is
more resilient to image deformations.
SIFT features were introduced by Lowe (Lowe,
2004) in 1999. They are local features based on the
appearance of the object at particular interest points,
and they are invariant to image scale and rotation. In
addition to these properties, they are also highly distinctive,
relatively easy to extract, and easy to match against
large databases of local features.
As seen in the left image of Figure 2, features are
extracted not only on moving objects but all over
the scene. The previously computed rectangles around the
objects detected by background subtraction are used
to cluster the features and to select the features of interest,
i.e. those located on moving objects.
Alternatively, one could cluster
the features according to the length and angle of the
translation vectors defined by two corresponding features
in two images: features located on moving objects
produce longer translation vectors than those located
on the background. In such an approach no background
subtraction would be necessary, and considerable
computational cost would be saved. This method was
also implemented, but it led to very unstable results and
imposed too many constraints on the scene and on
the object motion. For example, to separate features
which represent a slowly moving object from those
which lie on the background, it was necessary to define
a threshold; clearly, this approach fails when, for
example, an object stops moving before changing its
direction. Later it will be shown that the results of
the background subtraction
method are also used to generate a model of an object.
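The rejected translation-vector heuristic can be sketched as follows (a pure-NumPy illustration; the threshold `min_length` is a hypothetical parameter, and fixing it is exactly the constraint that made the approach unstable):

```python
import numpy as np

def foreground_by_translation(pts_prev, pts_curr, min_length=2.0):
    """Label a correspondence as 'moving object' when its translation
    vector between two frames is longer than a fixed threshold.

    pts_prev, pts_curr: (N, 2) arrays of matched feature positions.
    Returns a boolean mask over the N correspondences.
    """
    vectors = np.asarray(pts_curr, float) - np.asarray(pts_prev, float)
    lengths = np.linalg.norm(vectors, axis=1)
    # Fails for slowly moving or momentarily stopped objects, whose
    # translation vectors are as short as those of background features.
    return lengths > min_length

prev = np.array([[10, 10], [50, 50], [100, 100]])
curr = np.array([[10.5, 10], [58, 55], [100, 100.2]])
mask = foreground_by_translation(prev, curr)  # only the middle point moves enough
```

A stopped object produces the same near-zero vectors as the background, which is why the rectangle-based selection was preferred.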
Although the proposed tracking method was
successfully tested in an indoor environment, it does have
several drawbacks. The method tracks objects based
on the features which have been extracted and tracked
to the next image. However, point correspondences
cannot be found in every situation. The task becomes
especially complicated in the presence of partial or full
occlusions of an object. The same problem arises from
the changing appearance of an object: usually an object
looks different from different viewpoints, and if it
rotates quickly, only few or no point correspondences can be found.
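Establishing point correspondences between two frames can be sketched as brute-force nearest-neighbour matching of SIFT descriptors with Lowe's ratio test (the 0.8 ratio follows Lowe, 2004; the NumPy implementation below is illustrative, not the one used in the paper):

```python
import numpy as np

def match_descriptors(des1, des2, ratio=0.8):
    """Return index pairs (i, j) such that des1[i] matches des2[j],
    keeping only matches whose nearest neighbour is clearly closer
    than the second-nearest one (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(des1):
        dists = np.linalg.norm(des2 - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches

des1 = np.array([[1., 0., 0.], [0., 1., 0.]])
des2 = np.array([[1., 0., 0.], [0.9, 0.1, 0.], [5., 5., 5.]])
pairs = match_descriptors(des1, des2)  # the ambiguous second row is rejected
```

The ratio test illustrates the failure mode discussed above: under occlusion or fast rotation, most candidate matches become ambiguous and are discarded.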
The above problems could be solved if a multiview
appearance model of a tracked object were
available. In that case the object could be
detected in every image, thereby improving the
performance of the tracking algorithm. Usually such
models are created in an offline phase, which is not
practicable in most applications. Such an approach
also restricts the number of objects which can
be tracked: depending on the application, an expert
usually defines the number and kind of objects to
be tracked and trains the system to recognize these
objects. The next section describes how a multiview
appearance model of a previously unknown object is
generated online.
4 GENERATION OF AN OBJECT
MODEL
Usually objects appear different when seen from
different viewpoints. If such an object performs a
rotational movement, the system has to know the different
views of that object in order to track it. A suitable model
for that purpose is a so-called multiview appearance model,
which encodes the different views of an object so that
it can be recognized in different positions.
Several works exist in the field of object tracking
with multiview appearance models. For example,
Black and Jepson (Black and Jepson, 1998)
proposed a subspace-based approach, in which a
subspace representation of the appearance of an object
was built using Principal Component Analysis. In
2001, Avidan (Avidan, 2001) used a Support Vector
Machine classifier for tracking; the tracker was
previously trained on positive examples, consisting
of images of the object to be tracked, and negative
examples, consisting of things which were not to be
tracked. However, such models are usually created
in an offline phase, which is not practicable in many
applications.
In the approach presented here, a multiview
appearance model of a previously unknown object is
generated online. The input to the model generation
module consists of all moving objects detected
by background subtraction and all correspondences
established by the object tracker described in
section 3. Figure 3 graphically shows the procedure
followed by the model generation module.
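The bookkeeping of such a module can be sketched as follows (all names and thresholds are hypothetical, for illustration only: a model is a list of views, each view being the descriptor set of one detected image region, and a new view is stored only when no existing view explains it):

```python
import numpy as np

class MultiviewModel:
    """A multiview appearance model: one descriptor set per stored view."""

    def __init__(self, match_fraction=0.3):
        self.views = []                    # list of (N_i, D) descriptor arrays
        self.match_fraction = match_fraction

    def matches_view(self, des, view):
        """Crude view test: fraction of descriptors having a close
        nearest neighbour in the stored view (100.0 is a hypothetical
        distance threshold)."""
        hits = 0
        for d in des:
            if np.min(np.linalg.norm(view - d, axis=1)) < 100.0:
                hits += 1
        return hits / len(des) >= self.match_fraction

    def update(self, des):
        """Add the region's descriptors as a new view if no stored
        view explains them; return True when a view was added."""
        if any(self.matches_view(des, v) for v in self.views):
            return False
        self.views.append(des)
        return True
```

With this scheme the model grows only when the tracked object presents a genuinely new view, e.g. during a rotation.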
After background subtraction was performed and
moving objects were detected in the image, image re-
gions representing those objects were used to build
models of them. In the database a model of an ob-
ject was stored as a collection of different views of
that object. The database was created and updated us-
ROBUST OBJECT TRACKING BY SIMULTANEOUS GENERATION OF AN OBJECT MODEL
395