Singer, 1999), is used to learn these simple shape features for each class. The idea behind the proposed model, namely combining and boosting an interest point detector together with local descriptors for recognition, is commonly used for generic recognition tasks on 2D images but has not previously been used with range images.
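To make the boosting step more concrete, the following is a minimal Python sketch of plain discrete AdaBoost with single-feature threshold stumps in a binary (one class versus background) setting. It is an illustration under stated assumptions, not the authors' learner: the cited boosting work may use a confidence-rated variant, the feature vectors simply stand in for whatever shape features are actually extracted, and all function names are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                # Weighted error of "predict polarity if x[j] >= thr, else -polarity".
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(x, j, thr, polarity):
    return polarity if x[j] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels y must be +1 (class) or -1 (background)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) / division by 0
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Increase weights of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

Each round selects the stump with the lowest weighted error and then up-weights the samples it got wrong, so later stumps concentrate on the harder examples; the final decision is a weighted vote.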
The outline of the paper is as follows. Related work is summarized in Section 2. Section 3 describes the new 3D object category database. The proposed generic 3D object recognition model is described in Section 4. Experimental evaluations and the results obtained are presented in Section 5. Finally, conclusions are drawn in Section 6.
2 RELATED WORK
Most recent research on generic object recognition has focused on modeling the appearance and shape variability of objects under a limited range of viewpoint changes (e.g. (Fergus et al., 2003; Leibe et al., 2004)). One main reason is that most current object category datasets contain images with only small variations in viewpoint (e.g. Caltech 4 and UIUC cars).
Only a few studies have investigated the problem of generic 3D object recognition. One of these approaches is presented by Savarese and Fei-Fei (Savarese and Fei-Fei, 2007). In their approach, a model of an object category is captured by linking together diagnostic parts of the objects seen from different viewpoints. These parts are large, discriminative regions of the objects and consist of many local invariant features. To form a model of the object class, the parts are connected through their mutual homographic transformations. The resulting model summarizes both the appearance and the geometry of the object class. In addition, Savarese and Fei-Fei introduced a new 3D object dataset. However, the approach presented in this paper differs fundamentally from that of (Savarese and Fei-Fei, 2007). The main difference is that our approach uses range images, whereas (Savarese and Fei-Fei, 2007) use 2D images. Furthermore, only surface shape features are used here to represent instances of the object classes; no appearance information is used.
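A planar homography is a 3×3 matrix acting on homogeneous coordinates: a point is mapped by a matrix product followed by division by the third coordinate, and two homographies chain by matrix multiplication. The short Python sketch below illustrates only this generic operation, as background for the "mutual homographic transformations" linking parts; it is not Savarese and Fei-Fei's implementation, and the helper names are hypothetical.

```python
def apply_homography(H, points):
    """Map 2D points through the 3x3 homography H (nested lists)."""
    out = []
    for x, y in points:
        # Multiply [x, y, 1]^T by H to get homogeneous coordinates (u, v, w).
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point is mapped to infinity")
        out.append((u / w, v / w))  # back to Cartesian coordinates
    return out


def compose(H2, H1):
    """Return H2 @ H1, i.e. the homography that applies H1 first, then H2."""
    return [[sum(H2[i][k] * H1[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


T1 = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]  # pure translation by (5, -2)
print(apply_homography(T1, [(0, 0), (1, 1)]))  # [(5.0, -2.0), (6.0, -1.0)]
```

Because a homography is only defined up to scale, multiplying H by any non-zero constant maps every point to the same place.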
Another approach, closer to the work presented in this paper, is described in (Ruiz-Correa et al., 2003). It was developed to recognize objects belonging to a particular shape class in range images. In their approach, shape class components are first learnt and extracted from range images. Then, the spatial relationships among the extracted components are encoded using a shape representation called the symbolic surface signature. The result is a shape class model consisting of a three-level hierarchy of classifiers, in which the first two levels extract the components and the third level verifies their geometric relationships. The dataset used to learn and evaluate the model consists of range images of objects made of clay. This dataset is then enlarged by applying deformations to the original clay objects to introduce intra-class variability.
Although our approach agrees with that of (Ruiz-Correa et al., 2003) in using surface shape descriptors to represent object classes in real range images, there are important differences between the two approaches. First, our approach represents instances of the different object categories with a combination of three different simple local surface features. Second, learning is performed here using boosting, rather than the Support Vector Machines (SVM) used in (Ruiz-Correa et al., 2003). Third, our approach uses a dataset of real range images of objects from different real-world categories. This dataset contains large intra-class as well as inter-class variability, so there is no need to enlarge it by applying deformations.
3 3D OBJECT CATEGORY DATASET
An object category database of 936 2D/3D images (2D grayscale as well as range data) of 26 objects (36 images per object) is built using a 3D Time-of-Flight PMD camera (Lange, 2000). The objects are instances of three main visual categories (classes): cars, motors and animals. A fourth class is constructed to serve as a background, or negative, class during training and testing. This background class consists of objects that are visually different from the object instances of the three main classes.
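One way to picture the database is as a flat index of (object, view) pairs, each pairing a 2D grayscale image with its range image. The Python sketch below is a hypothetical representation only: the field names, the file-naming scheme and the per-object category list are assumptions, since the number of objects per class is not given here. Only the counts (26 objects, 36 images each) come from the text.

```python
from dataclasses import dataclass
from enum import Enum

VIEWS_PER_OBJECT = 36  # from the text: 36 images per object


class Category(Enum):
    CAR = "car"
    MOTOR = "motor"
    ANIMAL = "animal"
    BACKGROUND = "background"  # negative class used in training and testing


@dataclass(frozen=True)
class Sample:
    object_id: int
    view_index: int
    category: Category
    gray_file: str   # hypothetical name of the 2D grayscale image
    range_file: str  # hypothetical name of the matching range image


def build_index(object_categories):
    """Return one Sample per (object, view) pair.

    object_categories[i] is the category of object i; the real per-class
    split must be supplied by the caller.
    """
    samples = []
    for obj_id, category in enumerate(object_categories):
        for view in range(VIEWS_PER_OBJECT):
            stem = f"obj{obj_id:02d}_view{view:02d}"
            samples.append(Sample(obj_id, view, category,
                                  gray_file=stem + "_gray.png",
                                  range_file=stem + "_range.png"))
    return samples


# Placeholder list: 26 objects, all labelled CAR purely to exercise the code.
index = build_index([Category.CAR] * 26)
print(len(index))  # 936 (26 objects x 36 views)
```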
Because it is difficult to record different outdoor views of real objects with the PMD camera², human-made objects (toys) are used to build the database. The instances of each object class are chosen with different sizes and appearances to achieve as much intra-class variability as possible.

² The settings required to operate a PMD camera make it difficult to acquire outdoor views of real objects.
VISAPP 2009 - International Conference on Computer Vision Theory and Applications