sponding label C_{k+1}, and runs M rounds of the algorithm shown in Figure 1.
The idea is to add the new class to the system by taking advantage of the shared feature space previously defined in the first step, when the known tasks were classified. Provided an optimal binary grouping at each step, trained on a sufficiently large number of samples and classes, we assign the new class samples to the positive or negative cluster that minimizes the error criterion. The algorithm is iterated the same fixed number of times, M. Notice that the method allows the inclusion of many new tasks, since the same process can be repeated iteratively, adding a new class each time. This approach is computationally fast, as it avoids the most expensive step: finding the optimal binary subgroup.
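A minimal sketch of this class-addition step follows; the round objects and their predict/positive_classes interface are hypothetical placeholders for the shared weak classifiers learned in the first step, not an interface defined in the paper.

```python
import numpy as np

def add_class(rounds, X_new, new_label):
    """For each of the M previously trained shared-boosting rounds,
    assign the new class C_{k+1} to the positive or negative cluster,
    whichever yields the lower error on the new class samples. No weak
    learner is retrained, so the expensive search for the optimal
    binary grouping is skipped entirely.

    rounds    : M round objects; r.predict(X) returns +1/-1 weak
                responses, r.positive_classes is the positive cluster.
    X_new     : samples of the new class, already projected onto the
                shared feature space.
    new_label : label C_{k+1} assigned to the new class.
    """
    for r in rounds:
        h = r.predict(X_new)          # weak responses on the new samples
        err_pos = np.mean(h != +1)    # error if the class joins the positive cluster
        err_neg = np.mean(h != -1)    # error if it joins the negative cluster
        if err_pos <= err_neg:
            r.positive_classes.add(new_label)
    return rounds
```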
3 EXPERIMENTS
The experiments have been performed using two different face databases: the Face Recognition Grand Challenge (Phillips et al., 2005) and the AR Face database (Martinez and Benavente, 1998). The goal of the experimental section is to show how the performance of our proposal evolves as new classes are added to the system. We compare our proposal with a variation of the classic eigenface approach (Turk and Pentland, 1991) followed by an NN classification rule. We use PCA to extract 500 features from each data set, and then a discriminant analysis step is performed to obtain the 200 final features for each example. The NDA algorithm has been used for this purpose, since it has been shown to improve on other classic discriminant analysis techniques under the NN rule (Bressan and Vitria, 2003). New classes are added by projecting their training vectors onto the reduced space and using these projected features as a model for the new classification task.
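A hedged sketch of this baseline pipeline in scikit-learn is given below. Two assumptions: scikit-learn ships no NDA implementation, so LinearDiscriminantAnalysis stands in for the NDA step (it yields at most n_classes − 1 components, not the 200 NDA features used here), and PCA's n_components cannot exceed the number of training samples.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

baseline = make_pipeline(
    # First projection; 500 is the paper's value, capped in practice
    # by min(n_samples, n_features) of the training set.
    PCA(n_components=500),
    LinearDiscriminantAnalysis(),         # stand-in for the NDA step
    KNeighborsClassifier(n_neighbors=1),  # NN classification rule
)
# baseline.fit(X_train, y_train); adding a class then amounts to
# projecting its training vectors and extending the neighbor gallery.
```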
Images from both data sets were first converted from the original RGB space to gray scale. Then we perform a geometric normalization using the center coordinates of each eye: images are rotated and scaled according to the inter-eye distance, so that the eye centers coincide across all images. The samples were then cropped to a 37×33 thumbnail, so that only the internal region of the face is preserved. The final sample from each image is encoded as a 1221-dimensional feature vector. Figure 2 shows some examples from both databases. From each data set, we have used only images from subjects with at least 20 samples (10 for training and the rest for testing).
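A minimal sketch of this normalization, assuming OpenCV and known eye coordinates; the exact eye placement inside the 37×33 crop is not specified in the paper, so the values below are illustrative.

```python
import cv2
import numpy as np

def normalize_face(img_rgb, left_eye, right_eye, size=(33, 37), eye_dist=16.0):
    """Rotate, scale, and crop a face image using the eye centers.
    size is (width, height); eye_dist is the assumed target inter-eye
    distance in pixels."""
    gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))    # rotate so the eyes are horizontal
    scale = eye_dist / np.hypot(dx, dy)       # scale by the inter-eye distance
    mid = ((left_eye[0] + right_eye[0]) / 2.0,
           (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(mid, angle, scale)
    # Translate so the eye midpoint lands at a fixed (assumed) position.
    M[0, 2] += size[0] / 2.0 - mid[0]
    M[1, 2] += size[1] / 3.0 - mid[1]
    thumb = cv2.warpAffine(gray, M, size)     # 37x33 internal face region
    return thumb.reshape(-1)                  # 1221-dimensional feature vector
```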
3.1 Results
The experiments have been repeated 10 times; the results shown in Table 1 are the mean accuracies of each method, with the 95% confidence intervals given next to each value. The experimental protocol follows these steps for each database: (1) we randomly take 25 classes (people) from the data set; (2) we learn a classifier using 10 training samples from each of the 25 people; (3) we progressively add new classes without retraining the system. The remaining samples from each class are used for testing the resulting classifier.
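Once the 10 per-repetition accuracies are collected under this protocol, the reported values follow from a standard t-based interval; a runnable sketch (the accuracy values are made-up placeholders, not results from the paper):

```python
import numpy as np
from scipy import stats

acc = np.array([0.98, 0.97, 0.98, 0.99, 0.98,
                0.97, 0.98, 0.98, 0.97, 0.99])   # placeholder run accuracies
mean = acc.mean()
half = stats.t.ppf(0.975, df=len(acc) - 1) * acc.std(ddof=1) / np.sqrt(len(acc))
print(f"{mean:.3f} +/- {half:.3f}")              # mean with 95% half-width
```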
The results on the FRGC database show an accuracy close to 98% for our boosted approach on the initial 25-class problem, while the application of feature extraction methods with the NN classifier obtains an initial 92%. This experiment suggests that, for a perfectly acquired and normalized set, shared boosting is the best option for multi-class face problems. Figure 3 shows the accuracies as a function of the number of classes added. The accuracy over the first 25 steps remains constant, given that the classifier is initially trained on this subset. Notice that from that point on the accuracy decreases, as expected, when new classes are added to the system. This is due to two reasons: first, accuracy usually decreases as the number of classes in a classification problem grows; and second, when new samples are added to the system there is an implicit error, given that the classifier has not been retrained. Nevertheless, the accuracy does not decrease drastically, even when we increase the number of classes by 800%.
On the other hand, we also show the absolute and relative loss of accuracy as new classes are added (see Table 1). For each data set we add classes up to the maximum number available (160 for FRGC and 86 for AR Face) and take the resulting accuracy. The absolute decrease is computed as the accuracy using 25 classes minus the accuracy using the maximum number of classes. The relative decrease is the absolute decrease divided by the initial accuracy on the 25 classes. Using our approach the accuracy decreases less, especially in the case of the AR Face data set, yielding a classification rule that is more robust to occlusions and strong illumination changes.
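In symbols, with K the maximum number of classes (160 for FRGC, 86 for AR Face), the two quantities reported in Table 1 are:

$$\Delta_{\mathrm{abs}} = \mathrm{Acc}_{25} - \mathrm{Acc}_{K}, \qquad \Delta_{\mathrm{rel}} = \frac{\mathrm{Acc}_{25} - \mathrm{Acc}_{K}}{\mathrm{Acc}_{25}}$$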
The main advantage of our class-adding approach is the reduction in computational cost. It has been shown experimentally that joint boosting achieves high accuracies in face classification; nevertheless, its computational cost makes the method infeasible when the problem has too many classes. The clustering step to build binary problems