the concept of affine relations among binary classifiers and present a principled way to find groups of highly correlated dichotomies. Furthermore, we present a strategy to reduce the number of dichotomies required in the multi-class process.
Contemporary Vision and Pattern Recognition problems such as face recognition, fingerprint identification, image categorization, and DNA sequencing often have an arbitrarily large number of classes to cope with. Finding the right descriptor is just the first step in solving such a problem. Here, we show how to use a small number of simple, fast base learners, whether weak or strong, to get better results, regardless of the choice of descriptor. This is a relevant issue for large-scale classification problems.
We validate our approach using data sets from the UCI repository, NIST, the Corel Photo Gallery, and the Amsterdam Library of Objects. We show that our approach provides better results than OVO, OVA, and ECOC approaches based on other decoding strategies. Furthermore, we also compare our approach to that of Passerini et al. (Passerini et al., 2004), who proposed a Bayesian treatment for decoding that assumes independence among all binary classifiers.
2 STATE-OF-THE-ART
Most of the existing literature addresses one or more of the three main parts of a multi-class decomposition problem: (1) the creation of the ECOC matrix; (2) the choice of dichotomies; and (3) the decoding.
In the following, let $T$ be the team (set) of dichotomies $D$ used in a multi-class problem, and let $N_T$ be the size of $T$. Recall that $N_c$ is the number of classes (the Appendix provides a table of symbols).
There are three broad groups of methods for reducing multi-class to binary: One-vs-All, One-vs-One, and Error Correcting Output Codes (Pedrajas and Boyer, 2006).
1. One-vs-All (OVA). Here, we use $N_T = N_c = O(N_c)$ binary classifiers (dichotomies) (Clark and Boswell, 1991; Anand et al., 1995). We train the $i$-th classifier using all patterns of class $i$ as positive (+1) examples and the patterns of the remaining classes as negative (−1) examples. We classify an input example $x$ to the class whose classifier yields the highest response.
2. One-vs-One (OVO). Here, we use $N_T = \binom{N_c}{2} = O(N_c^2)$ binary classifiers. We train the $ij$-th dichotomy using all patterns of class $i$ as positive and all patterns of class $j$ as negative examples. In this framework, there are many approaches to combine the obtained outcomes, such as voting and decision directed acyclic graphs (DDAGs) (Platt et al., 1999).
3. Error Correcting Output Codes (ECOC). Proposed by Dietterich and Bakiri (Dietterich and Bakiri, 1996). In this approach, we use a coding matrix $M \in \{-1, +1\}^{N_c \times N_T}$ to indicate which classes to train as positive and which as negative examples. Allwein et al. (Allwein et al., 2000) have extended this approach, proposing a coding matrix $M \in \{-1, 0, +1\}^{N_c \times N_T}$. In this model, the $j$-th column of the matrix induces a partition of the classes into two meta-classes. An instance $x$ belonging to class $i$ is a positive instance for the $j$-th dichotomy if and only if $M_{ij} = +1$; if $M_{ij} = 0$, the $i$-th class takes no part in the training of the $j$-th dichotomy. In this framework, there are many approaches to combine the obtained outcomes, such as voting, Hamming and Euclidean distances, and loss-based functions (Windeatt and Ghaderi, 2003). When the dichotomies are margin-based learners, Allwein et al. (Allwein et al., 2000) have shown the advantages and the theoretical bounds of using a loss-based function of the margin. Klautau et al. (Klautau et al., 2004) have extended such bounds to other functions.
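To make the three decompositions above concrete, the following sketch (our illustration in Python/NumPy, not the implementation used in any of the cited works) builds the OVA and OVO coding matrices as special cases of the ternary matrix $M$ of Allwein et al. and decodes a vector of dichotomy outputs by Hamming distance:

```python
import numpy as np
from itertools import combinations

def ova_matrix(n_classes):
    """OVA as an ECOC matrix: N_T = N_c columns, +1 on the diagonal,
    -1 everywhere else."""
    M = -np.ones((n_classes, n_classes), dtype=int)
    np.fill_diagonal(M, 1)
    return M

def ovo_matrix(n_classes):
    """OVO as a ternary ECOC matrix: N_T = C(N_c, 2) columns; the column
    for the pair (i, j) is +1 for class i, -1 for class j, 0 elsewhere."""
    pairs = list(combinations(range(n_classes), 2))
    M = np.zeros((n_classes, len(pairs)), dtype=int)
    for col, (i, j) in enumerate(pairs):
        M[i, col], M[j, col] = 1, -1
    return M

def hamming_decode(M, outputs):
    """Assign x to the class whose code word disagrees least with the
    dichotomy outputs; zero entries (classes left out of a dichotomy's
    training) are ignored."""
    signs = np.sign(outputs)
    disagreements = ((M != 0) & (M != signs)).sum(axis=1)
    return int(np.argmin(disagreements))

M = ovo_matrix(4)                # 4 classes -> 6 dichotomies
print(hamming_decode(M, M[2]))   # perfect outputs for class 2 -> 2
```

OVA and OVO thus differ only in their coding matrices; the decoding step is shared, which is why the decoding strategy can be studied independently of the decomposition.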
Pedrajas et al. (Pedrajas and Boyer, 2006) have proposed to combine the OVO and OVA strategies. Although the combination improves the overall multi-class effectiveness, the proposed approach uses $N_T = \binom{N_c}{2} + N_c = O(N_c^2)$ dichotomies in the training stage. Moreira and Mayoraz (Moreira and Mayoraz, 1998) also developed a combination of different classifiers. They have considered the output of each dichotomy as the probability that the pattern belongs to a given class. This method requires $\frac{N_c(N_c+1)}{2} = O(N_c^2)$ base learners. Athisos et al. (Athisos et al., 2007) have proposed class embeddings to choose the best dichotomies from a set of trained base learners.
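To give a sense of scale for the team sizes just quoted, the short calculation below (our illustration; the class count of 20 is an arbitrary choice, not taken from any of the cited papers) compares $N_T$ across the strategies described so far:

```python
from math import comb

n_c = 20  # illustrative number of classes
print("OVA:            ", n_c)                   # N_c          = 20
print("OVO:            ", comb(n_c, 2))          # C(N_c, 2)    = 190
print("OVO + OVA:      ", comb(n_c, 2) + n_c)    # O(N_c^2)     = 210
print("Moreira-Mayoraz:", n_c * (n_c + 1) // 2)  # N_c(N_c+1)/2 = 210
```

At only 20 classes, the quadratic strategies already require roughly ten times as many base learners as OVA, which is precisely the cost that motivates reducing the number of required dichotomies.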
Pujol et al. (Pujol et al., 2006) have presented a heuristic method for learning ECOC matrices based on a hierarchical partition of the class space that maximizes a discriminative criterion. The proposed technique finds the potentially best $N_c - 1 = O(N_c)$ dichotomies for the classification. Crammer and Singer (Crammer and Singer, 2002) have proven that the problem of finding optimal discrete codes is NP-complete. Hence, Pujol et al. have used a heuristic solution for finding the best candidate dichotomies. Even so, this solution is computationally expensive, and the authors only report results for $N_c \le 28$.
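The hardness is easy to motivate by counting candidate columns, a step we add here for intuition (the formulas are standard counts from the ECOC literature, not from the works above): binary codes admit $2^{N_c - 1} - 1$ distinct non-trivial class partitions, and the ternary codes of Allwein et al. admit $(3^{N_c} - 2^{N_c + 1} + 1)/2$. A minimal sketch evaluating both:

```python
def binary_dichotomies(n_c):
    """Distinct two-meta-class partitions that use every class:
    2^Nc sign columns, minus the two constant ones, halved because a
    column and its negation define the same dichotomy."""
    return 2 ** (n_c - 1) - 1

def ternary_dichotomies(n_c):
    """Distinct partitions when classes may be left out (M_ij = 0):
    3^Nc columns, minus those lacking a +1 or a -1, halved for the
    same sign symmetry."""
    return (3 ** n_c - 2 ** (n_c + 1) + 1) // 2

for n_c in (4, 10, 28):
    print(n_c, binary_dichotomies(n_c), ternary_dichotomies(n_c))
```

Already at $N_c = 28$, the largest case Pujol et al. report, there are over $10^{13}$ ternary candidates per column, so exhaustive search over whole coding matrices is out of the question.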