final decision. For example, the input sample is as-
signed to the class with the closest codeword, accord-
ing to a distance measure, such as the Hamming one.
In this framework, the authors of (Dietterich and Bakiri, 1995)
proposed an approach, known as error-correcting
output codes (ECOC), in which error-correcting
codes are employed as a distributed output representation.
Their strategy was a decomposition method
based on coding theory that yields a
recognition system less sensitive to noise via an
error-recovering capability. Although
the traditional measure of diversity between
the codewords and the outputs of the dichotomizers is
the Hamming distance, other works have proposed different
measures. For example, Kuncheva (Kuncheva,
2005) presented a measure that accounts for the
overall diversity of the ensemble of binary classifiers.
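To make the decoding step concrete, the following minimal Python sketch illustrates Hamming-distance decoding over an ECOC code matrix. The code matrix shown is a hypothetical example, not one taken from (Dietterich and Bakiri, 1995), and the dichotomizer outputs are assumed to be already collected into a bit vector.

import numpy as np

# Hypothetical ECOC code matrix for c = 4 classes and 6 dichotomizers:
# row j is the codeword of class omega_j; column k defines the k-th binary task.
# The minimum Hamming distance between rows is 3, so one flipped bit is recovered.
CODE_MATRIX = np.array([[1, 1, 1, 1, 1, 1],
                        [0, 0, 0, 1, 1, 1],
                        [0, 1, 1, 0, 0, 1],
                        [1, 0, 1, 0, 1, 0]])

def ecoc_decode(bit_outputs):
    # Hamming distance between the output vector and every codeword.
    distances = (CODE_MATRIX != np.asarray(bit_outputs)).sum(axis=1)
    # Assign the class with the closest codeword.
    return int(distances.argmin())

print(ecoc_decode([0, 0, 0, 1, 0, 1]))  # -> 1, despite one corrupted bit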
The last approach is called the $n^2$ classifier. In this
case the recognition system is composed of $(n^2 - n)/2$
base dichotomizers, each one specialized
in discriminating between a respective pair of decision classes.
Their predictions are then aggregated into a final decision
using a voting criterion. For example, in (Jelonek
and Stefanowski, 1998) the authors proposed a voting
scheme adjusted by the credibilities of the base classifiers,
which were calculated during the learning phase
of the classification.
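As a sketch of this aggregation step, the plain majority-vote rule can be written as follows in Python; here pairwise_classifiers is a hypothetical dictionary mapping each class pair (i, j) to a trained dichotomizer that returns either i or j. The credibility-adjusted scheme of (Jelonek and Stefanowski, 1998) would replace the unit vote with a weight learned during training.

from itertools import combinations
import numpy as np

def ovo_vote(x, pairwise_classifiers, n_classes):
    # One vote per dichotomizer: (n^2 - n)/2 pairwise decisions in total.
    votes = np.zeros(n_classes, dtype=int)
    for i, j in combinations(range(n_classes), 2):
        winner = pairwise_classifiers[(i, j)](x)  # returns either i or j
        votes[winner] += 1
    # Final decision: the class collecting the most votes
    # (ties broken here by the lowest class index).
    return int(votes.argmax())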
This short description of the methods presented so far
shows that recognition systems based on decomposition
methods are constituted by an ensemble of binary
discriminating functions. For this reason, and for brevity,
such systems are referred to as Multi Dichotomies
Systems (MDS) in the following.
In the framework of the one-per-class approach,
we present here a novel reconstruction rule that re-
lies upon the quality of the input pattern and looks
at the reliability of each classification act provided
by the binary modules. Furthermore, the classifica-
tion scheme that we propose allows employing either
a single expert or an ensemble of classifiers internal
to each module that solves a dichotomy. Finally, the
effectiveness of the recognition system has been eval-
uated on four different datasets that belong to biolog-
ical and medical applications.
The rest of the paper is organized as follows: in
the next section we introduce some notations and
we present general considerations related to the sys-
tem configuration. Section 3 details the reconstruc-
tion method and Section 4 describes and discusses
the experiments performed on four different medical
datasets. Finally, Section 5 offers conclusions.
2 PROBLEM DEFINITION
2.1 Background
Let us consider a classification task on $c$ data classes,
represented by the set of labels $\Omega = \{\omega_1, \cdots, \omega_c\}$,
with $c > 2$. With reference to the one-per-class approach,
the multiclass problem is reduced to $c$ binary
problems, each one addressed by one module of
the pool $M = \{M_1, \cdots, M_c\}$. We say that the module,
or dichotomizer, $M_j$ is specialized in the $j$th class
when it aims at recognizing whether the sample $x$ belongs
to the $j$th class $\omega_j$ or, alternatively, to any other
class $\omega_i$, with $i \neq j$. Therefore each module assigns
to the input pattern $x \in \Re^n$ a binary label:

$$M_j(x) = \begin{cases} 1 & \text{if } x \in \omega_j \\ 0 & \text{if } x \in \omega_i, \; i \neq j \end{cases} \qquad (1)$$
where $M_j(x)$ indicates the output of the $j$th module on
the pattern $x$. On this basis, the codeword associated
with the class $\omega_j$ has a bit equal to 1 at the $j$th position,
and 0 elsewhere.
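In code, the one-per-class codewords form an identity matrix, and the final decision again reduces to a nearest-codeword rule. The Python sketch below is illustrative only; modules stands for a hypothetical list of trained dichotomizers, each implementing Eq. (1).

import numpy as np

def one_per_class_classify(x, modules):
    # Collect the binary outputs M_j(x), one bit per module.
    outputs = np.array([m(x) for m in modules])
    # One-per-class codewords: 1 at the j-th position, 0 elsewhere.
    codewords = np.eye(len(modules), dtype=int)
    # Hamming distance to each codeword; assign the closest class.
    distances = (codewords != outputs).sum(axis=1)
    return int(distances.argmin())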
Notice that we have just mentioned module and
not classifier to emphasize that each dichotomy can
be solved not only by a single expert, but also by an
ensemble of classifiers. However, to our knowledge,
the system dichotomizers typically adopt the former
approach, i.e. they are composed of one classifier
per specialized module. For example, in their experimental
assessments the authors of (Mayoraz and Moreira, 1997)
and (Masulli and Valentini, 2000) used a decision tree
and a multilayer perceptron with one hidden layer,
respectively. The same functions were employed by Dietterich and Bakiri for the
evaluation of their proposal in (Dietterich and Bakiri,
1995), whereas Allwein et al. used a Support Vector
Machine (Allwein et al., 2001). A viable alternative to
using a single expert is the combination of the outputs
of several classifiers solving the same recognition task. The idea is
that their combination should improve classification
performance by taking advantage
of the strengths of the single classifiers. Classifier
selection and fusion are the two main combination
strategies reported in the literature. The former
presumes that each classifier has expertise in some
local area of the feature space (Woods et al., 1997;
Kuncheva, 2002; Xu et al., 1992). For example, when
an unknown pattern is submitted for classification, the
more accurate classifier in the vicinity of the input is
selected to label it (Woods et al., 1997). The latter algorithms
assume that the classifiers are applied in parallel
and that their outputs are combined to somehow attain
a group "consensus" (De Stefano et al., 2000;
Kuncheva et al., 2001; Kittler et al., 1998). Typi-