Hall, 2001). By applying it in conjunction with a deci-
sion tree learner, the authors show that it outperforms
the naive approach, which treats the class values as
an unordered set. Compared to special-purpose algo-
rithms for ordinal classification, the method has the
advantage that it can be applied without any modi-
fication to the underlying learning scheme. The rationale is to use $(K - 1)$ standard binary classifiers to address the $K$-class ordinal data problem. Toward that end, the $i$-th classifier is trained by converting the ordinal dataset with classes $C_1, \ldots, C_K$ into a binary dataset that discriminates $C_1, \ldots, C_i$ against $C_{i+1}, \ldots, C_K$. To predict the class value of an unseen instance, the $(K - 1)$ outputs are combined to produce a single estimate. Any binary classifier can be used as the building block of
this scheme. Observe that the (K − 1) classifiers are
trained in an independent fashion. This independence
is likely to lead to intersecting boundaries, a topic to
which we will return further on in this paper.
The Data Replication Method (Cardoso and
da Costa, 2007) overcomes the limitations identified
above by building all the boundaries at once. That
guarantees that the boundaries of the classifiers will
never intersect. This method, however, does not work well with learners that build the decision function iteratively (and greedily), and therefore cannot be easily mapped to ADABOOST.
Among ensemble approaches to ordinal data classification, although not directly related to our work, it is worth mentioning the work that introduces global constraints in the design of decision trees (Cardoso and Sousa, 2010; Sousa and Cardoso, 2011). The method consists of growing a tree (or an ensemble of trees) and relabeling the leaves according to certain constraints. Therefore, the trees are still built without taking the order into account and are only post-processed to satisfy ordinality constraints. Moreover, the post-processing is computationally demanding and only feasible in low-dimensional input spaces. More recently, the combination of multiple orthogonal directions has been suggested to boost the performance of a base classifier (Sun et al., 2014): multiple orthogonal directions are found sequentially and then combined in a final stage.
There are also some boosting-related approaches
for ordinal ranking. For example, the RankBoost approach (Freund et al., 2003) is based on the pairwise comparison perspective. Lin and Li proposed or-
dinal regression boosting (ORBoost) (Lin and Li,
2006), which is a special instance of the extended bi-
nary classification perspective. The ensemble method
most in line with our work is ADABOOST.OR (Lin
and Li, 2009). This method uses a primal-dual ap-
proach to solve an ordinal problem both in the bi-
nary space and the ordinal space, by taking into ac-
count the order relation when updating the binary points' weights. However, ADABOOST.OR is more
constrained than our proposed approach; while AD-
ABOOST.OR is closer to a single ADABOOST instan-
tiated with an ordinal data classifier, our approach is
closer to having multiple ADABOOST instances coupled in the construction of the weak classifier.
2 BACKGROUND
In this section we start by analysing Frank and Hall's approach to ordinal classification (Frank and
Hall, 2001), which facilitates the introduction of the
Data Replication Method (Cardoso and da Costa,
2007). The Data Replication Method is a framework
for ordinal data classification that allows the applica-
tion of most binary classification algorithms to ordinal
classification and imposes a parallelism constraint on
the resulting boundaries. Finally, we summarize
the ADABOOST ensemble method, paving the way to
the presentation of the proposed adaptation of AD-
ABOOST to ordinal data.
2.1 Frank and Hall Method
Suppose we want to learn a function $f : X \rightarrow Y$, where $X$ is our feature space and $Y = \{C_1, C_2, \ldots, C_K\}$ is our output space, where our labels are ordered according to $C_1 \prec C_2 \prec \ldots \prec C_K$. Also, assume that we have a dataset $\mathcal{D} = (D, f)$, where $D \subseteq X$ is our set of examples and $f : D \rightarrow Y$ gives us the label of each example.
The Frank and Hall method transforms the $K$-class ordinal problem into $(K - 1)$ binary problems by creating $(K - 1)$ datasets $\mathcal{D}_k = (D, f_k)$ where

$$f_k(x) = \begin{cases} C^{-} & \text{if } f(x) \preceq C_k \\ C^{+} & \text{if } f(x) \succ C_k \end{cases}$$
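For concreteness, the relabeling step can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation; it assumes the ordinal labels are encoded as the integers $1, \ldots, K$, with $C^{-}$ and $C^{+}$ mapped to 0 and 1, and the function name frank_hall_labels is our own.

import numpy as np

def frank_hall_labels(y, K):
    # Build the (K - 1) binary label vectors f_k from ordinal labels y.
    # For each k in 1..K-1, an example is labelled C+ (here 1) when
    # f(x) > C_k and C- (here 0) when f(x) <= C_k.
    y = np.asarray(y)
    return [(y > k).astype(int) for k in range(1, K)]

# Example: labels [1, 2, 3, 3] with K = 3 yield the two binary problems
# [0, 1, 1, 1] ("larger than C_1?") and [0, 0, 1, 1] ("larger than C_2?").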
Intuitively, learning a binary classifier from each of the $\mathcal{D}_k$ datasets will create $(K - 1)$ classifiers that answer the question “is the label of point $x$ larger than $C_k$?”. That is to say, each classifier will give us an estimate of $P(f(x) \succ C_k)$.
Frank and Hall then propose that one finds $P(f(x) = C_k)$ using the usual rule:

$$P(f(x) = C_k) = \begin{cases} 1 - P(f(x) \succ C_1) & \text{if } k = 1 \\ P(f(x) \succ C_{k-1}) - P(f(x) \succ C_k) & \text{if } k \in [2, K-1] \\ P(f(x) \succ C_{K-1}) & \text{if } k = K \end{cases}$$
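A minimal sketch of this combination rule follows, again illustrative rather than the authors' code; it assumes a hypothetical array p_greater holding the $(K - 1)$ estimates $P(f(x) \succ C_k)$ for $k = 1, \ldots, K - 1$.

import numpy as np

def combine_estimates(p_greater):
    # p_greater[k-1] holds P(f(x) > C_k) for k = 1..K-1;
    # returns the K values P(f(x) = C_k) for k = 1..K.
    p = np.asarray(p_greater, dtype=float)
    first = 1.0 - p[0]           # k = 1
    middle = p[:-1] - p[1:]      # k in [2, K-1]
    last = p[-1]                 # k = K
    return np.concatenate(([first], middle, [last]))

The predicted class is then the one maximizing this estimate. Note that, since the $(K - 1)$ classifiers are trained independently, nothing forces the estimates $P(f(x) \succ C_k)$ to be monotone in $k$, so some of the combined values may turn out negative; this is another symptom of the independence issue discussed above.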