also be determined by any algorithm yielding a linear classifier. Such classifiers not only make it easy to identify the main properties of the different classes, but also help to identify properties of potential subclasses.
1.1 2-class Eigen Transformation Classification Trees
The new algorithm we introduce in this paper, the 2-class eigen transformation classification tree (2C-ETCT), is based on the eigenvalue-based classification tree (EVCT) algorithm (Plastria, De Bruyne and Carrizosa 2008).
The first step in building a 2C-ETCT is to transform the feature space using an ordered transformation matrix that is completely or partially based on an eigen transformation. Because the a priori fixed structure of the tree allows its classification power to be estimated quite accurately, the 2C-ETCT algorithm can consider many transformations simultaneously and will automatically select the best performing one. After the feature space has been transformed, the tree is grown. Since the transformation orders the new features by relevance, the selection of the split feature is straightforward: the split in the top node is based on the first feature, the splits in the nodes on the second level are based on the second feature, and so on. Theoretically, the depth of the tree can equal the number of features, but if the tree ends up being very large, the instances are probably too dispersed for this algorithm to outperform existing methods, and another tree algorithm is then probably a better choice. The algorithm can be expected to outperform existing methods if the data set has the structure described in the introduction, the main split happens in the top node, and very few splits are needed for the additional clusters.
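To make this fixed structure concrete, the sketch below (a Python/NumPy illustration of ours, not code from the paper; the names grow_fixed_structure_tree and best_split are our own) grows such a tree on already transformed data held as the rows of Z. Every node at depth k splits on feature k; the helper best_split, sketched after the next paragraph, returns the threshold leaving the fewest misclassified instances.

import numpy as np

def grow_fixed_structure_tree(Z, y, max_depth, depth=0):
    # Z: transformed instances as rows, columns ordered by relevance.
    # y: class labels coded 0/1 (an assumption of this sketch).
    majority = int(np.bincount(y).argmax())
    if depth >= max_depth or depth >= Z.shape[1] or len(np.unique(y)) == 1:
        return {"leaf": True, "label": majority}
    feature = depth                              # the level dictates the split feature
    threshold = best_split(Z[:, feature], y)     # see the sketch after the next paragraph
    left, right = Z[:, feature] <= threshold, Z[:, feature] > threshold
    if left.all() or right.all():                # no useful split on this feature
        return {"leaf": True, "label": majority}
    return {"leaf": False, "feature": feature, "threshold": threshold,
            "left": grow_fixed_structure_tree(Z[left], y[left], max_depth, depth + 1),
            "right": grow_fixed_structure_tree(Z[right], y[right], max_depth, depth + 1)}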
Once a splitting feature has been chosen, the actual splits are calculated by taking the midpoint between two consecutive instances (sorted along that feature) that minimizes the number of misclassified instances. The splits are made using this criterion rather than the more popular information gain (Quinlan 1993) or Gini index (Breiman et al. 1984), because the goal is to base the construction as directly as possible on the classification power. Nor do we need such criteria to select a feature to split on, since that choice is fixed by the level of the node.
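A minimal sketch of this split search, under the same assumption that the two class labels are coded 0 and 1: candidate thresholds are the midpoints between consecutive sorted values, each side predicts its majority class, and the threshold leaving the fewest misclassified instances is kept.

import numpy as np

def best_split(values, y):
    order = np.argsort(values)
    v, lab = values[order], y[order]
    best_t, best_err = v[0] - 1.0, len(y) + 1    # fall-back: everything goes right
    for i in range(len(v) - 1):
        if v[i] == v[i + 1]:
            continue
        t = 0.5 * (v[i] + v[i + 1])              # midpoint of two consecutive instances
        left, right = lab[v <= t], lab[v > t]
        # each side predicts its majority class; count the misclassified instances
        err = (min(np.sum(left == 0), np.sum(left == 1)) +
               min(np.sum(right == 0), np.sum(right == 1)))
        if err < best_err:
            best_t, best_err = t, err
    return best_t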
After the tree is grown, it is pruned. As all 2C-ETCTs are pruned versions of the largest 2C-ETCT, we can use an internal cross-validation to determine the optimal size of the tree. In this way the entire training set can be used in all stages of the construction of the tree in a statistically sound manner, which should lead to less overfitting than, for example, the estimated error rates used for pruning in C4.5 classification trees (Quinlan 1993). As we are using the same principle to build and to evaluate the classifier, this technique should yield reliable results.
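The depth selection can be sketched as follows (again our own illustration, reusing the grow_fixed_structure_tree helper from above). Because the structure is fixed, growing the tree to a smaller depth yields exactly the corresponding pruned version of the largest 2C-ETCT, so the internal cross-validation only has to compare candidate depths.

import numpy as np

def predict(tree, z):
    # route a single transformed instance z down the tree to a leaf label
    while not tree["leaf"]:
        tree = tree["left"] if z[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]

def select_depth_by_cv(Z, y, max_depth, n_folds=10, seed=0):
    # internal cross-validation over candidate depths (i.e. pruned tree sizes)
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(y))
    errors = np.zeros(max_depth + 1)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        for depth in range(max_depth + 1):
            tree = grow_fixed_structure_tree(Z[train], y[train], depth)
            preds = np.array([predict(tree, z) for z in Z[test]])
            errors[depth] += np.sum(preds != y[test])
    return int(np.argmin(errors))                # optimal size of the pruned tree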
1.2 Eigen Transformations
We will be using six eigen transformations, which were also used in (Plastria, De Bruyne and Carrizosa 2008). The first three do not start from a separate classifier, but retain the first eigenvector to perform the first split. The first is the unsupervised principal component analysis (Hotelling 1933; Jolliffe 1986). The second and third transformations are the supervised Fisher's linear discriminant analysis (Fisher 1936) and the principal separation component analysis (Plastria, De Bruyne and Carrizosa 2008). The last three start with a separate first vector. In practice one might prefer a vector based on a powerful classifier such as support vector machines, but here we choose a straightforward vector given by the means of the instances of the two classes.
Using the following notations
• A : the matrix of p_A columns representing the instances of the first set
• B : the matrix of p_B columns representing the instances of the second set
• T = [A, B] : the matrix of p_T = p_A + p_B columns representing the instances of both sets
• For a general matrix M ∈ R^{d×p_M}
  – d : the original dimension of the data (number of attributes)
  – Mean(M) ∈ R^{d×1} : the mean of the instances of M
  – Cov(M) ∈ R^{d×d} : the covariance matrix of M
  – Mom(M) ∈ R^{d×d} : the matrix of second moments (around the origin) of M
  – Eig(M) ∈ R^{d×d} : the matrix of eigenvectors of M
we use the following eigen transformation matrices R:
• the transformation matrix based on principal component analysis
  R = Eig(Cov(T))
• the transformation matrix based on Fisher's linear discriminant analysis
  S_W = (p_A Cov(A) + p_B Cov(B)) / p_T
  S_B = Cov(T) − S_W
  R = Eig(S_W^{-1} S_B)
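As an illustration of these first two transformations, the following NumPy sketch of ours follows the notation above, with instances stored as columns. It assumes that Cov may be taken as the sample covariance computed by np.cov (which normalises by the number of instances minus one), and it orders the eigenvectors by decreasing eigenvalue so that the resulting transformation matrix is ordered by relevance.

import numpy as np

def eig_sorted(M):
    # eigenvectors of M as columns, ordered by decreasing eigenvalue;
    # M need not be symmetric, so small imaginary parts are discarded
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order]

def pca_transform(A, B):
    # R = Eig(Cov(T)) with T = [A, B]; columns of A and B are instances
    T = np.hstack([A, B])
    return eig_sorted(np.cov(T))

def fisher_transform(A, B):
    # S_W = (p_A Cov(A) + p_B Cov(B)) / p_T, S_B = Cov(T) - S_W, R = Eig(S_W^{-1} S_B)
    p_A, p_B = A.shape[1], B.shape[1]
    T = np.hstack([A, B])
    S_W = (p_A * np.cov(A) + p_B * np.cov(B)) / (p_A + p_B)
    S_B = np.cov(T) - S_W
    return eig_sorted(np.linalg.solve(S_W, S_B))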