co-association matrix after removal of ambiguities.
Since this solution is achieved in the absence of ambiguous patterns, we assume that a more robust representation of the surrogate classes is attained in the output clusters. Of course, the clustering process makes a certain error, which can be measured, if reference class labels are available for a dataset, by calculating the normalised mutual information (NMI) between the cluster solution and the reference labels. However, we ignore this error here, assuming that the class structure is adequately captured by the cluster solution. Next,
we assign a different “virtual” label to each of the obtained clusters, or classes. Thereby, a training set is implicitly generated in an unsupervised manner, using only the information contained in the ensemble: the only supervised action involved in the whole process is the selection of an ANCO threshold for detecting ambiguous patterns, and we have shown (Section 4) how this threshold can easily be determined using histograms. This automatically generated training set is then used to train a Support Vector Machine (SVM) model to find the hyperplanes separating the (virtual) classes. Finally, the SVM model is applied to make predictions on the ambiguous patterns previously removed from the ensemble. Hence, the SVM decides to which cluster of the consensus solution an ambiguous pattern should be reallocated.
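As an illustration, this reallocation step could be sketched as follows. This is a minimal sketch in Python using scikit-learn; the input names (X_clear, y_virtual, X_ambiguous) are hypothetical placeholders for the non-ambiguous patterns, their virtual labels, and the removed ambiguous patterns, not our actual implementation.

```python
from sklearn.svm import SVC

def reallocate_ambiguous(X_clear, y_virtual, X_ambiguous, kernel="rbf"):
    # kernel="rbf" corresponds to the radial kernel (svm R in the tables),
    # kernel="linear" to the linear one (svm L).
    model = SVC(kernel=kernel)
    # Learn the hyperplanes separating the "virtual" classes, i.e. the
    # consensus clusters obtained after removing ambiguous patterns.
    model.fit(X_clear, y_virtual)
    # Predict the consensus cluster each ambiguous pattern is reallocated to.
    return model.predict(X_ambiguous)
```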
6 EVALUATION AND TESTS
For evaluation purposes, we compared the clustering
solutions with the reference category labels, which
are available for all analysed data sets. There are dif-
ferent external validation metrics which can be used
to measure the correspondence between a cluster par-
tition and the reference labels, including entropy, pu-
rity (Boley et al., 1999), or the Normalised Mutual
Information (NMI, Equation 2). In this paper we selected the latter due to its impartiality with respect to the number of clusters, in contrast to entropy or purity, as suggested by Strehl and Ghosh (Strehl et al., 2002).
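As a point of reference, this score can be obtained with standard tooling; a minimal sketch, assuming two label vectors of equal length (the geometric-mean normalisation corresponds to the definition of Strehl and Ghosh):

```python
from sklearn.metrics import normalized_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]   # reference category labels
labels_pred = [1, 1, 0, 0, 2, 2]   # cluster indices of a consensus solution

# Geometric-mean normalisation as in Strehl and Ghosh;
# scikit-learn's default is the arithmetic mean.
nmi = normalized_mutual_info_score(labels_true, labels_pred,
                                   average_method="geometric")
print(f"NMI = {nmi:.3f}")  # 1.0 here: partitions agree up to relabelling
```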
We thus compared the NMI-based quality of the
ensemble consensus solutions (by using the agglom-
erative and pam algorithms as different consensus
clusterers applied to the co-association matrix) with
the values obtained by their respective base clusterers.
In addition, since this is the focus of this work, we also evaluated the final ensemble solution when our scheme for tackling ambiguities is introduced. In this respect, two situations have been considered: (a) simple removal of ambiguous patterns (in which case the category labels corresponding to the ambiguities have also been removed from the reference label sets prior to testing), and (b) post-processing the ambiguities with the help of Support Vector Machines.
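For illustration, the consensus step on the co-association matrix could be sketched as follows. This is a minimal sketch with scikit-learn, assuming the co-association entries are simple same-cluster co-occurrence fractions; pam could be applied analogously through a k-medoids implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def co_association(labelings):
    """Fraction of ensemble partitions placing each pair of
    patterns in the same cluster."""
    labelings = np.asarray(labelings)  # shape: (n_partitions, n_patterns)
    n = labelings.shape[1]
    C = np.zeros((n, n))
    for labels in labelings:
        C += labels[:, None] == labels[None, :]
    return C / len(labelings)

def consensus(labelings, n_clusters, linkage="average"):
    C = co_association(labelings)
    # Recluster the co-association matrix, treating 1 - C as a distance
    # (older scikit-learn versions use affinity= instead of metric=).
    model = AgglomerativeClustering(n_clusters=n_clusters,
                                    metric="precomputed", linkage=linkage)
    return model.fit_predict(1.0 - C)
```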
Tables 1 to 5 show the results obtained with the
evaluated approaches on the mixtures of Gaussians,
PENDIG, BREAST and WINE data sets. The first
rows show NMI values obtained by the base clusterers
(the complete, average and centroid linkages, and the partitioning around medoids) applied to the original matrix of object distances, i.e. the ensemble components. The second rows refer to the aggregate ensemble solutions obtained by again applying the initial clustering algorithms (complete, average and centroid linkage, and pam) as different consensus clusterers to recluster the co-association matrices. The third rows indicate the performance of the ensembles when the ambiguity detection (AD) scheme is applied and the ambiguous patterns are removed prior to consensus clustering. Finally, the fourth and last rows show the NMI values obtained by the final ensemble solutions when AD is applied to detect ambiguities and Support Vector Machine models are used to post-process and reallocate the ambiguous data, using radial and linear kernel functions (referred to as svm R and svm L, respectively). Note also that the last entry in each row reports the average NMI score of the four clustering algorithms in each case.
As can be observed, the ensemble approach outperforms the corresponding base clusterers on all data sets except WINE. The poorer performance on this dataset can be attributed to the inability of two of the base clusterers (50% of the ensemble components) to recover any class structure (NMI values below 40%). This considerable proportion of “bad” clusterings has an impact on the new co-association matrix, such that a third agglomerative approach fails to achieve an adequate consensus, although the same algorithm was originally able to recover more than 50% of the class structure (NMI score) when using the object distance matrix. On the other hand, note that the consensus
based on the partitioning around medoids algorithm
(pam) outperforms the corresponding base clusterer
in the ensemble (pam applied to the original distances),
which also shows the best performance among the
base clusterers.
Nevertheless, the ensemble solutions outperform
the base components in all other data sets, where at
least 3/4 of the ensemble components are able to de-
tect some class structure (NMI values greater than
50%). The robustness of the ensemble solution is
also revealed by smaller standard deviations of NMI
scores across different consensus clusterings, in com-
parison to the respective base clusterings.