tion is to split the population into groups possessing different building blocks, as in (Emmendorfer and Pozo, 2009), then the supervisory information is the presence/absence of each building block for each individual in a given population. Once trained, the supervised clustering algorithm should be able to find partitions for larger populations under similar conditions, for similar problems.
These few points about opportunities for the application of clustering algorithms in evolutionary computation lead to the description of a clustering algorithm which can be more relaxed than others, should deal with incremental data, and should also exploit the potentially abundant supervisory information available from the execution of evolutionary algorithms on known problems. This paper proposes an algorithm which attempts to fulfill those requirements, potentially increasing the effectiveness of the clustering task when applied to EC.
2 SUPERVISED CLUSTERING
The search for the best partition of a given set of data
points is not a straightforward task. Even when a distance is known, many possible answers to what the correct clustering is may all be equally likely.
Unsupervised clustering is an ill-defined task if we
do not restrict the criteria used to characterize a good
clustering (Romer et al., 2004). The bias resulting
from the clustering algorithm's behavior can, more or
less explicitly, impose some restrictions and guide the
search to one of the possible answers.
Several definitions exist for what constitutes a good partitioning. An unsupervised clustering algorithm follows a specific definition and tries to find partitions which respect criteria defined a priori.
In supervised clustering, on the other hand, the
definition of good or bad clustering is implicit, hid-
den under the available labeled data. Supervised clus-
tering is the task of automatically adapting a cluster-
ing algorithm, which learns to cluster with the aid
of a training set consisting of item sets and complete
partitionings of those item sets (Finley and Joachims,
2005). A clustering algorithm is trained using known
“good” partitions of previously stored data. If the
algorithm generalizes well, it will be able to find clusters when unlabeled data is provided. This technique avoids most of the subjective aspects of clustering, since the user's beliefs about expected answers are expressed in the training data.
A popular technique for solving supervised clus-
tering is based on building a binary classifier from
pairwise relations observed in data (Iii et al., 2005).
For a given input set, a binary classifier is trained on
all pairs of input data points. The class of each pair of
points is the binary information about the actual co-
membership of that pair. The output of the classifier can be used as a metric or taken as evidence that a given pair of data points should be clustered together. This learned metric is then used by some conventional clustering algorithm, such as k-means.
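By way of illustration only, a minimal Python sketch of this pairwise scheme could look as follows, assuming scikit-learn is available; the pair representation in pair_features and all function names are our own choices, not prescribed by the cited works.

    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import LogisticRegression

    def pair_features(a, b):
        # A simple symmetric representation of a pair of points;
        # the exact attributes are an open design choice.
        return np.concatenate([np.abs(a - b), [np.linalg.norm(a - b)]])

    def train_pairwise_classifier(X, labels):
        # X: (n, d) training points; labels: one cluster index per point.
        feats, co = [], []
        for i, j in combinations(range(len(X)), 2):
            feats.append(pair_features(X[i], X[j]))
            co.append(1 if labels[i] == labels[j] else 0)  # co-membership
        return LogisticRegression(max_iter=1000).fit(np.array(feats),
                                                     np.array(co))

    def learned_distance(clf, a, b):
        # Turn the predicted co-membership probability into a dissimilarity.
        p_same = clf.predict_proba(pair_features(a, b).reshape(1, -1))[0, 1]
        return 1.0 - p_same

Note that k-means assumes a Euclidean space, so in practice the learned distance combines more directly with an algorithm that accepts a precomputed distance matrix, such as agglomerative clustering.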
Depending on how specific the attributes are, the
binary classifier will not be able to generalize to other
domains. Usually, the classifier is built upon the original attributes, which restricts generalization to data coming from the same domain as the training data. Density-based derived attributes alleviate this problem, since the notion of density is not tied to a specific set of attributes.
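For concreteness, one hypothetical way to obtain such domain-independent pair attributes is to use distance ratios rather than raw coordinates; the sketch below, including the choice of k-nearest-neighbor radii, is ours and not taken from the cited works.

    import numpy as np

    def knn_radius(X, i, k=5):
        # Distance from point i to its k-th nearest neighbour
        # (index 0 of the sorted distances is the point itself).
        d = np.linalg.norm(X - X[i], axis=1)
        return np.sort(d)[k]

    def scale_free_pair_features(X, i, j, k=5):
        # Ratios are dimensionless, so they carry over between
        # domains with different attribute scales.
        pair_d = np.linalg.norm(X[i] - X[j])
        return np.array([pair_d / knn_radius(X, i, k),
                         pair_d / knn_radius(X, j, k)])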
3 A SUPERVISED CLUSTERING
ALGORITHM APPLIED TO
EVOLUTIONARY
COMPUTATION
This section illustrates one possible scheme for the
design of a supervised clustering algorithm which exploits some of the specific aspects of evolutionary
computation. The implications of the algorithm and
its adoption in EC are discussed.
The algorithm is trained on a few small populations which have already been adequately clustered. A
viable approach is to select a smaller instance of the
same problem, or a similar one, then run the evolu-
tionary algorithm in order to obtain a small popula-
tion. Each individual of the population must be (man-
ually or automatically) labeled, according to what one
believes to be the best clustering. For instance, if the
intention is to solve multimodal problems, then each
cluster corresponds to a different optimum.
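A minimal sketch of this training setup is given below; run_ea, label_individual and the genome attribute are hypothetical stand-ins for the user's own evolutionary algorithm and labeling procedure, neither of which is defined here.

    import numpy as np

    def build_training_population(run_ea, label_individual, pop_size=50):
        # run_ea: runs the EA on a smaller or similar problem instance
        # and returns a population; label_individual: assigns each
        # individual to the cluster one believes it belongs to
        # (e.g. the optimum it is associated with).
        population = run_ea(pop_size)
        X = np.array([ind.genome for ind in population])
        y = np.array([label_individual(ind) for ind in population])
        return X, y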
A probabilistic model is inferred from pairwise in-
formation about co-membership. Each pair of data
points from the training set has a binary label which is
1 if both points belong to the same cluster and 0 oth-
erwise. Additionally, a pairwise neighborhood must be defined, which determines the local region around any given pair of points. Many alternatives might be tested. The Gabriel Graph (Urquhart, 1982) already defines a neighborhood for a pair of points: the smallest hyperspherical region centered at the midpoint between the pair of points which includes both points. Attributes such as the density of points in the neighborhood are then computed. Other supervised clustering algorithms, such as that of (Kamishima and Motoyoshi, 2003), adopt similar attributes.
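Under these definitions, the Gabriel pair neighborhood and a density attribute over it could be computed as in the sketch below; the normalization by population size is our own assumption.

    import numpy as np

    def gabriel_density(X, i, j):
        # Smallest hypersphere through points i and j: centered at
        # their midpoint, with radius equal to half their distance.
        mid = (X[i] + X[j]) / 2.0
        radius = np.linalg.norm(X[i] - X[j]) / 2.0
        inside = np.linalg.norm(X - mid, axis=1) <= radius
        inside[i] = inside[j] = False      # exclude the pair itself
        # Fraction of the remaining population inside the neighborhood.
        return inside.sum() / max(len(X) - 2, 1)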