ON THE DESIGN OF POPULATIONAL CLUSTERING
Leonardo Ramos Emmendorfer
Centro de Ciências Computacionais, Federal University of Rio Grande, CEP 96201-900, Rio Grande, Brazil
Keywords:
Supervised clustering, Building blocks, Diversity.
Abstract:
The application of clustering algorithms for partitioning the population in evolutionary computation is discussed. Specific aspects which characterize this task lead to opportunities that can be exploited by the clustering algorithm. A supervised clustering algorithm which illustrates the exploitation of those opportunities is described.
1 INTRODUCTION
The role of the population in evolutionary computation (EC) is to maintain the information acquired during the search. The translation of the potentially huge amount of information represented by the successive populations into useful knowledge which can help guide the search has motivated the adoption of statistical learning models, under the framework of estimation of distribution algorithms (Mühlenbein and Paaß, 1996) (Etxeberria and Larrañaga, 1999), and of machine learning techniques as in (Michalski, 2000) and (Miquélez et al., 2004).
Clustering the population into partitions is one of the most widely adopted learning approaches in evolutionary computation. The application of clustering in the EC context serves many aims: maintaining niches in order to preserve diversity and prevent premature convergence, improving multimodal problem solving as in (Peña et al., 2005) and (Streichert et al., 2003), detecting promising areas of the search space as in (Oliveira and Lorena, 2004), or improving building blocks and revealing the problem structure as in (Emmendorfer and Pozo, 2009). In all those cases, clustering was shown to be a highly useful tool in EC.
An important issue in the application of clustering to EC is the computational cost, since the total number of generations of a single run might be very high. Any competent clustering method could be applied, but evolutionary computation has some features which should be taken into account when choosing or designing the clustering algorithm to be used in this context. Careful exploitation of those features might allow one to design a computationally less expensive clustering algorithm without loss of effectiveness in the task.
The first aspect we point out is the sequentiality of EC, since a given population usually maintains some degree of similarity with the previous one. From a machine learning perspective, the sequence of populations in EC can be modeled as a data stream. Another aspect which should be better understood is how accurate the clustering algorithm must be when applied to EC. Since more generations are to come, there is an opportunity to relax the requirement of finding the best partitioning at every generation. If this hypothesis holds, computational requirements would potentially be reduced. A third aspect to be exploited is that it is relatively easy to detect, even manually, the correct partitioning of small populations. The information supplied by this supervisory data provides a great opportunity to apply a novel class of clustering algorithms, called learning from cluster examples or supervised clustering, when larger populations are required.
In supervised clustering, the algorithm is trained with labeled data before being applied to unlabeled data (Finley and Joachims, 2005). Labeled data corresponds to correct partitionings of complete data sets. A slight difference exists between general supervised learning and supervised clustering. In supervised clustering, only supervisory information regarding which objects should be grouped together is provided; there is no predefined set of classes, as required by general supervised learning (Kamishima and Motoyoshi, 2003).
This work discusses the application of supervised clustering to the population of evolutionary algorithms. In this context, a supervised clustering algorithm can be provided with fully labeled data about previous populations. If, for instance, the
intention is to split the population into groups possessing different building blocks as in (Emmendorfer and Pozo, 2009), then the supervisory information is the presence or absence of each building block for each individual in a given population. Once trained, the supervised clustering algorithm would be able to find partitions for larger populations under similar conditions, for similar problems.
Those few points about opportunities related to the application of clustering algorithms in evolutionary computation lead to the description of a clustering algorithm which can be more relaxed than others, which should deal with incremental data, and which should also exploit the potentially abundant supervisory information available from the execution of evolutionary algorithms on known problems. This paper proposes an algorithm which attempts to fulfill those requirements, potentially increasing the effectiveness of the clustering task when applied to EC.
2 SUPERVISED CLUSTERING
The search for the best partition of a given set of data points is not a straightforward task. Even when a distance is known, many possible answers about what is the correct clustering might be all equally likely. Unsupervised clustering is an ill-defined task if we do not restrict the criteria used to characterize a good clustering (Rosales et al., 2004). The bias resulting from the behavior of the clustering algorithm can, more or less explicitly, impose some restrictions and guide the search toward one of the possible answers.
Several definitions exist for what constitutes a good partitioning. An unsupervised clustering algorithm follows a specific definition and tries to find partitions which respect criteria defined a priori.
In supervised clustering, on the other hand, the definition of good or bad clustering is implicit, hidden in the available labeled data. Supervised clustering is the task of automatically adapting a clustering algorithm, which learns to cluster with the aid of a training set consisting of item sets and complete partitionings of those item sets (Finley and Joachims, 2005). A clustering algorithm is trained using known “good” partitions of previously stored data. If the algorithm generalizes well, it will be able to find clusters when unlabeled data is provided. This technique avoids most of the subjective aspects of clustering, since the user's beliefs about expected answers are expressed in the training data.
A popular technique for solving supervised clustering is based on building a binary classifier from pairwise relations observed in the data (Daumé III et al., 2005). For a given input set, a binary classifier is trained on all pairs of input data points. The class of each pair of points is the binary information about the actual co-membership of that pair. The output of the classifier can be used as a metric, or taken as the evidence that a given pair of data points should be clustered together. This learned metric is then used by some conventional clustering algorithm, such as k-means.
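For concreteness, a minimal sketch of this pairwise technique is given below, assuming scikit-learn is available; the pair features and all function names (pair_features, train_pairwise_classifier, co_membership) are illustrative choices, not prescribed by the cited works.

Listing 1. A sketch of the pairwise co-membership classifier (illustrative; assumes scikit-learn).

import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def pair_features(X, a, b):
    # Simple symmetric features for a pair of points: absolute
    # coordinate differences plus the Euclidean distance.
    diff = np.abs(X[a] - X[b])
    return np.append(diff, np.linalg.norm(X[a] - X[b]))

def train_pairwise_classifier(X, labels):
    # labels[i] is the cluster of point i in a known good partition.
    feats, same = [], []
    for a, b in combinations(range(len(X)), 2):
        feats.append(pair_features(X, a, b))
        same.append(1 if labels[a] == labels[b] else 0)
    return LogisticRegression().fit(np.array(feats), np.array(same))

def co_membership(clf, X, a, b):
    # Estimated probability that points a and b share a cluster;
    # 1 minus this value can serve as a learned distance.
    return clf.predict_proba([pair_features(X, a, b)])[0, 1]

The matrix of learned distances obtained this way can then be handed to any conventional clustering algorithm that accepts precomputed dissimilarities.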
Depending on how specific the attributes are, the binary classifier may not be able to generalize to other domains. Usually, the classifier is built upon the original attributes, which restricts generalization to data from the same domain as the training data. Density-based derived attributes alleviate this problem, since the notion of density is not tied to a specific set of attributes.
3 A SUPERVISED CLUSTERING
ALGORITHM APPLIED TO
EVOLUTIONARY
COMPUTATION
This section illustrates one possible scheme for the design of a supervised clustering algorithm which exploits some of the specific aspects of evolutionary computation. The implications of the algorithm and of its adoption in EC are discussed.
The algorithm is trained on some small populations which were already clustered adequately. A viable approach is to select a smaller instance of the same problem, or of a similar one, and then run the evolutionary algorithm in order to obtain a small population. Each individual of the population must be (manually or automatically) labeled, according to what one believes to be the best clustering. For instance, if the intention is to solve multimodal problems, then each cluster corresponds to a different optimum.
A probabilistic model is inferred from pairwise information about co-membership. Each pair of data points from the training set has a binary label, which is 1 if both points belong to the same cluster and 0 otherwise. Additionally, a pairwise neighborhood must be defined, which delimits the local region around any given pair of points. Many alternatives might be tested. The Gabriel graph (Urquhart, 1982) already defines a neighborhood for a pair of points: it is related to the smallest hyperspherical region, centered at the midpoint between the pair of points, which includes both points. Attributes such as the density of points in that neighborhood are then computed. Other supervised clustering algorithms, as in (Kamishima and Motoyoshi, 2003), also adopt attributes of this kind.
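A minimal sketch of such a density attribute is given below, assuming the population is stored as a NumPy array of real-valued vectors; the function name pair_density is an illustrative choice.

Listing 2. A density attribute over the pair neighborhood (a minimal sketch).

import numpy as np

def pair_density(X, a, b):
    # Fraction of the population inside the smallest hypersphere
    # centered at the midpoint of points a and b which contains
    # both points (the Gabriel-graph neighborhood of the pair).
    center = (X[a] + X[b]) / 2.0
    radius = np.linalg.norm(X[a] - X[b]) / 2.0
    inside = np.linalg.norm(X - center, axis=1) <= radius
    return inside.sum() / float(len(X))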
The relation between the density attributes A(D_a, D_b) (which are defined for all pairs of points (D_a, D_b)) and the co-membership label for all pairs of data points in the training data set might be modeled by a logistic regression, conditioned on the satisfaction of the assumptions of the logistic model. The answer of the model is the estimated probability P(c(D_a) = c(D_b)) that D_a and D_b should be grouped together; this estimate is denoted Ĵ(D_a, D_b), defined over two data points D_a and D_b, where c(X) designates the cluster label of a data point X.
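As an illustration of this step, the sketch below fits such a logistic model from a fully labeled small population; again, scikit-learn is assumed, the names fit_J_hat and J_hat are illustrative, and the attribute function attrs is expected to behave like pair_density from Listing 2.

Listing 3. Fitting the logistic model for Ĵ (a sketch under the same assumptions as Listings 1 and 2).

import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def fit_J_hat(X, labels, attrs):
    # attrs(X, a, b) returns the attribute vector A(D_a, D_b) for a
    # pair of points, e.g. [pair_density(X, a, b)] from Listing 2.
    A, y = [], []
    for a, b in combinations(range(len(X)), 2):
        A.append(attrs(X, a, b))
        y.append(1 if labels[a] == labels[b] else 0)
    return LogisticRegression().fit(np.array(A), np.array(y))

def J_hat(model, attrs, X, a, b):
    # Estimated co-membership probability P(c(D_a) = c(D_b)).
    return model.predict_proba([attrs(X, a, b)])[0, 1]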
Once the probabilistic model is defined, one can
get a partitioning for unobserved data by following
the supervised clustering approach.
Algorithm 1 shows a general framework which is designed for the specific application of partitioning the population during the execution of an evolutionary algorithm. It follows a simple agglomerative approach, guided by the probabilistic decision model Ĵ(D_a, D_b) for pairs of points D_a, D_b, in a very straightforward fashion: it puts together in the same cluster points with higher co-membership evidence Ĵ(D_a, D_b). For each labeled point D_i, the evidence Ĵ(D_i, ·) which motivated the setting of that label is preserved as E(D_i). A point changes its cluster label only if the new evidence is greater than the previous greatest one, stored in E(D_i).
Algorithm 1. A simple supervised clustering approach (SSC).

Training: the model for Ĵ(D_i, D_j) is obtained from full example partitions of some data sets.
Initialization: each data point is (i) in its own cluster initially, or (ii) cluster labels for some points are given. Set all E(D_i) to zero.
while convergence criteria are not met do
    randomly choose a pair of points D_a and D_b, where c(D_a) is a smaller cluster than c(D_b)
    if Ĵ(D_a, D_b) > E(D_a) and Ĵ(D_a, D_b) > 0.5 then
        c(D_a) ← c(D_b)
        E(D_a) ← Ĵ(D_a, D_b)
    end if
end while
Additionally, two points are clustered together
only if the evidence for that is greater than 0.5.
Individuals which stay in the population from one
generation to another can keep their cluster labels.
This preserves relevant information about clustering.
An explicit bias against small clusters is adopted, in
order to minimize the number of final clusters.
Convergence criteria might be related to the permanence over time of a stable distribution of cluster labels. Experimental verification will tell how fast the convergence is and how accurate the resulting answer is.
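For concreteness, a compact sketch of the whole SSC loop is shown below, under the same illustrative assumptions as the previous listings; here a fixed iteration budget stands in for an actual convergence test, and J(a, b) is any learned co-membership model, such as J_hat from Listing 3 applied to the current population.

Listing 4. A sketch of the SSC loop (illustrative only).

import random
from collections import Counter

def ssc(n_points, J, initial_labels=None, iters=10000):
    # Points start in singleton clusters unless labels are
    # inherited from the previous generation (elitism).
    if initial_labels is not None:
        c = list(initial_labels)
    else:
        c = list(range(n_points))
    E = [0.0] * n_points  # best evidence seen for each point
    for _ in range(iters):
        a, b = random.sample(range(n_points), 2)
        sizes = Counter(c)
        # the point in the smaller cluster is the one relabeled,
        # an explicit bias against small clusters
        if sizes[c[a]] > sizes[c[b]]:
            a, b = b, a
        j = J(a, b)
        # relabel only if the new evidence beats both the previous
        # best evidence for the point and the 0.5 threshold
        if j > E[a] and j > 0.5:
            c[a] = c[b]  # move a into b's cluster
            E[a] = j
    return c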
The incremental aspect of EC is exploited, since the initialization accepts some previously labeled individuals, which stay in the population due to elitism. Also, the opportunity of training the algorithm on small labeled populations is accommodated by the supervised architecture.
4 CONCLUSIONS
This paper discusses the application of clustering in evolutionary computation and points out opportunities which can be exploited in order to design an effective clustering algorithm specially adapted to this application.
To illustrate this cross study, an algorithm is proposed which exploits some of the aspects pointed out. The proposed algorithm must be empirically compared to state-of-the-art clustering algorithms when applied to the population partitioning task. The effect on the performance of an evolutionary algorithm will be measured for all algorithms compared.
Although definitive conclusions can only be reached after validation, there is already some evidence of the usefulness of this study. In (Emmendorfer and Pozo, 2009), a k-means clustering algorithm is continuously applied to the population of an evolutionary algorithm. Only a few incremental steps of k-means are reported to be enough to keep the centroids updated. Empirical validation must verify whether the same performance is obtained with the supervised approach presented here.
REFERENCES
Emmendorfer, L. R. and Pozo, A. T. R. (2009). Effective
linkage learning using low-order statistics and clus-
tering. IEEE Transactions on Evolutionary Computa-
tion, 13(6):1233–1246.
Etxeberria, R. and Larrañaga, P. (1999). Global optimization using Bayesian networks. In Second Symposium on Artificial Intelligence (CIMAF-99), pages 332–339.
Finley, T. and Joachims, T. (2005). Supervised clustering
with support vector machines. In ICML ’05: Proceed-
ings of the twenty-second international conference on
Machine Learning, pages 217–224.
Daumé III, H., Marcu, D., and Cohen, W. (2005). A Bayesian model for supervised clustering with the Dirichlet process prior. Journal of Machine Learning Research, 6:1577.
Kamishima, T. and Motoyoshi, F. (2003). Learning from cluster examples. Machine Learning, 53(3):199–233.
Michalski, R. S. (2000). Learnable evolution model: Evolu-
tionary process guided by machine learning. Machine
Learning, 38(1):9–40.
Miquélez, T., Bengoetxea, E., and Larrañaga, P. (2004). Evolutionary computation based on Bayesian classifiers. International Journal of Applied Mathematics and Computer Science, 14(3):335–349.
Mühlenbein, H. and Paaß, G. (1996). From recombination of genes to the estimation of distributions: I. Binary parameters. In Parallel Problem Solving from Nature – PPSN IV, pages 178–187.
Oliveira, A. C. M. and Lorena, L. A. N. (2004). Detecting promising areas by evolutionary clustering search. In Advances in Artificial Intelligence, Springer Lecture Notes in Artificial Intelligence Series, pages 385–394.
Peña, J., Lozano, J., and Larrañaga, P. (2005). Globally multimodal problem optimization via an estimation of distribution algorithm based on unsupervised learning of Bayesian networks. Evolutionary Computation, 13(1):43–66.
Rosales, R., Achan, K., and Frey, B. (2004). Learning to cluster using local neighborhood structure. In ICML '04: Proceedings of the twenty-first international conference on Machine Learning.
Streichert, F., Ulmer, H., and Zell, A. (2003). A clustering based niching EA for multimodal search spaces. In Proceedings of the 6th International Conference on Artificial Evolution.
Urquhart, R. (1982). Graph theoretical clustering based on limited neighbourhood sets. Pattern Recognition, 15(3):173–187.