problems is that in most cases they generate a
number of clusters that is much larger than the real
one. Moreover, these algorithms usually do not
stabilize on a clustering solution; that is, they
constantly construct and deconstruct clusters during
the process. To overcome these difficulties and
improve the quality of the results, the authors
proposed an Adaptive Ant Clustering Algorithm (A2CA).
A modification included in the present approach is a
cooling schedule for the parameter that controls the
probability of ants picking up objects from the grid.
2.1 Parameters of the Neighborhood Function
The clusters’ spatial separation on the grid is crucial
so that individual clusters are well defined, allowing
their automatic recovery. Spatial proximity, when it
occurs, may indicate a premature formation of the
cluster (Handl et al., 2006).
Defining the parameters of the neighborhood
function is a key factor in cluster quality. In the
case of the perception radius σ, it is more attractive
to employ larger neighborhoods to improve the quality
of the clusters and their distribution on the grid.
However, this procedure is computationally more
expensive, since the number of cells to be considered
for each action grows quadratically with the radius,
and it also inhibits the rapid formation of clusters
during the initial distribution phase. A radius of
perception that gradually increases over time
accelerates the dissolution of preliminary small
clusters (Handl et al., 2006). A progressive radius of
perception was also used by Vizine et al. (2005).
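The quadratic growth mentioned above can be illustrated with a minimal sketch. It assumes a square neighborhood of side 2r + 1 centered on the ant, which is an assumption and not a detail stated in the text. The function names and the linear radius schedule are hypothetical and are not taken from the cited works.

```python
def neighborhood_cells(radius: int) -> int:
    """Cells inspected per action for a square neighborhood of the given
    radius, excluding the center cell (assumed neighborhood shape)."""
    side = 2 * radius + 1
    return side * side - 1


def radius_at(step: int, total_steps: int, r_start: int = 1, r_end: int = 5) -> int:
    """Hypothetical linear schedule that grows the radius from r_start to
    r_end over the run; the cited schedules may differ."""
    return r_start + (r_end - r_start) * step // total_steps


for r in (1, 2, 3, 5):
    # The cost per action grows with the square of the radius.
    print(r, neighborhood_cells(r))
```

Starting with a small radius keeps the early iterations cheap, and the larger radius is paid for only once preliminary clusters exist.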
Moreover, after the initial clustering phase,
Handl et al. (2006) replaced the scalar parameter
$1/\sigma^2$ by $1/N_{occ}$ in equation (5), where
$N_{occ}$ is the number of occupied grid cells observed
within the local neighborhood. Thus, only the
similarity, not the density, was taken into account.
Boryczka (2009), in her algorithm ACAM, proposed to
replace the scalar $1/\sigma^2$ in equation (5) by a
scalar that depends on $\sigma_0$, the initial radius
of perception.
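The effect of the two scaling choices can be sketched as follows. Equation (5) is not reproduced in this section, so the sketch assumes the commonly used Lumer-Faieta form, f = max(0, s · Σ (1 − d/α)), where s is the scaling factor and d ranges over the dissimilarities to the patterns in the occupied neighboring cells. The function names are hypothetical.

```python
def neighborhood_value(dissims, alpha, scale):
    """Assumed form of the neighborhood function:
    max(0, scale * sum(1 - d / alpha)) over occupied neighboring cells."""
    total = sum(1.0 - d / alpha for d in dissims)
    return max(0.0, scale * total)


def f_sigma(dissims, alpha, sigma):
    """Scaling by 1/sigma^2: sparse neighborhoods are penalized (density counts)."""
    return neighborhood_value(dissims, alpha, 1.0 / sigma ** 2)


def f_occupied(dissims, alpha):
    """Scaling by 1/N_occ: only the average similarity counts."""
    n_occ = len(dissims)
    if n_occ == 0:
        return 0.0  # convention chosen here for an empty neighborhood
    return neighborhood_value(dissims, alpha, 1.0 / n_occ)


# Two very similar neighbors in an otherwise empty neighborhood.
d = [0.1, 0.1]
print(f_sigma(d, alpha=0.5, sigma=3))  # low value: few cells are occupied
print(f_occupied(d, alpha=0.5))        # high value: the occupied cells are similar
```

With the 1/N_occ scaling, a pattern surrounded by a few very similar neighbors scores as highly as one inside a dense cluster, which is the sense in which density is no longer taken into account.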
According to Handl et al. (2006), α determines
the percentage of patterns on the grid that are rated
as similar. Choosing a very small value for α
prevents the formation of clusters on the grid. On the
other hand, choosing too large a value for α results
in the fusion of clusters.
Determining the parameter α is not simple, and its
choice is highly dependent on the structure of the
data set. An inadequate value is reflected by
excessive or extremely low activity on the grid. The
amount of activity is measured by the frequency of
successful picking and dropping operations performed
by the ants. Based on these analyses, Handl et al.
(2006) proposed an automatic adaptation of α.
Boryczka (2009) proposed a new scheme for
adjusting the value of α.
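An activity-based adaptation of α could look like the minimal sketch below. It only illustrates the idea described above (too little activity suggests that α is too small, too much suggests that it is too large). The function name, the threshold, the step size and the bounds are illustrative assumptions and are not the values used in the cited schemes.

```python
def adapt_alpha(alpha, n_failed, n_attempts,
                fail_threshold=0.9, step=0.01, lo=0.0, hi=1.0):
    """Nudge alpha according to the rate of failed drop attempts.

    All default values are placeholders chosen for illustration."""
    if n_attempts == 0:
        return alpha  # no activity observed, nothing to adapt
    fail_rate = n_failed / n_attempts
    if fail_rate > fail_threshold:
        alpha += step  # almost nothing is rated similar: loosen alpha
    else:
        alpha -= step  # enough activity: tighten alpha
    return min(hi, max(lo, alpha))


# Example: 99 of the last 100 drop attempts failed, so alpha is increased.
print(adapt_alpha(0.5, n_failed=99, n_attempts=100))
```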
Tan et al. (2007) examine the scalar parameter
of dissimilarity in ant colony approaches to data
clustering. The authors show that there is no need to
use an automatic adaptation of α. They propose a
method to calculate a fixed α for each database. The
value of α is calculated independently of the
clustering process.
To measure the similarity between patterns,
different metrics are used. Handl et al. (2006) use
the Euclidean distance for synthetic data and the
cosine measure for real data. Boryczka (2009) tested
different dissimilarity measures: the Euclidean,
cosine and Gower measures.
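The three measures named above can be sketched in a few lines. This is a minimal illustration for purely numeric patterns: the Gower measure is shown only in its numeric form (range-normalized absolute differences averaged over the attributes), and the handling of zero vectors and zero ranges is a convention chosen here, not one taken from the cited works.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two numeric patterns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_dissimilarity(x, y):
    """One minus the cosine of the angle between the two patterns."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 1.0  # convention chosen here for zero vectors
    return 1.0 - dot / (nx * ny)


def gower_numeric(x, y, ranges):
    """Gower dissimilarity restricted to numeric attributes; `ranges` holds
    the range (max - min) of each attribute over the data set."""
    terms = [abs(a - b) / r for a, b, r in zip(x, y, ranges) if r > 0]
    return sum(terms) / len(terms) if terms else 0.0


print(euclidean([0, 0], [3, 4]))
print(cosine_dissimilarity([1, 0], [0, 1]))
print(gower_numeric([1, 10], [3, 20], ranges=[4, 20]))
```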
2.2 The Basic Algorithm Proposed by Deneubourg et al. (1991)
At an initial phase, patterns are randomly scattered
throughout the grid. Then, each ant randomly
chooses a pattern to pick and is placed at a random
position on the grid.
In the next phase, called the distribution phase,
an ant is randomly selected at each iteration of a
simple loop. This ant moves across the grid with a
step of length L in a randomly determined direction.
According to Handl et al. (2006), using a large step
size speeds up the clustering process. The ant then
probabilistically decides whether to drop its pattern
at this position.
If the decision to drop the pattern is negative,
another ant is randomly chosen and the process
starts over. If the decision is positive, the ant drops
the pattern at its current position on the grid,
provided that the cell is free. If this grid cell is
occupied by another pattern, the pattern must be
dropped at a free neighboring cell found through a
random search.
The ant then seeks a new pattern to pick up.
Among the free patterns on the grid, that is, patterns
that are not being carried by any ant, the ant
randomly selects one, goes to its position on the
grid, evaluates the neighborhood function and
probabilistically decides whether to pick up this
pattern. This process of choosing a free pattern on
the grid is repeated until the ant finds a pattern
that should be picked up.
Only then is the distribution phase resumed, with
another ant being chosen, until a stop criterion is
satisfied.
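The steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions and not the authors' implementation: the neighborhood function uses the Lumer-Faieta-style form, the pick and drop probabilities use the squared-ratio forms commonly attributed to Deneubourg et al. with hypothetical constants k_pick and k_drop, the grid wraps around at the borders, and a fixed number of steps serves as the stop criterion.

```python
import math
import random


def neighborhood_f(pid, cell, grid, patterns, alpha, radius, size):
    """Assumed neighborhood value of pattern `pid` at `cell` (torus grid)."""
    x, y = cell
    total = 0.0
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            other = grid.get(((x + dx) % size, (y + dy) % size))
            if (dx, dy) != (0, 0) and other is not None and other != pid:
                total += 1.0 - math.dist(patterns[pid], patterns[other]) / alpha
    return max(0.0, total / ((2 * radius + 1) ** 2 - 1))


def ant_clustering(patterns, size=10, n_ants=3, n_steps=200, step_len=1,
                   alpha=0.5, radius=1, k_pick=0.1, k_drop=0.3, seed=0):
    """Requires size * size > len(patterns) >= n_ants."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

    # Initial phase: scatter the patterns over distinct random cells.
    grid = dict(zip(rng.sample(cells, len(patterns)), range(len(patterns))))

    def f(pid, cell):
        return neighborhood_f(pid, cell, grid, patterns, alpha, radius, size)

    # Each ant picks up a random pattern and is placed at a random position.
    ants = [{"cell": rng.choice(cells), "load": grid.pop(cell)}
            for cell in rng.sample(sorted(grid), n_ants)]

    # Distribution phase.
    for _ in range(n_steps):
        ant = rng.choice(ants)
        dx, dy = rng.choice(moves)
        x, y = ant["cell"]
        ant["cell"] = ((x + dx * step_len) % size, (y + dy * step_len) % size)
        fv = f(ant["load"], ant["cell"])
        if rng.random() >= (fv / (k_drop + fv)) ** 2:
            continue  # negative drop decision: choose another ant
        # Drop here, or random-walk to a free neighboring cell if occupied.
        cell = ant["cell"]
        while cell in grid:
            dx, dy = rng.choice(moves)
            cell = ((cell[0] + dx) % size, (cell[1] + dy) % size)
        grid[cell] = ant["load"]
        # Try random free patterns until one is picked up
        # (the pick probability is always positive, so this terminates).
        while True:
            cell = rng.choice(sorted(grid))
            fv = f(grid[cell], cell)
            if rng.random() < (k_pick / (k_pick + fv)) ** 2:
                ant["cell"], ant["load"] = cell, grid.pop(cell)
                break
    return grid, ants


points = [(0.0, 0.0)] * 10 + [(1.0, 1.0)] * 10
final_grid, final_ants = ant_clustering(points)
print(len(final_grid), "patterns on the grid,", len(final_ants), "carried")
```

Every ant always carries exactly one pattern in this scheme, so the patterns that are still being carried have to be dropped before the clusters are read from the grid.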
PATTERN CLUSTERING USING ANTS COLONY, WARD METHOD AND KOHONEN MAPS