Table 3: Enhancement of the running time of KM, PSO, PSC, TVAC, and CPSO with PR.

Data clustering ($\Delta_T$, %)
Data       KM      PSO     PSC     TVAC    CPSO
iris      -84.6   -65.1   -72.5   -79.0   -76.8
wine      -82.9   -68.4   -66.8   -82.1   -81.4
breast    -79.6   -69.1   -62.9   -82.4   -81.7
Average   -82.4   -67.5   -67.4   -81.2   -80.0

Image clustering ($\Delta_T$, %)
Data       KM      PSO     PSC     TVAC    CPSO
Lena      -76.9   -72.9   -90.2   -90.0   -74.7
baboon    -76.2   -72.3   -93.7   -89.6   -73.9
airplane  -79.1   -72.1   -93.5   -88.4   -74.0
pepper    -78.8   -74.7   -88.0   -87.8   -74.7
goldhill  -77.0   -72.0   -90.9   -87.4   -74.1
boots     -77.1   -72.2   -86.4   -88.3   -75.4
Average   -77.5   -72.7   -90.5   -88.6   -74.5
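The Average rows of Table 3 appear to be plain arithmetic means of the per-dataset values. The short Python check below uses only numbers copied from the table. It recomputes those means, confirms they match the printed averages up to one-decimal rounding, and recovers the range of average running-time reductions (roughly 67% to 90%). The dictionary and function names are illustrative, not from the paper.

```python
# Per-dataset Delta_T values (%) copied from Table 3.
DATA = {
    "KM":   [-84.6, -82.9, -79.6],
    "PSO":  [-65.1, -68.4, -69.1],
    "PSC":  [-72.5, -66.8, -62.9],
    "TVAC": [-79.0, -82.1, -82.4],
    "CPSO": [-76.8, -81.4, -81.7],
}
IMAGE = {
    "KM":   [-76.9, -76.2, -79.1, -78.8, -77.0, -77.1],
    "PSO":  [-72.9, -72.3, -72.1, -74.7, -72.0, -72.2],
    "PSC":  [-90.2, -93.7, -93.5, -88.0, -90.9, -86.4],
    "TVAC": [-90.0, -89.6, -88.4, -87.8, -87.4, -88.3],
    "CPSO": [-74.7, -73.9, -74.0, -74.7, -74.1, -75.4],
}
# "Average" rows as printed in Table 3 (rounded to one decimal place).
REPORTED = {
    "data":  {"KM": -82.4, "PSO": -67.5, "PSC": -67.4, "TVAC": -81.2, "CPSO": -80.0},
    "image": {"KM": -77.5, "PSO": -72.7, "PSC": -90.5, "TVAC": -88.6, "CPSO": -74.5},
}


def column_means(table):
    """Arithmetic mean of each algorithm's column."""
    return {alg: sum(vals) / len(vals) for alg, vals in table.items()}


means = {"data": column_means(DATA), "image": column_means(IMAGE)}

for problem, table_means in means.items():
    for alg, mean in table_means.items():
        # Allow for rounding to one decimal place in the printed table.
        assert abs(mean - REPORTED[problem][alg]) <= 0.051, (problem, alg, mean)

# Reductions are the negated Delta_T averages (a more negative Delta_T is a larger saving).
reductions = [-m for table_means in means.values() for m in table_means.values()]
print(f"average reduction ranges from {min(reductions):.1f}% to {max(reductions):.1f}%")
```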
Table 4: Enhancement of the quality of KM, PSO, PSC, TVAC, and CPSO with PR.

Data clustering ($\Delta_{AR}$, %)
Data       KM      PSO     PSC     TVAC    CPSO
iris      -4.9     0.1     0.8    -3.0     0.1
wine      -4.8    -0.4    -0.1    -0.1    -0.1
breast    -0.1     0.2    -0.6    -2.0     0.2
Average   -3.3    -0.04    0.01   -1.7     0.1

Image clustering ($\Delta_{PSNR}$, %)
Data       KM      PSO     PSC     TVAC    CPSO
Lena       0.1     5.0     3.4     2.6    -0.5
baboon     1.4     3.2     1.3    -1.1     5.2
airplane   0.1     0.2    -1.1     0.5    -0.1
pepper    -0.9     1.3     1.0     0.5    -0.9
goldhill   0.2     0.6     0.5     0.4    -0.5
boots      3.2     6.3    -5.0    -2.1     4.0
Average    0.7     2.8     0.02    0.1     1.2
$\beta_\phi$ (new algorithm) with respect to $\beta_\psi$ (original algorithm) in percentage, and it is defined by

$$\Delta_\beta = \frac{\beta_\phi - \beta_\psi}{\beta_\psi} \times 100\% \qquad (8)$$
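As a worked illustration of equation (8), the following minimal Python sketch computes $\Delta_\beta$ from a measure obtained by the new algorithm ($\beta_\phi$) and by the original one ($\beta_\psi$). The function name `enhancement` and the running times used are hypothetical values chosen only to show the arithmetic.

```python
def enhancement(beta_new, beta_orig):
    """Percentage change Delta_beta of equation (8).

    beta_new  -- measure obtained by the new algorithm (beta_phi)
    beta_orig -- measure obtained by the original algorithm (beta_psi)
    """
    if beta_orig == 0:
        raise ValueError("beta_psi must be non-zero")
    return (beta_new - beta_orig) / beta_orig * 100.0


# Hypothetical running times in seconds: original algorithm vs. the same algorithm with PR.
t_original, t_with_pr = 50.0, 10.0
delta_t = enhancement(t_with_pr, t_original)  # negative: the running time dropped by 80%
print(f"Delta_T = {delta_t:.1f}%")
```

Because the running time shrinks, $\Delta_T$ comes out negative, which is why the entries of Table 3 are all negative.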
Note that for $\beta = D$, the larger the value of $\Delta_\beta$, the greater the enhancement; for $\beta = T$, the smaller the value of $\Delta_\beta$, the greater the enhancement. In addition,
for DS1, the quality of the clustering result is measured in terms of the accuracy rate (AR), defined by

$$\mathrm{AR} = \frac{\sum_{i=1}^{n} A_i}{n}, \qquad (9)$$

where $A_i \in \{0, 1\}$: $A_i = 1$ indicates that pattern $x_i$ is assigned to the correct cluster, and $A_i = 0$ indicates that it is assigned to a wrong cluster. For DS2, the quality of the end result is measured using the peak signal-to-noise ratio (PSNR).
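The two quality measures can be sketched in Python as follows. `accuracy_rate` is a direct transcription of equation (9); how each $A_i$ is decided (for example, how clusters are matched to class labels) is not specified here, so the function takes the $A_i$ values as given. This section does not state the PSNR formula either, so `psnr` assumes the standard definition $10 \log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ with $\mathrm{MAX} = 255$ for 8-bit images, computed between the original image and its clustered (reconstructed) version. Both function names are hypothetical.

```python
import numpy as np


def accuracy_rate(a):
    """Equation (9): AR = (sum_i A_i) / n, where each A_i is 0 or 1."""
    a = np.asarray(a, dtype=float)
    return float(a.sum() / a.size)


def psnr(original, reconstructed, max_value=255.0):
    """Standard PSNR in dB (assumed definition): 10 * log10(MAX^2 / MSE)."""
    orig = np.asarray(original, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    mse = np.mean((orig - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


print(accuracy_rate([1, 1, 0, 1]))  # three of four patterns correct
img = np.zeros((2, 2))
print(psnr(img, img + 1.0))  # every pixel off by one grey level, so MSE = 1
```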
4.1 The Simulation Results
Our simulations compare KM, PSO, PSC, TVAC, CPSO, and MPREPSO on both the data and the image clustering problems in terms of running time and quality (measured by AR for data clustering and by PSNR for image clustering). The detection operator of MPREPSO considers a pattern static if its distance to its centroid is no larger than $\gamma = \mu - \sigma$ and it has stayed in the same group for two iterations. Our simulation results show that k-means (KM) is faster than the PSO-based algorithms in most cases. However, they also show that all the PSO-based algorithms give better results than KM for most of the datasets evaluated.
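The static-pattern test described above can be sketched with NumPy. This section does not define $\mu$ and $\sigma$, so the sketch assumes they are the mean and (population) standard deviation of the distances of all patterns to their assigned centroids. It also reads "stays in the same group for two iterations" as "its label is unchanged between the previous and the current iteration". The function `detect_static` and the toy data are hypothetical, not MPREPSO's actual operator.

```python
import numpy as np


def detect_static(X, centroids, labels, prev_labels):
    """Return a boolean mask of patterns judged static.

    Assumptions: mu and sigma are the mean and population standard
    deviation of every pattern's distance to its own centroid, and
    "same group for two iterations" means labels == prev_labels.
    """
    dist = np.linalg.norm(X - centroids[labels], axis=1)
    gamma = dist.mean() - dist.std()  # threshold gamma = mu - sigma
    return (dist <= gamma) & (labels == prev_labels)


# Toy example: two centroids, six 2-D patterns.
X = np.array([[1.0, 0.0], [2.0, 0.0], [10.0, 2.0],
              [8.0, 0.0], [13.0, 0.0], [10.0, 1.0]])
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1, 1, 1, 1])
prev_labels = np.array([0, 0, 1, 1, 1, 0])  # the last pattern just switched groups

static = detect_static(X, centroids, labels, prev_labels)
print(static)
```

Only the first pattern is both close to its centroid (distance 1, below $\gamma \approx 1.15$) and stable. The last pattern is equally close but has just changed groups, so it is not yet treated as static.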
Tables 3 and 4 compare the proposed algorithm MPREPSO with the other algorithms in terms of running time and quality, respectively. Table 3 shows that the proposed algorithm reduces the computation time of these clustering algorithms by 67% to 90% on average, especially for large datasets. For example, as Table 3 shows, the proposed algorithm reduces the computation time of PSO more for DS2 than for DS1 (72.7% versus 67.5% on average) because the data size of DS2 is larger than that of DS1. Moreover, for some datasets, the proposed method degrades the quality of the end results, though by no more than 4% on average. For the others, it even enhances the quality of the end results by about 0.01% up to 2.78%, especially for DS2. Our observations indicate that these enhancements come from the combined use of sampling and multi-start in the proposed algorithm. A closer look at the results shows that the proposed algorithm eliminates most of the computation time that PSO and its variants spend computing fitness and updating the membership of patterns. However, since each PSO-based algorithm may use different operators, the amount of time that MPREPSO saves differs from algorithm to algorithm.
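One way to picture where the savings in membership updates come from is a nearest-centroid reassignment that skips patterns already flagged as static. The sketch below rests on our own simplifying assumptions: squared Euclidean distance, nearest-centroid assignment, and a precomputed boolean static mask. It is not the paper's MPREPSO operator, and the name `update_membership` is hypothetical. It returns the new labels together with the number of pattern-to-centroid distances actually evaluated.

```python
import numpy as np


def update_membership(X, centroids, labels, static_mask):
    """Reassign only non-static patterns to their nearest centroid.

    Static patterns keep their current label, so their distance
    computations are skipped. Returns (new_labels, distances_evaluated).
    """
    new_labels = labels.copy()
    active = ~static_mask
    Xa = X[active]
    # Squared Euclidean distance from every active pattern to every centroid.
    d2 = ((Xa[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    new_labels[active] = d2.argmin(axis=1)
    return new_labels, d2.size


X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 0, 1])  # pattern 2 currently sits in the wrong group
static_mask = np.array([True, True, False, False])

new_labels, n_dist = update_membership(X, centroids, labels, static_mask)
print(new_labels, n_dist)  # 4 distances evaluated instead of 8
```

The trade-off is visible in the design: a pattern wrongly flagged as static keeps its label, which is one plausible route by which pattern reduction can occasionally lower the quality of the end result.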
5 CONCLUSIONS
This paper presents a method, based on the notion of pattern reduction, for reducing the running time of PSO-based clustering algorithms. The simulation results show that many of the computations during the convergence process of PSO are essentially redundant and can be detected and eliminated. They show further that the proposed algorithm can not only significantly reduce the computation time of PSO-based algorithms for clustering problems but also provide better results than the other algorithms
KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval