Table 3: The running time of the seven clustering initialization methods on the datasets used.

Average Running Time (Minutes)

Dataset       KM      Entropy  Density  Random  Forgy  Value-Attributes  K-Prototype
Soybean       2.90    3.11     3.07     2.42    2.37   3.03              3.33
B. Cancer     11.81   12.20    11.89    9.34    9.22   11.84             12.23
Spect Heart   4.82    5.17     4.98     4.23    4.18   4.87              5.21
St. Heart     5.33    5.64     5.37     4.83    4.79   5.32              5.68
Zoo           3.40    3.62     3.43     3.21    3.17   3.41              3.66
Liver         6.91    7.10     6.93     6.67    6.59   6.89              7.24
HSurvival     6.27    6.68     6.31     5.82    5.66   6.29              6.72
Dermatology   9.62    9.89     9.67     9.32    9.27   9.62              9.95
L. Cancer     3.70    3.87     3.79     3.49    3.23   3.73              3.93
Computer      5.20    5.48     5.37     4.89    4.82   5.33              5.57

centers. Previous work has shown that using
multiple clustering validity indices in a
multiobjective clustering model (e.g., the
MODEK-Modes model) yields more accurate results
than using a single validity index. Thus, we
proposed enhancing the performance of the
MODEK-Modes model by introducing two new
initialization methods.
These are the K-Modes initialization method and
the entropy initialization method. Both methods
were tested on ten real-life benchmark datasets
obtained from the UCI Machine Learning Repository.
We applied a t-test to check the significance of
the results. Based
on the experimental results, the two proposed
initialization methods significantly improved
clustering performance compared with the other
initialization methods. The KM method achieved a
significant improvement in clustering performance
on 8 of the ten datasets, while the entropy method
improved clustering performance on 7 datasets.
The time and space complexity of the proposed
methods was analyzed, and the comparison with the
other methods demonstrates their effectiveness.
As future work, the two proposed initialization
methods can be extended to handle numerical
datasets by replacing the k-modes algorithm with
the k-means algorithm.
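
The significance check mentioned above can be illustrated with a small sketch. The text does not say which t-test variant was applied, so a paired, two-sided test at the 0.05 level is an assumption here. Because the per-dataset clustering scores are not listed in this section, the sketch uses the KM and Random running times from Table 3 purely as stand-in data. The function name paired_t_statistic is hypothetical, and the critical value 2.262 is the standard two-sided t value for 9 degrees of freedom.

```python
import math
import statistics

# Average running times in minutes, copied from Table 3 (one value per dataset).
# They stand in for the clustering scores, which this section does not list.
km_minutes = [2.90, 11.81, 4.82, 5.33, 3.40, 6.91, 6.27, 9.62, 3.70, 5.20]
random_minutes = [2.42, 9.34, 4.23, 4.83, 3.21, 6.67, 5.82, 9.32, 3.49, 4.89]


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on two samples.

    Hypothetical helper: the paired design is an assumption, not a detail
    taken from the paper.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262

t_value, dof = paired_t_statistic(km_minutes, random_minutes)
significant = dof == 9 and abs(t_value) > T_CRITICAL_DF9
print(f"t = {t_value:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

The same helper could be applied to per-dataset accuracy or validity-index scores of two initialization methods; only the two input lists would change.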