K-modes and Entropy Cluster Centers Initialization Methods
Doaa S. Ali, Ayman Ghoneim, Mohamed Saleh
2017
Abstract
Data clustering is an important unsupervised technique in data mining that aims to extract the natural partitions in a dataset without a priori class information. Clustering models that start from randomly initialized centers are highly sensitive to that initial set, since the initial centers directly influence the formation of the final clusters. Determining the initial cluster centers is therefore an important issue in clustering models. Previous work has shown that using multiple clustering validity indices in a multiobjective clustering model (e.g., the MODEK-Modes model) yields more accurate results than using a single validity index. In this study, we enhance the performance of the MODEK-Modes model by introducing two new initialization methods: the K-Modes initialization method and the entropy initialization method. Both methods are tested on ten real-life benchmark datasets obtained from the UCI Machine Learning Repository. Experimental results show that the two initialization methods achieve a significant improvement in clustering performance compared to other existing initialization methods.
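The sensitivity to initial centers described in the abstract can be illustrated with a minimal sketch. This is not the MODEK-Modes model and not either of the proposed initialization methods; it is a plain k-modes loop with simple matching dissimilarity, and the function names (`matching_distance`, `k_modes`, `total_cost`), the tie-breaking rule, and the toy dataset are all assumptions made for illustration.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, initial_modes, max_iter=100):
    """Plain k-modes started from the given modes (illustrative sketch only)."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode; ties go to the lowest cluster index.
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_distance(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: per-attribute most frequent category in each cluster.
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


def total_cost(data, labels, modes):
    """Sum of dissimilarities between each object and its cluster mode."""
    return sum(matching_distance(row, modes[lab]) for row, lab in zip(data, labels))


# Hypothetical toy data: two groups of categorical objects.
data = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
    ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p"),
]

# Initial modes taken from different groups.
labels_spread, modes_spread = k_modes(data, [data[0], data[3]])
cost_spread = total_cost(data, labels_spread, modes_spread)

# Initial modes taken from the same group (two identical objects).
labels_close, modes_close = k_modes(data, [data[0], data[1]])
cost_close = total_cost(data, labels_close, modes_close)

print(cost_spread, cost_close)
```

On this toy data the first run separates the two groups, while the second run, whose two initial modes coincide, never uses its second cluster and ends with a higher total cost. The example is deliberately degenerate; it only shows that the final partition depends on the starting modes.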
Paper Citation
in Harvard Style
Ali D. S., Ghoneim A. and Saleh M. (2017). K-modes and Entropy Cluster Centers Initialization Methods. In Proceedings of the 6th International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES, ISBN 978-989-758-218-9, pages 447-454. DOI: 10.5220/0006245504470454
in Bibtex Style
@conference{icores17,
author={Doaa S. Ali and Ayman Ghoneim and Mohamed Saleh},
title={K-modes and Entropy Cluster Centers Initialization Methods},
booktitle={Proceedings of the 6th International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES},
year={2017},
pages={447-454},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006245504470454},
isbn={978-989-758-218-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 6th International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES
TI - K-modes and Entropy Cluster Centers Initialization Methods
SN - 978-989-758-218-9
AU - Ali D. S.
AU - Ghoneim A.
AU - Saleh M.
PY - 2017
SP - 447
EP - 454
DO - 10.5220/0006245504470454