Initializing k-means Clustering
Christian Borgelt, Olha Yarikova
2020
Abstract
The quality of clustering results obtained with the k-means algorithm depends heavily on the initialization of the cluster centers. Simply sampling centers uniformly at random from the data points usually yields fairly poor and unstable results. Hence several alternatives have been suggested in the past, among which Maximin (Hathaway et al., 2006) and k-means++ (Arthur and Vassilvitskii, 2007) are the best known and most widely used. In this paper we explore modifications of these methods that deal with cases in which the original methods still yield suboptimal choices of the initial cluster centers. Furthermore, we present efficient implementations of our new methods.
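For orientation, the three baseline initializations named in the abstract can be sketched as follows. This is a minimal illustration of uniform sampling, Maximin, and k-means++ as they are commonly described, not the modified methods or the efficient implementations the paper proposes. The function names, the use of squared Euclidean distance, and the random choice of the first center in Maximin are assumptions made for this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_uniform(points, k, rng):
    """Baseline: pick k distinct data points uniformly at random."""
    return rng.sample(points, k)


def init_maximin(points, k, rng):
    """Maximin sketch: random first center (an assumption), then repeatedly
    take the point that is farthest from its nearest chosen center."""
    centers = [rng.choice(points)]
    # d2[i] = squared distance of point i to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=d2.__getitem__)
        centers.append(points[i])
        d2 = [min(d, sq_dist(p, points[i])) for d, p in zip(d2, points)]
    return centers


def init_kmeanspp(points, k, rng):
    """k-means++ sketch: sample each further center with probability
    proportional to its squared distance to the nearest chosen center.
    Assumes the data contain at least k distinct points."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[i])
        d2 = [min(d, sq_dist(p, points[i])) for d, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: three small, well-separated groups in the plane.
    data = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.5, 0.0), (0.0, 10.0), (0.5, 10.0)]
    for init in (init_uniform, init_maximin, init_kmeanspp):
        print(init.__name__, init(data, 3, rng))
```

Both distance-based variants keep a running array of nearest-center distances, so each new center costs one pass over the data. Maximin is deterministic after the first center and therefore tends to pick extreme points, whereas k-means++ only biases the choice toward distant points.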
Paper Citation
in Harvard Style
Borgelt C. and Yarikova O. (2020). Initializing k-means Clustering. In Proceedings of the 9th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-440-4, pages 260-267. DOI: 10.5220/0009872702600267
in Bibtex Style
@conference{data20,
author={Christian Borgelt and Olha Yarikova},
title={Initializing k-means Clustering},
booktitle={Proceedings of the 9th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2020},
pages={260-267},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009872702600267},
isbn={978-989-758-440-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Initializing k-means Clustering
SN - 978-989-758-440-4
AU - Borgelt C.
AU - Yarikova O.
PY - 2020
SP - 260
EP - 267
DO - 10.5220/0009872702600267