Estimating the Optimal Number of Clusters from Subsets of Ensembles
Afees Odebode, Allan Tucker, Mahir Arzoky, Stephen Swift
2022
Abstract
This research estimates the optimal number of clusters in a dataset using a novel ensemble technique, a preferred alternative to relying on the output of a single clustering. Combining clusterings from different algorithms can lead to a more stable and robust solution, one that is often unattainable by any single clustering. Technically, we created subsets of ensembles as candidate estimates and evaluated them using a quality metric to obtain the best subset. We tested our method on publicly available datasets of varying types, sources and clustering difficulty to establish the accuracy and performance of our approach against eight standard methods. Our method outperforms all eight techniques in the number of clusters estimated correctly. Because the initial algorithm is exhaustive, it becomes slow as the number of ensembles or the solution space increases; hence, we have provided an updated version based on the single-digit difference of Gray code that runs in linear time in terms of the subset size.
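The Gray code idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the quality metric, so the additive per-member score used here is a placeholder assumption, and the names `gray_subsets` and `best_subset` are hypothetical. The sketch only shows why Gray code ordering helps: consecutive subsets differ by exactly one member, so a running quantity can be updated with one addition or subtraction instead of being recomputed from scratch.

```python
from typing import Iterator, Sequence, Tuple


def gray_subsets(n: int) -> Iterator[Tuple[int, int, bool]]:
    """Visit every non-empty subset of n items in Gray code order.

    Yields (mask, index, added): `mask` is the subset as a bitmask,
    `index` is the single item that changed since the previous subset,
    and `added` says whether that item entered or left the subset.
    """
    prev = 0
    for i in range(1, 1 << n):
        mask = i ^ (i >> 1)          # standard reflected binary Gray code
        diff = mask ^ prev           # exactly one bit differs from the previous mask
        index = diff.bit_length() - 1
        yield mask, index, bool(mask & diff)
        prev = mask


def best_subset(scores: Sequence[float]) -> Tuple[int, float]:
    """Return (mask, quality) of the best subset under a toy additive metric.

    ASSUMPTION: quality is the sum of hypothetical per-member scores.
    This stands in for the paper's quality metric, which is not given here.
    """
    total = 0.0
    best_mask, best_quality = 0, float("-inf")
    for mask, index, added in gray_subsets(len(scores)):
        # Incremental update: only the one changed member is touched.
        total += scores[index] if added else -scores[index]
        if total > best_quality:
            best_mask, best_quality = mask, total
    return best_mask, best_quality
```

For example, `best_subset([5, -2, 3])` selects members 0 and 2 (bitmask `0b101`). The search still visits all non-empty subsets; the saving is in the cost of evaluating each one.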
Paper Citation
in Harvard Style
Odebode A., Tucker A., Arzoky M. and Swift S. (2022). Estimating the Optimal Number of Clusters from Subsets of Ensembles. In Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-583-8, pages 383-391. DOI: 10.5220/0011275000003269
in Bibtex Style
@conference{data22,
author={Afees Odebode and Allan Tucker and Mahir Arzoky and Stephen Swift},
title={Estimating the Optimal Number of Clusters from Subsets of Ensembles},
booktitle={Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2022},
pages={383-391},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011275000003269},
isbn={978-989-758-583-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Estimating the Optimal Number of Clusters from Subsets of Ensembles
SN - 978-989-758-583-8
AU - Odebode A.
AU - Tucker A.
AU - Arzoky M.
AU - Swift S.
PY - 2022
SP - 383
EP - 391
DO - 10.5220/0011275000003269