Estimating the Optimal Number of Clusters from Subsets of Ensembles

Afees Odebode, Allan Tucker, Mahir Arzoky, Stephen Swift

2022

Abstract

This research estimates the optimal number of clusters in a dataset using a novel ensemble technique, a preferred alternative to relying on the output of a single clustering. Combining clusterings from different algorithms can lead to a more stable and robust solution, often unattainable by any single clustering. Technically, we created subsets of ensembles as candidate estimates and evaluated them using a quality metric to obtain the best subset. We tested our method on publicly available datasets of varying types, sources and clustering difficulty to establish the accuracy and performance of our approach against eight standard methods. Our method outperforms all eight techniques in the number of clusters estimated correctly. Because the initial algorithm is exhaustive, it becomes slow as the number of ensembles or the size of the solution space increases; hence, we have provided an updated version, based on the property that successive Gray code words differ in a single digit, that runs in linear time in terms of the subset size.
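The abstract does not spell out how subsets are scored, so the following is only a minimal sketch of the Gray-code idea under stated assumptions, not the authors' implementation. It assumes each clustering in the ensemble is represented by a co-association matrix and that a subset is scored from the sum of its members' matrices; `gray_code_subsets`, `co_association`, `best_subset`, `min_size` and the `decisiveness` metric are all hypothetical names, and `decisiveness` is a stand-in for the paper's quality metric. Because consecutive subsets in binary-reflected Gray code order differ by exactly one member, the running sum is updated by adding or subtracting a single matrix per step instead of being rebuilt from every member of the subset.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Matrix = List[List[int]]


def gray_code_subsets(m: int) -> Iterator[Tuple[int, int]]:
    """Yield (mask, j) for every non-empty subset of m items in
    binary-reflected Gray code order; j is the single member toggled."""
    prev = 0
    for i in range(1, 2 ** m):
        mask = i ^ (i >> 1)                   # i-th Gray code word
        j = (mask ^ prev).bit_length() - 1    # index of the one bit that changed
        yield mask, j
        prev = mask


def co_association(labels: Sequence[int]) -> Matrix:
    """Assumed representation: 1 where two points share a cluster, else 0."""
    return [[int(a == b) for b in labels] for a in labels]


def best_subset(clusterings: Sequence[Sequence[int]],
                score: Callable[[Matrix, int], float],
                min_size: int = 2) -> Tuple[int, float]:
    """Hypothetical search: return (mask, score) of the best-scoring subset.

    The summed co-association matrix is updated incrementally, one member
    per step, rather than recomputed for each subset.
    """
    mats = [co_association(c) for c in clusterings]
    n = len(clusterings[0])
    total = [[0] * n for _ in range(n)]
    size = 0
    best_mask, best_score = 0, float("-inf")
    for mask, j in gray_code_subsets(len(mats)):
        sign = 1 if (mask >> j) & 1 else -1   # member j joined (+1) or left (-1)
        size += sign
        for r in range(n):
            for c in range(n):
                total[r][c] += sign * mats[j][r][c]
        if size >= min_size:
            s = score(total, size)
            if s > best_score:
                best_mask, best_score = mask, s
    return best_mask, best_score


def decisiveness(total: Matrix, size: int) -> float:
    """Hypothetical quality metric: 1.0 when all members agree on every pair."""
    n = len(total)
    return sum(abs(2 * total[r][c] / size - 1)
               for r in range(n) for c in range(n)) / (n * n)


ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
mask, s = best_subset(ensemble, decisiveness)
print(mask, s)  # members 0 and 1 agree completely
```

With this toy ensemble the two identical clusterings (bit mask `0b011`) form the highest-scoring subset. Each step touches one member's matrix, so the per-step update cost does not grow with the number of members already in the subset; the enumeration itself still visits all 2^m - 1 subsets.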



Paper Citation


in Harvard Style

Odebode A., Tucker A., Arzoky M. and Swift S. (2022). Estimating the Optimal Number of Clusters from Subsets of Ensembles. In Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-583-8, pages 383-391. DOI: 10.5220/0011275000003269


in Bibtex Style

@conference{data22,
author={Afees Odebode and Allan Tucker and Mahir Arzoky and Stephen Swift},
title={Estimating the Optimal Number of Clusters from Subsets of Ensembles},
booktitle={Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2022},
pages={383-391},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011275000003269},
isbn={978-989-758-583-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI  - Estimating the Optimal Number of Clusters from Subsets of Ensembles
SN  - 978-989-758-583-8
AU  - Odebode A.
AU  - Tucker A.
AU  - Arzoky M.
AU  - Swift S.
PY  - 2022
SP  - 383
EP  - 391
DO  - 10.5220/0011275000003269
ER  - 