
Our KluSIM proposal stands out as an efficient and scalable solution for k-medoids clustering. The combination of metric access methods, optimized initialization heuristics, and the elimination of an in-memory distance matrix collectively accounts for its performance gains. KluSIM is therefore a powerful tool for scalable, high-performance clustering, particularly on large datasets or where computational resources are limited.
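Here is the memory argument in concrete terms. A dense n × n matrix of 8-byte floats needs 8n² bytes, so n = 100,000 objects already require 8 × 10^10 bytes (80 GB). The Python sketch below is a generic alternating (Voronoi-iteration) k-medoids that computes distances on demand and keeps only medoid indices and labels in memory. It is an illustrative assumption, not KluSIM itself: it uses no metric access method and initializes at random unless seeds are given. The names `k_medoids`, `assign`, `euclidean`, and `dense_matrix_bytes` are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical illustration: generic alternating k-medoids, NOT the KluSIM algorithm.
Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """L2 distance; any metric with the same signature can be swapped in."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dense_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Bytes a full n x n distance matrix would occupy (float64 by default)."""
    return n * n * bytes_per_entry


def assign(points: Sequence[Point], medoids: List[int],
           dist: Metric) -> Tuple[List[int], float]:
    """Label each point with its nearest medoid, computing distances on demand."""
    labels: List[int] = []
    cost = 0.0
    for p in points:
        dists = [dist(p, points[m]) for m in medoids]  # only k values held at once
        best = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(best)
        cost += dists[best]
    return labels, cost


def k_medoids(points: Sequence[Point], k: int, dist: Metric = euclidean,
              init: Optional[List[int]] = None, max_iter: int = 100,
              seed: int = 0) -> Tuple[List[int], List[int], float]:
    """Alternating k-medoids that never materializes a distance matrix."""
    if init is None:
        medoids = random.Random(seed).sample(range(len(points)), k)  # random seeding
    else:
        medoids = list(init)
    labels, cost = assign(points, medoids, dist)
    for _ in range(max_iter):
        new_medoids: List[int] = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(
                members,
                key=lambda c: sum(dist(points[c], points[i]) for i in members)))
        if new_medoids == medoids:  # converged: no medoid changed
            break
        medoids = new_medoids
        labels, cost = assign(points, medoids, dist)
    return medoids, labels, cost
```

For example, on the points `(0,0), (0,1), (1,0), (10,10), (10,11), (11,10)`, the call `k_medoids(points, 2, init=[1, 4])` returns medoids `[0, 3]` with a total cost of 4.0. Dropping the matrix trades memory for time, because every medoid update recomputes O(|C|²) distances per cluster. Random seeding can also stall in a poor local optimum. Index structures and better initialization heuristics are meant to reduce both costs.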
In future work, we intend to explore MAM-based initialization heuristics that leverage the index structure throughout the entire clustering process. We also intend to evaluate KluSIM with other distance functions.
ACKNOWLEDGEMENT
This research was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and 12620352/M, by the São Paulo Research Foundation (FAPESP, grants 2016/17078-0, 2020/11258-2), the National Council for Scientific and Technological Development (CNPq) and JIT Educação.
ICEIS 2024 - 26th International Conference on Enterprise Information Systems