Paper

Authors: Larissa Teixeira 1; Igor Eleutério 1; Mirela Cazzolato 1,2; Marco A. Gutierrez 2; Agma J. M. Traina 1 and Caetano Traina-Jr. 1

Affiliations: 1 Institute of Mathematics and Computer Science (ICMC), University of São Paulo (USP), São Carlos, Brazil; 2 The Heart Institute (InCor), University of São Paulo (USP), São Paulo, Brazil

Keyword(s): Dimensional Data, k-medoids, Clustering, Indexing, Metric Access Method.

Abstract: Clustering algorithms are powerful data mining techniques, responsible for identifying patterns and extracting information from datasets. Scalable algorithms have become crucial to enable data mining techniques on large datasets. In the literature, k-medoid-based clustering algorithms stand out as one of the most used approaches. However, these methods face scalability challenges when applied to massive datasets and high-dimensional vector spaces, mainly due to the high computational cost of the swap step. In this paper, we propose the KluSIM method to improve the computational efficiency of the swap step in the k-medoids clustering process. KluSIM leverages Metric Access Methods (MAMs) to prune the search space, speeding up the swap step. Additionally, KluSIM eliminates the need to maintain a distance matrix in memory, overcoming a memory limitation of existing methods. Experiments over real and synthetic data show that KluSIM outperforms the baseline FasterPAM, with a speedup of up to 881 times, requiring up to 3,500 times fewer distance calculations, and maintaining a comparable clustering quality. KluSIM is well-suited for big data analysis, being effective and scalable for clustering large datasets.
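To make the cost of the swap step concrete, the following is a minimal sketch of a textbook PAM-style exhaustive swap in Python. It is not KluSIM and not FasterPAM; the function names (`total_cost`, `naive_swap_step`), the Euclidean distance, and the toy points are assumptions chosen for illustration only. The sketch counts distance evaluations to show how quickly they grow when no pruning is applied and no distance matrix is cached.

```python
import math
from itertools import product


def total_cost(points, medoid_idx, dist, counter):
    """Sum of distances from every point to its nearest medoid."""
    cost = 0.0
    for p in points:
        best = math.inf
        for m in medoid_idx:
            counter[0] += 1  # one distance evaluation
            best = min(best, dist(p, points[m]))
        cost += best
    return cost


def naive_swap_step(points, medoid_idx, dist=math.dist):
    """Evaluate every (medoid, non-medoid) swap and keep the cheapest one.

    Illustrative only: recomputes all distances for each candidate swap,
    which is exactly the work that pruning or caching tries to avoid.
    """
    counter = [0]
    best_medoids = list(medoid_idx)
    best_cost = total_cost(points, best_medoids, dist, counter)
    non_medoids = [i for i in range(len(points)) if i not in medoid_idx]
    for pos, cand in product(range(len(medoid_idx)), non_medoids):
        trial = list(medoid_idx)
        trial[pos] = cand  # replace one medoid with a candidate
        cost = total_cost(points, trial, dist, counter)
        if cost < best_cost:
            best_cost, best_medoids = cost, trial
    return best_medoids, best_cost, counter[0]


# Toy data (hypothetical): two well-separated groups, both initial medoids in the first.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost, n_dist = naive_swap_step(points, [0, 1])
print(medoids, cost, n_dist)  # [0, 3] 4.0 108
```

Even with 6 points and 2 medoids, one exhaustive swap pass performs 108 distance evaluations: 9 cost evaluations (the initial configuration plus 8 candidate swaps) times 12 point-to-medoid distances each. Reducing this count is what index-based pruning, as described in the abstract, aims at.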

CC BY-NC-ND 4.0

Paper citation in several formats:
Teixeira, L.; Eleutério, I.; Cazzolato, M.; Gutierrez, M. A.; Traina, A. J. M. and Traina-Jr., C. (2024). KluSIM: Speeding up K-Medoids Clustering over Dimensional Data with Metric Access Method. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7; ISSN 2184-4992, SciTePress, pages 73-84. DOI: 10.5220/0012599900003690

@conference{iceis24,
author={Larissa Teixeira and Igor Eleutério and Mirela Cazzolato and Marco A. Gutierrez and Agma J. M. Traina and Caetano Traina{-}Jr.},
title={KluSIM: Speeding up K-Medoids Clustering over Dimensional Data with Metric Access Method},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={73-84},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012599900003690},
isbn={978-989-758-692-7},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - KluSIM: Speeding up K-Medoids Clustering over Dimensional Data with Metric Access Method
SN - 978-989-758-692-7
IS - 2184-4992
AU - Teixeira, L.
AU - Eleutério, I.
AU - Cazzolato, M.
AU - Gutierrez, M. A.
AU - Traina, A. J. M.
AU - Traina-Jr., C.
PY - 2024
SP - 73
EP - 84
DO - 10.5220/0012599900003690
PB - SciTePress