KluSIM: Speeding up K-Medoids Clustering over Dimensional Data with Metric Access Method

Larissa Teixeira; Igor Eleutério; Mirela Cazzolato; Marco A. Gutierrez; Agma J. M. Traina; Caetano Traina-Jr.

Topics: Big Data, Data Science and Analytics; Data Mining and Knowledge Discovery

In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 73-84, 2024, Angers, France

Authors: Larissa Teixeira ¹; Igor Eleutério ¹; Mirela Cazzolato ¹,²; Marco A. Gutierrez ²; Agma J. M. Traina ¹ and Caetano Traina-Jr. ¹

Affiliations: ¹ Institute of Mathematics and Computer Science (ICMC), University of São Paulo (USP), São Carlos, Brazil; ² The Heart Institute (InCor), University of São Paulo (USP), São Paulo, Brazil

Keyword(s): Dimensional Data, k-medoids, Clustering, Indexing, Metric Access Method.

Abstract: Clustering algorithms are powerful data mining techniques, responsible for identifying patterns and extracting information from datasets. Scalable algorithms have become crucial to enable data mining techniques on large datasets. In the literature, k-medoid-based clustering algorithms stand out as one of the most used approaches. However, these methods face scalability challenges when applied to massive datasets and high-dimensional vector spaces, mainly due to the high computational cost of the swap step. In this paper, we propose the KluSIM method to improve the computational efficiency of the swap step in the k-medoids clustering process. KluSIM leverages Metric Access Methods (MAMs) to prune the search space, speeding up the swap step. Additionally, KluSIM eliminates the need to maintain a distance matrix in memory, overcoming the memory limitations of existing methodologies. Experiments over real and synthetic data show that KluSIM outperforms the baseline FasterPAM, with a speedup of up to 881 times, requiring up to 3,500 times fewer distance calculations, while maintaining comparable clustering quality. KluSIM is well-suited for big data analysis, being effective and scalable for clustering large datasets.
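To make the cost of the swap step concrete, the following is a minimal sketch of a naive, textbook PAM-style swap pass. It is not the KluSIM method and not FasterPAM; it only illustrates why exhaustively evaluating every (medoid, non-medoid) exchange is expensive when distances are computed on demand instead of being read from a distance matrix. All names (`CountingDistance`, `total_cost`, `naive_swap_step`) and the toy data are assumptions introduced for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CountingDistance:
    """Wraps a distance function and counts how often it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid (n * k calls)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def naive_swap_step(points, medoids, dist):
    """One exhaustive swap pass: try every (medoid, non-medoid) exchange
    and return the best medoid set found together with its cost."""
    best_medoids = list(medoids)
    best_cost = total_cost(points, medoids, dist)
    for i in range(len(medoids)):
        for candidate in range(len(points)):
            if candidate in medoids:
                continue  # only non-medoids are swap candidates
            trial = list(medoids)
            trial[i] = candidate
            cost = total_cost(points, trial, dist)
            if cost < best_cost:
                best_medoids, best_cost = trial, cost
    return best_medoids, best_cost


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
dist = CountingDistance(euclidean)
medoids, cost = naive_swap_step(points, [0, 1], dist)
print(medoids, cost, dist.calls)
```

For n points and k medoids, this pass evaluates k(n - k) candidate swaps at n·k distance computations each, plus n·k for the initial cost. On the six-point toy input with k = 2, that is already 108 distance calls for a single pass. Reducing this count without storing all pairwise distances is the kind of saving the abstract attributes to pruning the search space with a Metric Access Method.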

CC BY-NC-ND 4.0

Paper citation in several formats:

Teixeira, L., Eleutério, I., Cazzolato, M., Gutierrez, M. A., Traina, A. J. M. and Traina-Jr., C. (2024). KluSIM: Speeding up K-Medoids Clustering over Dimensional Data with Metric Access Method. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7; ISSN 2184-4992, SciTePress, pages 73-84. DOI: 10.5220/0012599900003690

@conference{iceis24,
author={Larissa Teixeira and Igor Eleutério and Mirela Cazzolato and Marco A. Gutierrez and Agma J. M. Traina and Caetano Traina{-}Jr.},
title={KluSIM: Speeding up K-Medoids Clustering over Dimensional Data with Metric Access Method},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={73-84},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012599900003690},
isbn={978-989-758-692-7},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - KluSIM: Speeding up K-Medoids Clustering over Dimensional Data with Metric Access Method
SN - 978-989-758-692-7
IS - 2184-4992
AU - Teixeira, L.
AU - Eleutério, I.
AU - Cazzolato, M.
AU - Gutierrez, M. A.
AU - Traina, A. J. M.
AU - Traina-Jr., C.
PY - 2024
SP - 73
EP - 84
DO - 10.5220/0012599900003690
PB - SciTePress