Clustering Big Data

Michele Ianni, Elio Masciari, Giuseppe M. Mazzeo, Carlo Zaniolo

2018

Abstract

The need to support advanced analytics on Big Data is driving data scientists’ interest toward massively parallel distributed systems and software platforms, such as Map-Reduce and Spark, that make their scalable utilization possible. However, when complex data mining algorithms are required, their fully scalable deployment on such platforms faces a number of technical challenges that grow with the complexity of the algorithms involved. Thus, algorithms that were originally designed for sequential execution must often be redesigned in order to use distributed computational resources effectively. In this paper, we explore these problems and then propose a solution that has proven to be very effective on the complex hierarchical clustering algorithm CLUBS+. By using four stages of successive refinements, CLUBS+ delivers high-quality clusters of data grouped around their centroids, working in a totally unsupervised fashion. Experimental results confirm the accuracy and scalability of CLUBS+ on Map-Reduce platforms.


Paper Citation


in BibTeX Style

@conference{data18,
author={Michele Ianni and Elio Masciari and Giuseppe M. Mazzeo and Carlo Zaniolo},
title={Clustering Big Data},
booktitle={Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2018},
pages={276-282},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006858702760282},
isbn={978-989-758-318-6},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Clustering Big Data
SN - 978-989-758-318-6
AU - Ianni M.
AU - Masciari E.
AU - Mazzeo G. M.
AU - Zaniolo C.
PY - 2018
SP - 276
EP - 282
DO - 10.5220/0006858702760282


in Harvard Style

Ianni M., Masciari E., Mazzeo G. M. and Zaniolo C. (2018). Clustering Big Data. In Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-318-6, pages 276-282. DOI: 10.5220/0006858702760282