Authors:
Nabil El malki 1; Franck Ravat 2 and Olivier Teste 3
Affiliations:
1 Université de Toulouse, UT2, IRIT (CNRS/UMR5505), Toulouse, France; Capgemini, 109 Avenue du Général Eisenhower, Toulouse, France
2 Université de Toulouse, UT2, IRIT (CNRS/UMR5505), Toulouse, France
3 Capgemini, 109 Avenue du Général Eisenhower, Toulouse, France
Keyword(s):
k-means, Machine Learning, Data Aggregations.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Industrial Applications of Artificial Intelligence; Sensor Networks; Signal Processing; Soft Computing
Abstract:
The k-means algorithm is one of the well-known clustering algorithms. It requires iterative, repeated accesses to the data, to the point of performing the same calculations several times on the same data. However, intermediate results, which are difficult to predict at the start of the k-means process, are not recorded, so some data must be recalculated in subsequent iterations. These repeated calculations can be costly, especially when clustering massive data. In this article, we propose to extend the k-means algorithm by introducing pre-aggregates, which can then be reused to avoid redundant calculations during successive iterations. We demonstrate the value of the approach through several experiments, which show that the larger the volume of data, the more the pre-aggregations speed up the algorithm.
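The abstract does not describe how the pre-aggregates are built, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that points are grouped into fixed contiguous blocks and that a coordinate-wise sum and a count are stored once per block; when every point of a block falls in the same cluster, the centroid update reuses the stored sum instead of re-adding each point. The block grouping and the names `build_block_aggregates` and `update_centroids` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]
Aggregate = Tuple[List[float], int]  # (coordinate-wise sum, point count)


def build_block_aggregates(points: List[Point], block_size: int) -> List[Aggregate]:
    """Compute, once, the sum and count of each contiguous block of points.

    Assumption: fixed contiguous blocks stand in for the paper's pre-aggregates.
    """
    aggregates = []
    for start in range(0, len(points), block_size):
        block = points[start:start + block_size]
        sums = [sum(coords) for coords in zip(*block)]
        aggregates.append((sums, len(block)))
    return aggregates


def update_centroids(
    points: List[Point],
    labels: List[int],
    k: int,
    block_size: int,
    aggregates: List[Aggregate],
) -> List[Optional[List[float]]]:
    """Centroid update step that reuses block sums where a block is 'pure'."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for b, (block_sum, block_count) in enumerate(aggregates):
        start = b * block_size
        block_labels = labels[start:start + block_count]
        first = block_labels[0]
        if all(lab == first for lab in block_labels):
            # Whole block belongs to one cluster: reuse the stored aggregate.
            for j in range(dim):
                sums[first][j] += block_sum[j]
            counts[first] += block_count
        else:
            # Mixed block: fall back to adding the points one by one.
            for p, lab in zip(points[start:start + block_count], block_labels):
                for j in range(dim):
                    sums[lab][j] += p[j]
                counts[lab] += 1
    # An empty cluster has no mean; it is reported as None in this sketch.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]


# Usage: aggregates are built once, then reused at every iteration.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0), (1.0, 1.0), (11.0, 11.0)]
labels = [0, 0, 1, 1, 0, 1]  # assignments from some iteration's assignment step
aggregates = build_block_aggregates(points, block_size=2)
centroids = update_centroids(points, labels, k=2, block_size=2, aggregates=aggregates)
```

In this sketch the saving comes only from blocks whose points all share a cluster, which tends to become more common as assignments stabilise over the iterations; the approach in the paper may organise and reuse its aggregates differently.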