(and then clusters) are considered close. Finally, all these measures are linked together into a map by region-growing clustering, which rebuilds the regions of the feature space. The number of isolated groups of clusters approximates the number of regions of the feature space, and is therefore a very good estimate of the optimal number of clusters.
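As an illustration of this final counting step, the following minimal Python sketch (ours, not the paper's implementation) treats the pairwise closeness decisions between clusters as a boolean adjacency matrix and counts the isolated groups by region growing, i.e., connected components; the `close` matrix is a hypothetical input standing in for the output of the pairwise test.

```python
import numpy as np

def count_regions(close: np.ndarray) -> int:
    """Count isolated groups of clusters by region growing.

    `close` is a symmetric boolean matrix: close[i, j] is True when
    clusters i and j were judged close by the pairwise test
    (hypothetical input standing in for the ECD Test's decisions).
    """
    n = close.shape[0]
    visited = np.zeros(n, dtype=bool)
    regions = 0
    for seed in range(n):
        if visited[seed]:
            continue
        regions += 1                # a new isolated group starts here
        stack = [seed]
        visited[seed] = True
        while stack:                # grow the region from the seed
            i = stack.pop()
            for j in np.flatnonzero(close[i] & ~visited):
                visited[j] = True
                stack.append(j)
    return regions

# Toy usage: 5 clusters, {0, 1, 2} mutually close, {3, 4} close.
A = np.zeros((5, 5), dtype=bool)
for i, j in [(0, 1), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = True
print(count_regions(A))  # -> 2 regions, i.e., the estimated number of clusters
```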
We assessed this methodology on an academic dataset consisting of fifteen 2D Gaussian distributions, and then on real industrial data. In both cases, we recovered approximately the expected number of clusters, i.e., about ten. We tested three clustering methods (K-Means, SOMs and Bi-Level SOMs) to demonstrate the resilience of our methodology, which proved reliable in every context we tested, even with unsupervised, data-driven approaches.
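A minimal sketch of the kind of academic benchmark described above, assuming scikit-learn is available; the blob parameters and the deliberate over-clustering with K-Means are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Hypothetical stand-in for the academic benchmark: fifteen 2D Gaussian
# distributions, deliberately over-clustered so that the region-growing
# step has groups of micro-clusters to merge back together.
X, _ = make_blobs(n_samples=3000, centers=15, n_features=2,
                  cluster_std=0.6, random_state=0)
kmeans = KMeans(n_clusters=40, n_init=10, random_state=0).fit(X)

# Each of the 40 micro-clusters would then be compared pairwise and
# grouped by region growing; on a well-separated benchmark, the number
# of isolated groups should come back close to the true 15.
print(kmeans.cluster_centers_.shape)  # (40, 2)
```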
The ECD Test is a helpful tool to prepare the ground for further work. In future works, we will endeavor to apply it to a wider range of situations, and to use it to obtain the best possible clustering of real datasets. An accurate clustering is of major importance in many contexts, such as multi-modeling of the system under consideration (one local model per cluster): this is the solution we are currently working on, and the reason why we addressed the problem raised in this paper.
ACKNOWLEDGEMENTS
This paper received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 695965 (project HyperCOG).