the algorithm gives a total of 218 clusters resulting in
a disastrous ARI.
Multilevel K-means (CardSil), as mentioned
above, stops at the first level and only detects the pairs
of bananas.
Then two groups can be identified:
• the Sil/FS split criterion, which builds more than
50 final clusters. This overclustering leads to
weak ARI scores. The crisp multilevel FS K-
means is the worst, with 25 more clusters than
ECM but the same quality criterion values.
• the Mass25 and Mass100 group, which determines between
10 and 14 final clusters. These split criteria
have a higher ARI (0.54 with MC CM) and almost
equal non-overlap scores with fewer clusters, meaning
that they perform better than the Sil/FS ones.
To summarise, fuzzy and evidential approaches can
improve multilevel clustering results, in spite of noisy
non-linearly separable clusters. However, on this
dataset, ML variants do not achieve the best ARI
(HC’s score), which can be explained by their higher
number of final clusters. Those clusters are nevertheless very
homogeneous (high non-overlap scores).
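The trade-off described above (pure but too numerous clusters lowering the ARI) can be illustrated with a minimal sketch. It assumes toy labels rather than the datasets used in the experiments, and it uses cluster purity as a stand-in for homogeneity, since the non-overlap score is not restated in this section. The function names are illustrative only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from the contingency table."""
    n = len(truth)
    cells = Counter(zip(truth, pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. two trivial partitions
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def purity(truth, pred):
    """Fraction of points belonging to the majority class of their cluster."""
    by_cluster = {}
    for t, p in zip(truth, pred):
        by_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(truth)


# Toy data (assumption): 2 true classes of 100 points each.
truth = [0] * 100 + [1] * 100
# Over-clustered result: each class is cut into 5 pure sub-clusters of 20 points.
over = [i // 20 for i in range(200)]

ari_exact = adjusted_rand_index(truth, truth)
ari_over = adjusted_rand_index(truth, over)
purity_over = purity(truth, over)
print(ari_exact, round(ari_over, 3), purity_over)
```

In this toy case every cluster is perfectly pure, yet the ARI drops well below 1 because pairs of points from the same class are separated. This is consistent with the explanation given above for the ML variants.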
5 CONCLUSION
In this paper, we have mainly proposed a compari-
son between clustering methods: direct vs multilevel,
then crisp vs fuzzy/evidential. To enhance the fuz-
zy/evidential multilevel algorithms, a new split crite-
rion has also been proposed (Mass a posteriori split
criterion).
Several conclusions can be drawn. First, direct
methods may recognise the cluster structure poorly
when clusters have particular geometries, such as nested or close
clusters. Agglomerative methods may also be disturbed
by connected noisy clusters. This problem
also affects HDBSCAN, despite its ability to handle
noise. Moreover, the number of clusters obtained by
HDBSCAN can be extremely sensitive to its parameter
minPts on this type of dataset.
Another shortcoming of agglomerative methods,
and of spectral clustering as well, is their complexity:
they are not suitable for large datasets because their
computing times are too high.
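A rough order of magnitude helps to see why such methods scale poorly. The sketch below assumes an implementation that stores a dense pairwise distance or affinity matrix in 8-byte floats, so both memory and the pairwise computation grow quadratically with the number of points. Sparse or approximate variants exist and would change these figures. The function name is illustrative.

```python
def pairwise_matrix_gib(n_points, bytes_per_value=8, condensed=True):
    """Memory in GiB for a dense pairwise matrix over n_points.

    condensed=True counts only the n(n-1)/2 distinct pairs,
    condensed=False counts the full n x n matrix.
    """
    if condensed:
        n_values = n_points * (n_points - 1) // 2
    else:
        n_values = n_points ** 2
    return n_values * bytes_per_value / 2 ** 30


for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} points: "
          f"{pairwise_matrix_gib(n):10.2f} GiB condensed, "
          f"{pairwise_matrix_gib(n, condensed=False):10.2f} GiB full")
```

Under these assumptions, 100,000 points already require about 37 GiB for the condensed matrix alone, before any linkage or eigendecomposition step is run.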
Multilevel approaches such as multilevel C-means or
ECM can help to recognize noisy or ambiguous clusters
and do so in a reasonable computing time. On
some datasets, the final clustering appears much better
than those obtained with direct methods: the ambiguity
between clusters may be handled better by working
with several levels.
The comparison of split criteria then leads to the
conclusion that those based on soft membership degrees
can limit over-clustering, yielding a number of clusters K
closer to the ground truth.
Nevertheless, multilevel approaches based on K-
means and its fuzzy/evidential extensions are clearly
not perfect. In particular, they are based on a delicate
task, the automatic estimation of the cluster number,
which is repeated frequently. Their parameters are
estimated in order to obtain a final number of clusters
close to the known number of classes. Another factor
that may disadvantage multilevel methods is the
difficulty of obtaining a fair comparison with other
clustering methods when the final number of clusters
differs. Split criteria thresholds were chosen to bring this
number closer to the ground truth, but this often failed:
the methods tend to overcluster. However, the fuzzy/evidential
approach provides ambiguity information on clusters
that could be used to perform a fusion and retrieve
original classes. The good non-overlapping scores
obtained in the experiments tend to support this idea.
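As an illustration of how ambiguity information could drive such a fusion, here is a purely hypothetical sketch. It is not a method from this paper. It assumes a fuzzy membership matrix with one row per point and one column per cluster, measures the ambiguity between two clusters as the membership they share (sum over points of the minimum of the two degrees), and merges the most ambiguous pair. All function names and the toy values are assumptions.

```python
from itertools import combinations


def cluster_overlap(memberships, a, b):
    """Membership shared by clusters a and b, summed over all points."""
    return sum(min(row[a], row[b]) for row in memberships)


def most_ambiguous_pair(memberships):
    """Pair of cluster indices (a < b) with the largest shared membership."""
    n_clusters = len(memberships[0])
    return max(combinations(range(n_clusters), 2),
               key=lambda pair: cluster_overlap(memberships, *pair))


def merge_clusters(memberships, a, b):
    """New membership rows in which cluster b is folded into cluster a."""
    merged = []
    for row in memberships:
        new_row = [u for k, u in enumerate(row) if k != b]
        # After removing column b, cluster a shifts left if it came after b.
        new_row[a if a < b else a - 1] += row[b]
        merged.append(new_row)
    return merged


# Toy fuzzy partition (assumption): clusters 0 and 1 share two ambiguous points.
U = [
    [0.90, 0.05, 0.05],
    [0.55, 0.40, 0.05],
    [0.45, 0.50, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.10, 0.05, 0.85],
]

pair = most_ambiguous_pair(U)
merged = merge_clusters(U, *pair)
print(pair, len(merged[0]))
```

On these toy values the pair (0, 1) is selected, and each merged row still sums to 1. An evidential version would need to work on mass functions over subsets of clusters rather than on a membership matrix.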
Future work will therefore investigate the characterization
of point and cluster ambiguity in fuzzy
and evidential algorithms, in order to improve each
clustering step and to guide the merging process towards
a more coherent final clustering tree.
Moreover, such an approach would reduce the computing
time by making the spectral embedding step
unnecessary.
ACKNOWLEDGMENTS
This work is a part of the JERICO-S3 project, funded
by the European Commission’s H2020 Framework
Programme under grant agreement No. 871153.
Project coordinator: Ifremer, France.
REFERENCES
Azzalini, A. and Menardi, G. (2013). Clustering Via
Nonparametric Density Estimation: the R Package
pdfCluster.
Campello, R. and Hruschka, E. (2006). A fuzzy extension
of the silhouette width criterion for cluster analysis.
Fuzzy Sets and Systems, 157(21):2858–2875.
Denœux, T. (2021). evclust: An R Package for Evidential
Clustering. Bananas dataset.
Gionis, A., Mannila, H., and Tsaparas, P. (2007). Clus-
tering aggregation. ACM Transactions on Knowledge
Discovery from Data, 1(1):4.
Grassi, K., Poisson Caillault, E., and Lefebvre, A. (2019).
Multilevel spectral clustering for extreme event char-
acterization. In OCEANS 2019 - Marseille. IEEE.
Fuzzy and Evidential Contribution to Multilevel Clustering