A promising area of application for clustering is the modeling of a system. Since each cluster should contain similar data, it can be assumed to represent a local behavior, and a local model can therefore be built upon it. This is the Multi-Model paradigm.
This decomposition of the feature space, and thus of the system as a whole, is well suited to Automation and Industry 4.0 applications: the cornerstone of any network – such as an Industry 4.0 Cyber-Physical System – is to treat every unit as part of a wider process. Clustering can help identify the elementary units of an industrial system, either physical (subprocesses) or theoretical (behaviors).
There exist many clustering methods, but they do not all perform alike. Their accuracy depends heavily on the data under study; two similar clustering algorithms can thus achieve very different results. The remaining question is how to automatically distinguish a good decomposition from a poor one. This knowledge is important for the subsequent steps of the modeling process: how can one expect a good model when the first step, i.e. clustering, has performed poorly and has grouped (very) dissimilar data together?
In this paper, we present a new way to quantify the compactness of a cluster of data, which serves as an indicator of data homogeneity. In the next section, we present some Machine Learning models and some Machine Learning-based clustering methods; in Section 3, we detail both the methods used and our homogeneity quantifier. Section 4 shows some example results and how they can be interpreted for a further understanding of the systems. Finally, Section 5 concludes this paper.
2 STATE OF THE ART
Whether artificial or biological (the brain), a Neural Network (NN) is essentially a graph whose nodes act as activators and whose edges are channels linking the nodes to one another; the span of a channel lets a signal pass with a higher or lower amplitude, much as a nozzle would. By analogy with the brain, the nodes are called neurons and the channels synapses.
Training an Artificial Neural Network means adapting the "span" of its channels: each channel carries a weighting coefficient (called a weight), which is modified so that the network fits the known data.
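This weight adaptation can be sketched as a gradient-descent update on a single neuron. The code below is a minimal illustration only; the function name, learning rate, sigmoid activation and squared-error loss are our own choices, not taken from the paper:

```python
import numpy as np

def train_neuron(X, y, lr=2.0, epochs=10000):
    """Minimal sketch: fit the weights of a single sigmoid neuron
    by gradient descent on the squared error (names illustrative)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])  # one weight per input channel
    b = 0.0
    for _ in range(epochs):
        z = X @ w + b
        out = 1.0 / (1.0 + np.exp(-z))      # sigmoid activation
        err = out - y
        grad = out * (1.0 - out) * err      # d(error)/dz per sample
        w -= lr * X.T @ grad / len(X)       # adapt the "span" of each channel
        b -= lr * grad.mean()
    return w, b

# Usage: learn the logical AND of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])
w, b = train_neuron(X, y)
```

After training, thresholding the neuron's output at 0.5 reproduces the AND truth table.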
More generally, the Machine Learning paradigm
exploits data and experience to train a system, and
to teach it how to behave within certain situations.
It relies on several approaches: Multi-Layer Percep-
tron MLP (Rumelhart et al., 1986a), Radial Basis
Function RBF (Broomhead and Lowe, 1988), Multi-
Expert System MES (Thiaw, 2008), Multi-Agent Sys-
tem MAS (Rumelhart et al., 1986b), Support Vector
Machine SVM (Boser et al., 1996), etc. The most common model in use is the MLP, which loosely mirrors the biological brain; it is often slow to train, however. Other models may therefore be preferred, depending on the application context.
Though less widespread, another paradigm in use is the Multi-Expert System (MES): it is again a neural network, with the main difference that the neurons are local experts. The output of the whole model is a combination of the outputs of the local models. It is an extension of the very widespread Expert System paradigm (Buchanan and Feigenbaum, 1978). The local experts can take many shapes, from expert systems to Machine Learning-based models (MLP, RBF, SVM, etc.). Although their accuracy still depends on the data and on the application context, MESs can be very accurate, since each sub-model is trained to correctly model and represent one of the system's sub-parts.
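As a rough sketch of how an MES combines its local experts, the code below gates linear local models by their distance to cluster centers. The Gaussian gating, the function names and the parameters are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def mes_predict(x, experts, centers, width=1.0):
    """Sketch of a Multi-Expert System output: each local expert is a
    linear model (w, b) valid near its cluster center; the global output
    is a weighted combination whose weights decay with the distance
    between x and each center (Gaussian gating, illustrative)."""
    local = np.array([w @ x + b for (w, b) in experts])     # local outputs
    d2 = np.array([np.sum((x - c) ** 2) for c in centers])  # squared distances
    g = np.exp(-d2 / (2 * width ** 2))                      # gating weights
    g /= g.sum()                                            # normalize
    return g @ local                                        # blended output

# Usage: two experts modeling y = |x| piecewise (y = -x left, y = x right)
experts = [(np.array([-1.0]), 0.0), (np.array([1.0]), 0.0)]
centers = [np.array([-1.0]), np.array([1.0])]
y_hat = mes_predict(np.array([-2.0]), experts, centers, width=0.3)
```

Far from the boundary between the two experts, the gating selects the nearest expert almost exclusively, so the prediction at x = -2 is close to 2.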
As a consequence, to use the MES paradigm, the feature space must be split upstream, so a clustering method is needed to prepare the framework for this step. Several clustering algorithms exist, mostly based on Machine Learning; this avoids relying on human experts to describe the system manually, which can be difficult and time-consuming. Machine Learning lets a (smart) algorithm do the job instead; the main question is whether the resulting clusters can be trusted.
Again, clustering can be driven by some knowledge – supervised learning – or be totally blind – unsupervised learning. However, supervised learning requires some expertise on the data, often provided manually (or by an upstream unsupervised algorithm), and is therefore not suitable for a blind data mining procedure. Among the existing clustering algorithms,
it is worth mentioning: K-Means (Jain, 2010), Self-
Organizing Maps (Kohonen, 1982), Neural Gas (Mar-
tinetz and Schulten, 1991), Fuzzy Clustering (Dunn,
1973) and Support Vector Machine (Boser et al.,
1996). The first three are unsupervised, and are thus well suited to a data mining application.
K-Means iteratively aggregates the data points around some "seeds" (random points drawn from the database) and updates these centers repeatedly until a stopping criterion is satisfied. It is easy to implement, but cannot cluster datasets that are not linearly separable. Kernel K-Means compensates for this with a kernel (a projection of the data from a Euclidean space into a non-Euclidean one, called the kernel space), but it is very resource-consuming and prone to reaching only local optima.
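The K-Means loop described above can be sketched as follows; this is a minimal illustration, and the function names and stopping test are our own choices:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal K-Means sketch: draw k seeds from the data, then
    alternate assignment and centroid update until the centers settle."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random "seeds"
    for _ in range(iters):
        # Assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # stopping criterion
            break
        centers = new
    return labels, centers

# Usage: two well-separated blobs
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
labels, centers = kmeans(X, k=2)
```

On such well-separated data the two blobs end up in two distinct clusters; on data that is not linearly separable, this plain version fails, which is precisely what motivates the kernel variant.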
IN4PL 2021 - 2nd International Conference on Innovative Intelligent Industrial Production and Logistics