It is incrementally computed each time a new data-point arrives, by updating some information associated with each neuron (e.g. the number of data-points assigned to the neuron, the sum of their distances to this neuron, etc.). If we consider the example of Figure 3, there are 3 data-points assigned to $y_1$ (namely $x_1$, $x_2$ and $x_3$), and two neurons that are neighbours of $y_1$ (namely $y_2$ with 4 assigned data-points, and $y_3$ with 5 data-points). In this case, the threshold associated with the neuron $y_1$ is computed as
$$T_{y_1} = \frac{dist(y_1,x_1) + dist(y_1,x_2) + dist(y_1,x_3) + 4\,dist(y_1,y_2) + 5\,dist(y_1,y_3)}{3+4+5}.$$
As we can see, the proposed threshold is independent of parameters and evolves dynamically according to the data and the topology of neurons.
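To make the bookkeeping behind this example concrete, here is a small Python sketch (the coordinates for Figure 3 and the helper names are invented for illustration; the general formula is the one of section 3.2) that computes $T_{y_1}$ from a running sum of distances to the assigned data-points and the counts of the neighbouring neurons:

```python
import numpy as np

def aing_threshold(ref, assigned_dist_sum, assigned_count, neighbours):
    """Threshold of a neuron, following the structure of the Figure 3 example:
    distances to its assigned data-points plus count-weighted distances to its
    neighbouring neurons, divided by the total number of data-points involved."""
    num, den = assigned_dist_sum, assigned_count
    for nb_ref, nb_count in neighbours:
        num += nb_count * np.linalg.norm(ref - nb_ref)   # e.g. 4 * dist(y1, y2)
        den += nb_count
    return num / den

# Worked example of Figure 3 (coordinates invented for illustration only):
y1, y2, y3 = np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 3.0])
x1, x2, x3 = np.array([0.5, 0.0]), np.array([0.0, 0.4]), np.array([0.3, 0.3])
dist_sum = sum(np.linalg.norm(y1 - x) for x in (x1, x2, x3))  # kept incrementally in AING
T_y1 = aing_threshold(y1, dist_sum, 3, [(y2, 4), (y3, 5)])
# = (dist(y1,x1)+dist(y1,x2)+dist(y1,x3) + 4*dist(y1,y2) + 5*dist(y1,y3)) / (3+4+5)
```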
3.3 AING Merging Process
Since data is processed online, algorithms for data stream clustering commonly generate many cluster representatives, which may significantly compromise the computational efficiency over time. Instead of introducing parameters into the threshold computation to control the number of created neurons, AING can, if needed, reduce the number of neurons through a merging process. Indeed, when the number of current neurons reaches an upper bound (up_bound), some close neurons can be merged.
The merging process globally follows the same scheme as previously, but instead of relying on a hard rule based on a threshold, it uses a more relaxed rule based on a probabilistic criterion. Saying that "a neuron $y$ is far enough from its nearest neuron $\tilde{y}$" is expressed as the probability that $y$ will not be assigned to $\tilde{y}$, according to the formula $P_{y,\tilde{y}} = \frac{|X_y| \times dist(y,\tilde{y})}{\kappa}$. This probability is proportional to the distance between the two neurons ($dist(y,\tilde{y})$) and to the number of data-points assigned to $y$ ($|X_y|$); that is, the larger $y$ is and the farther it is from $\tilde{y}$, the more likely it is to remain unmerged. The probability is, in contrast, inversely proportional to a variable $\kappa$, which means that by incrementing $\kappa$, any given neuron $y$ will have more chance of being merged with its nearest neuron. Let $\bar{d}$ be the mean distance of all existing neurons to the center-of-mass of the observed data-points. $\kappa$ is incremented by $\kappa = \kappa + \bar{d}$ each time the neurons need to be more condensed, i.e. until the merging process takes effect and the number of neurons becomes less than the specified limit up_bound. Note that $P_{y,\tilde{y}}$ as specified may be higher than 1 when $\kappa$ is not yet sufficiently large; a better formulation would be $P_{y,\tilde{y}} = \min\left(\frac{|X_y| \times dist(y,\tilde{y})}{\kappa}, 1\right)$, to guarantee that it is always a true probability.
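As an illustration only, the following Python sketch implements the probabilistic test above; the function name, the data layout and the starting value of $\kappa$ are our assumptions, and the actual merging procedure is the one given in Algorithm 2:

```python
import random
import numpy as np

def stays_separate(y_ref, y_count, y_tilde_ref, kappa):
    """Probabilistic criterion: neuron y (with y_count assigned data-points)
    remains separate from its nearest neuron y_tilde with probability
    P = min(|X_y| * dist(y, y_tilde) / kappa, 1)."""
    p = min(y_count * float(np.linalg.norm(y_ref - y_tilde_ref)) / kappa, 1.0)
    return random.random() < p

# kappa is raised by d_bar (mean distance of the neurons to the center-of-mass
# of the observed data-points) until the merging takes effect, e.g.:
#
#   kappa = 0.0
#   while number_of_neurons >= up_bound:
#       kappa += d_bar
#       # merge every neuron y for which stays_separate(...) is False
#       # into its nearest neuron y_tilde
```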
The merging process is optional. Indeed, up_bound can be set to +∞ if desired. Alternatively, the merging process can be triggered at any time chosen by the user, or up_bound can be chosen according to some system requirements, such as the memory budget we want to allocate for the clustering task, or the maximum latency the system can tolerate due to a high number of neurons.
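As a purely hypothetical example of the latter option, up_bound could be derived from a memory budget as follows (the per-neuron cost used here is an assumption for illustration, not a figure from the paper):

```python
import math

def up_bound_from_memory(budget_bytes, dim, overhead_bytes=64):
    """Hypothetical sizing rule: each neuron stores a float64 reference vector
    of `dim` components plus some bookkeeping (counts, distance sums, edges),
    roughly estimated here by overhead_bytes."""
    per_neuron_bytes = 8 * dim + overhead_bytes
    return max(1, budget_bytes // per_neuron_bytes)

up_bound = up_bound_from_memory(budget_bytes=1_000_000, dim=2)  # 12500 neurons
# up_bound = math.inf   # disables the merging process entirely
```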
Finally, the code is explicitly presented in Algorithms 1 and 2, which provide an overall insight into AING's method of operation. They both follow the same scheme described in section 3.1. Algorithm 1 starts from scratch and incrementally processes each data-point from the stream using the adaptive distance threshold described in section 3.2. When the number of current neurons reaches a limit, Algorithm 2 is called and some neurons are grouped together using the probabilistic criterion described in section 3.3. We just need to point out two additional details appearing in our algorithms:
• If a data-point $x$ is close enough to its two nearest neurons $y_1$ and $y_2$, it is assigned to $y_1$, and the reference vectors of the latter and of its neighbours are updated (i.e. they move towards $x$) by a learning rate: $\varepsilon_b$ for $y_1$ and $\varepsilon_n$ for its neighbours (lines 15-17 of Algorithm 1). Generally, too big a learning rate implies instability of the neurons, while too small a learning rate implies that neurons do not learn enough from their assigned data. Typical values are $0 < \varepsilon_b \ll 1$ and $0 < \varepsilon_n \ll \varepsilon_b$. In AING, $\varepsilon_b = \frac{1}{|X_{y_1}|}$ slowly decreases with the number of data-points associated to $y_1$, i.e. the more $y_1$ learns, the more stable it becomes, and $\varepsilon_n$ is simply set, heuristically, to 100 times smaller than the current value of $\varepsilon_b$ (i.e. $\varepsilon_n \ll \varepsilon_b$). A sketch of this update is given after this list.
• Each time a data-point is assigned to a winning neuron $y_1$, the age of the edges emanating from this neuron is increased (line 14 of Algorithm 1). Let $n_{max}$ be the maximum number of data-points assigned to a neuron. A given edge is then considered "old" and thus removed (line 19 of Algorithm 1) if its age becomes higher than $n_{max}$. Note that this is not an externally set parameter; it is the current maximum number of data-points assigned to a neuron among the existing ones.
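The following Python sketch shows, under our own assumptions about the data structures (a dictionary of neurons with symmetric edge ages), how the learning-rate update, the edge ageing and the $n_{max}$-based edge removal described in the two points above fit together; it is an illustration, not the paper's Algorithm 1:

```python
import numpy as np

def update_winner(x, y1, graph):
    """Per-data-point update when x is close enough to its two nearest neurons
    and is assigned to the winner y1 (a sketch of lines 14-19 of Algorithm 1).
    graph maps a neuron id to {'ref': vector, 'count': |X_y|, 'edges': {id: age}},
    with edge ages stored symmetrically on both endpoints."""
    node = graph[y1]
    node['count'] += 1                        # x is now assigned to y1
    eps_b = 1.0 / node['count']               # winner rate: decreases as y1 learns
    eps_n = eps_b / 100.0                     # neighbour rate, 100 times smaller

    # age the edges emanating from the winner (line 14)
    for nb in node['edges']:
        node['edges'][nb] += 1
        graph[nb]['edges'][y1] += 1

    # move the winner and its topological neighbours towards x (lines 15-17)
    node['ref'] = node['ref'] + eps_b * (x - node['ref'])
    for nb in node['edges']:
        graph[nb]['ref'] = graph[nb]['ref'] + eps_n * (x - graph[nb]['ref'])

    # n_max: current maximum number of data-points assigned to any neuron
    n_max = max(n['count'] for n in graph.values())
    # drop edges that became "old" (line 19)
    for nb in [nb for nb, age in node['edges'].items() if age > n_max]:
        del node['edges'][nb]
        del graph[nb]['edges'][y1]
```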
4 EXPERIMENTAL EVALUATION
4.1 Experiments on Synthetic Data
In order to test AING’s behaviour, we perform an experiment on artificial 2D data of 5 classes (Figure