Figure 5: Training results of (a) the original GNG, (b) the batch-merge GNG, (c) the GNG-merge GNG, and (d) the average-merge GNG; each plot shows the data points, edges, and neurons.
neurons. Their results are shown in Figures 5c and 5d. The GNG-merge lets the neurons collapse. This happens because the merge uses only the reference vectors of the neurons: at the beginning of training these vectors lie at the center of the data vectors and are not adapted quickly enough to represent the data vectors later on. The average-merge scattered the neurons and broke up clusters, because neurons are merged according to their number, not their position, so neurons belonging to different clusters can be merged. For the GNG-merge we will try to alter some of the parameters to obtain better results, but for now the batch-merge is our preferred method.
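As a rough illustration of why merging by neuron number can pair neurons from different clusters, consider the following simplified Python sketch. The function name, the array layout, and the pairing-by-index rule are illustrative assumptions only and do not correspond to our GPU implementation.

import numpy as np

def average_merge(neurons_a, neurons_b):
    # Hypothetical sketch: neuron k of one partial GNG is paired with
    # neuron k of the other purely by index, and their reference
    # vectors are averaged. The index carries no positional
    # information, so two neurons from different clusters can end up
    # merged into a reference vector lying between the clusters.
    n = min(len(neurons_a), len(neurons_b))
    return (np.asarray(neurons_a[:n]) + np.asarray(neurons_b[:n])) / 2.0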
Different methods have been proposed to measure the quality of a clustering. We used the Dunn (Dunn, 1974), Goodman-Kruskal (Goodman and Kruskal, 1954), C (Hubert and Schultz, 1976), and Davies-Bouldin (Davies and Bouldin, 1979) indices. The quality of the clustering only changes when data parallelization is used, as this is the only method that alters the original GNG algorithm. In our case this meant a decrease of the clustering quality, the extent depending on the merge method.
According to the Goodman-Kruskal index, all merge methods are near the optimum of 1; only the GNG-merge method (using 16 data threads) is slightly worse. This means that pairs of neurons inside one cluster mostly have smaller distances between them than pairs of neurons from different clusters. The other index operating on the distances between neurons, the C index, showed no differentiation; the GNG-merge method again behaved slightly worse at 16 data threads.
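As an illustration of what the Goodman-Kruskal index captures, the following simplified Python sketch computes it from the neuron reference vectors and their cluster labels (an illustrative reimplementation, not the code used in our measurements):

import numpy as np
from itertools import combinations

def goodman_kruskal(points, labels):
    # Collect all within-cluster and between-cluster pairwise distances.
    within, between = [], []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = np.linalg.norm(p - q)
        (within if labels[i] == labels[j] else between).append(d)
    # A (within, between) pair is concordant when the within-cluster
    # distance is the smaller one; the index reaches +1 when every
    # pair is concordant and -1 when every pair is discordant.
    concordant = sum(w < b for w in within for b in between)
    discordant = sum(w > b for w in within for b in between)
    return (concordant - discordant) / (concordant + discordant)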
The Dunn index, which indicates how well clusters are separated, is low overall. The best values for this index were obtained with the non-parallel algorithm. With data parallelization, the batch-merge method showed the best results at approximately 0.15 (values greater than 1 are considered good).
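A corresponding sketch of the Dunn index under the same assumptions (NumPy arrays of reference vectors and labels; SciPy's cdist is used only for brevity):

import numpy as np
from scipy.spatial.distance import cdist

def dunn_index(points, labels):
    points, labels = np.asarray(points), np.asarray(labels)
    clusters = [points[labels == c] for c in np.unique(labels)]
    # Smallest distance between members of two different clusters.
    min_separation = min(cdist(a, b).min()
                         for i, a in enumerate(clusters)
                         for b in clusters[i + 1:])
    # Largest distance between two members of the same cluster.
    max_diameter = max(cdist(c, c).max() for c in clusters)
    # Values above 1 mean the clusters are further apart than they are wide.
    return min_separation / max_diameter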
Finally, the Davies-Bouldin index was used. It measures the compactness of the clusters relative to their separation. The batch-merge and GNG-merge methods are on the level of the non-parallel GNG algorithm; only the average-merge method showed deteriorating values with an increasing number of data threads.
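For completeness, a simplified Davies-Bouldin sketch in the same style; lower values indicate compact, well-separated clusters:

import numpy as np

def davies_bouldin(points, labels):
    points, labels = np.asarray(points), np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in ids])
    # Scatter: average distance of a cluster's members to its centroid.
    scatter = np.array([np.linalg.norm(points[labels == c] - centroids[k], axis=1).mean()
                        for k, c in enumerate(ids)])
    worst = []
    for i in range(len(ids)):
        # Worst ratio of combined scatter to centroid distance for cluster i.
        worst.append(max((scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                         for j in range(len(ids)) if j != i))
    return float(np.mean(worst))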
5 CONCLUSIONS
We have shown, both theoretically and practically, that a performance gain of the GNG algorithm can be achieved through parallelization. Data parallelization has the most potential, but it also has its pitfalls in the synchronization methods used. We also showed that, for the GPU architecture used, a further sub-parallelization on the neuron and vector level is advantageous.
We will further explore the parallelization of the GNG algorithm on other parallel architectures such as clusters. Upcoming multi-core CPUs and GPUs, which promise much larger numbers of computing units, are also in our focus (Intel, 2007; Kowaliski, 2007; Etengoff, 2009; Sweeney, 2009).
REFERENCES
Adam, A., Leuoth, S., and Benn, W. (2009). Performance Gain of Different Parallelization Approaches for Growing Neural Gas. In Perner, P., editor, Machine Learning and Data Mining in Pattern Recognition, Poster Proceedings.

Ancona, F., Rovetta, S., and Zunino, R. (1996). A Parallel Approach to Plastic Neural Gas. In Proceedings of the 1996 International Conference on Neural Networks.

Cottrell, M., Hammer, B., and Hasenfuß, A. (2008). Batch and median neural gas. Elsevier Science.

Davies, D. L. and Bouldin, D. W. (1979). A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):224–227.

Dunn, J. C. (1974). Well separated clusters and optimal fuzzy-partitions. Journal of Cybernetics, 4:95–104.

Etengoff, A. (2009). Nvidia touts rapid GPU performance boost. http://www.tgdaily.com/content/view/43745/135/.