A possible explanation is that the parallel algorithm proposed in Section 3.3 has an inherently more distributed behavior than the original sequential one. In fact, in each iteration a number of units scattered randomly across the whole mesh are updated, virtually at the same time, whereas in the original algorithm only the winner unit and its direct neighbors are updated before the next iteration. This distributed behavior apparently leads to a more effective use of each input signal, thus permitting faster convergence, at least in terms of the number of input signals needed. This aspect requires further investigation.
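As a concrete illustration of this batched behavior, the following CUDA sketch outlines one iteration under our own assumptions (names such as findWinners, BATCH_SIZE, and the flat array layout are illustrative, not the actual implementation): the winners for a whole batch of signals are found in parallel on the GPU, while the Update phase remains sequential on the host, as in the parallelization actually described in this paper.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

#define DIM 3          // dimensionality of signals and unit weights (assumed)
#define BATCH_SIZE 256 // signals processed per iteration (assumed)

// One thread per input signal: brute-force scan over all units to
// find the best-matching unit (the winner) for that signal.
__global__ void findWinners(const float *units, int numUnits,
                            const float *signals, int numSignals,
                            int *winners)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= numSignals) return;

    float best = FLT_MAX;
    int bestIdx = 0;
    for (int u = 0; u < numUnits; ++u) {
        float d = 0.0f;
        for (int k = 0; k < DIM; ++k) {
            float diff = units[u * DIM + k] - signals[s * DIM + k];
            d += diff * diff;
        }
        if (d < best) { best = d; bestIdx = u; }
    }
    winners[s] = bestIdx;
}

// Host-side iteration: Find Winners runs on the GPU for the whole
// batch, so winners scattered across the mesh are adapted in the same
// iteration; the Update phase itself is still performed on the CPU.
void iterate(float *d_units, int numUnits,
             float *d_signals, int *d_winners, int *h_winners)
{
    int threads = 128;
    int blocks = (BATCH_SIZE + threads - 1) / threads;
    findWinners<<<blocks, threads>>>(d_units, numUnits,
                                     d_signals, BATCH_SIZE, d_winners);
    cudaMemcpy(h_winners, d_winners, BATCH_SIZE * sizeof(int),
               cudaMemcpyDeviceToHost);
    // ...adapt each winner (and its neighbors) toward its signal...
}
```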
5 CONCLUSIONS AND FUTURE DEVELOPMENTS
In this paper we examined the GPU-based parallelization of a generic, growing self-organizing network by proposing a parallel version of the original algorithm, in order to increase its level of scalability.
In particular, the proposed parallel version adapts more naturally to the GPU architecture, taking advantage of its hierarchical memory access through careful data placement, of the wide onboard bandwidth through perfectly coalesced memory accesses, and of the high number of cores through a scalable level of parallelization.
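The role of data placement in achieving coalescence can be made concrete with a small sketch (again our own illustration, not the actual data structures of the implementation): with a structure-of-arrays layout, one thread per unit reads consecutive addresses, so each warp's loads are served by a minimal number of memory transactions.

```cuda
#define DIM 3  // dimensionality of unit weights (assumed)

// Structure-of-arrays layout: component k of unit u is stored at
// weights[k * numUnits + u]. With one thread per unit, threads
// u, u+1, ... in a warp touch adjacent addresses, so each load is
// perfectly coalesced. The array-of-structures alternative,
// units[u * DIM + k], would give each thread a stride of DIM floats.
__global__ void squaredDistances(const float *weights, int numUnits,
                                 const float *signal, float *dist)
{
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    if (u >= numUnits) return;

    float d = 0.0f;
    for (int k = 0; k < DIM; ++k) {
        // All threads read the same signal[k]: a broadcast, also cheap.
        float diff = weights[k * numUnits + u] - signal[k];
        d += diff * diff;
    }
    dist[u] = d;
}
```

A parallel minimum reduction over dist would then select the winner for the signal.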
An interesting, and somewhat unexpected, aspect that the experiments have revealed is that, parallel execution apart, the overall behavior of the proposed parallel algorithm is different from that of the original, sequential one. The parallel version of the algorithm, in fact, seems to deal better with complex meshes, requiring a smaller number of signals to reach network convergence. This aspect needs to be investigated further, with more specific and extensive experiments.
The parallelization described in this paper was limited to the Find Winners phase and, according to the experimental results, succeeds in making it less time-consuming than the Update phase. This means that future developments of the proposed algorithm should aim at an effective parallelization of the Update phase as well, in order to further improve performance. This requires some care, however: collisions among threads corresponding to signals for which the same unit is the winner must be handled correctly, especially in light of the limited thread synchronization capabilities implemented on GPUs; a possible approach is sketched below.
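One such approach, given purely as an assumption on our part (the paper does not implement a parallel Update phase), is to assign one thread per signal and apply the weight corrections with atomic additions, which serialize conflicting writes to a shared winner without requiring explicit inter-thread synchronization.

```cuda
#define DIM 3  // dimensionality of signals and unit weights (assumed)

// Hypothetical parallel Update phase: one thread per signal. When two
// signals share the same winner, their weight corrections collide;
// atomicAdd serializes the writes so that no update is lost. Note that
// each delta is computed from a possibly stale weight value, an
// approximation this sketch accepts. Float atomicAdd requires a GPU
// of compute capability 2.0 or higher.
__global__ void updateWinners(float *units, const float *signals,
                              const int *winners, int numSignals,
                              float epsilon)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= numSignals) return;

    int w = winners[s];
    for (int k = 0; k < DIM; ++k) {
        float delta = epsilon * (signals[s * DIM + k]
                                 - units[w * DIM + k]);
        atomicAdd(&units[w * DIM + k], delta);
    }
}
```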
Figure 11: Times to convergence of the Sequential and Simulated parallel implementations.