order of ns) but with an interval between spikes of
the order of hundreds of microseconds (µs) or even
milliseconds (ms). This interval allows the pulses
generated by all the neurons to be time-multiplexed
onto a common digital bus. Each neuron is identified
by an address related to its position in the array.
Every time a neuron emits a pulse, its address
appears on the output bus, together with a request
signal, until an acknowledge signal is received
(handshake protocol). The receiver chip reads and
decodes the addresses of incoming events and issues
pulses to the corresponding receiving neurons.
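The address-event mechanism described above can be sketched as a small behavioral model in Python. This is not a hardware description: the row-major address mapping, the 128-neuron row width, and the `Bus`/`transmit` names are assumptions for illustration, and the 4-phase request/acknowledge handshake is modeled as ordinary sequential steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WIDTH = 128  # assumed array width, chosen to match the 128x128 test image


def encode(row: int, col: int, width: int = WIDTH) -> int:
    """Map a neuron's position in the array to a bus address (row-major)."""
    return row * width + col


def decode(address: int, width: int = WIDTH) -> Tuple[int, int]:
    """Recover the (row, col) position from a bus address."""
    return divmod(address, width)


@dataclass
class Bus:
    """Hypothetical shared AER bus: address lines plus REQ and ACK wires."""
    address: Optional[int] = None
    req: bool = False
    ack: bool = False


def transmit(spikes: List[Tuple[int, int]],
             width: int = WIDTH) -> List[Tuple[int, int]]:
    """Send spikes one at a time over the bus using a 4-phase handshake.

    Returns the positions decoded by the receiver, in arrival order.
    """
    bus = Bus()
    received = []
    for row, col in spikes:
        # 1. Sender drives the address lines and raises REQ.
        bus.address, bus.req = encode(row, col, width), True
        # 2. Receiver latches and decodes the address, then raises ACK.
        received.append(decode(bus.address, width))
        bus.ack = True
        # 3. Sender sees ACK and releases REQ and the address lines.
        bus.req, bus.address = False, None
        # 4. Receiver sees REQ low and releases ACK; the bus is free again.
        bus.ack = False
    return received


print(encode(3, 2))                       # 386
print(transmit([(0, 5), (127, 127)]))    # [(0, 5), (127, 127)]
```

Because only one address occupies the bus at a time, many neurons can share it as long as each event is short compared with the interval between spikes of any one neuron.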
One of the operations performed by AER systems in
artificial vision and multimedia applications is
convolution. In the visual cortex of the brain, the
first processing stage consists of convolutions that
detect object edges from brightness gradients. In
the design presented by Camunas-Mesa, Acosta-Jimenez,
Serrano-Gotarredona and Linares-Barranco (2008), a
single convolution processor performs all the
operations for the whole image.
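In an AER convolution processor, each incoming event updates a neighborhood of output pixels directly instead of waiting for a full frame. The Python sketch below illustrates that event-driven scheme, assuming integrate-and-fire output pixels with a symmetric threshold and reset to zero. The Sobel kernel, the threshold value, and the function name are illustrative choices, not parameters taken from the cited design.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int, int]  # (row, col, polarity), polarity is +1 or -1

# Horizontal Sobel kernel: responds to brightness gradients along x.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def event_convolution(events: Sequence[Event], kernel, size: int,
                      threshold: float) -> List[Event]:
    """Event-driven convolution with integrate-and-fire output pixels.

    Each input event adds the kernel, scaled by the event polarity, to the
    pixels around the event position. A pixel whose accumulated state
    reaches +threshold or -threshold emits an output event of that sign
    and is reset to zero.
    """
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2
    state = [[0.0] * size for _ in range(size)]
    out: List[Event] = []
    for r, c, pol in events:
        for i in range(kh):
            for j in range(kw):
                y, x = r + i - cy, c + j - cx
                if 0 <= y < size and 0 <= x < size:
                    state[y][x] += pol * kernel[i][j]
                    if abs(state[y][x]) >= threshold:
                        out.append((y, x, 1 if state[y][x] > 0 else -1))
                        state[y][x] = 0.0
    return out


print(event_convolution([(2, 2, 1)], SOBEL_X, size=5, threshold=2))
# [(2, 1, -1), (2, 3, 1)]
```

A single processor running this loop handles every event of the whole image, which is the sequential bottleneck that dividing the image aims to relieve.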
Based on this idea, and on the divide and conquer
premise presented in Montero-Gonzalez, Morgado-Estevez,
Linares-Barranco, et al. (2011), this paper argues
that dividing the image into smaller parts and
processing their AER convolutions in parallel will
reduce the runtime. With this design, the
convolution can be carried out by a multiprocessor
system in less time.
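Dividing the image reproduces the whole-image result only if each part also receives the neighboring pixels within the kernel radius (a "halo"); otherwise the output differs along the cut lines. The text here does not state how the original model treats tile borders, so the following Python sketch is only an illustration of the point. It uses ordinary frame-based 2-D convolution rather than an AER simulation, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def convolve(img: Image, kernel: Image) -> Image:
    """'Same'-size 2-D convolution, treating pixels outside the image as 0."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    sy, sx = y - i + cy, x - j + cx
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += kernel[i][j] * img[sy][sx]
            out[y][x] = acc
    return out


def convolve_tiled(img: Image, kernel: Image, tiles_per_side: int,
                   use_halo: bool = True) -> Image:
    """Convolve each tile independently and stitch the results together.

    With use_halo=True every tile is cut with a border of neighboring
    pixels (the kernel radius), so the stitched output equals the
    whole-image convolution. Assumes a square image whose side is a
    multiple of tiles_per_side and a square kernel of odd size.
    """
    n = len(img)
    t = n // tiles_per_side
    halo = len(kernel) // 2 if use_halo else 0
    out = [[0] * n for _ in range(n)]
    for y0 in range(0, n, t):
        for x0 in range(0, n, t):
            # Tile bounds extended by the halo and clipped to the image.
            ya, yb = max(0, y0 - halo), min(n, y0 + t + halo)
            xa, xb = max(0, x0 - halo), min(n, x0 + t + halo)
            res = convolve([row[xa:xb] for row in img[ya:yb]], kernel)
            # Copy back only the pixels that belong to this tile.
            for y in range(t):
                for x in range(t):
                    out[y0 + y][x0 + x] = res[y0 - ya + y][x0 - xa + x]
    return out


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
img = [[(3 * y + 5 * x) % 7 for x in range(8)] for y in range(8)]
full = convolve(img, SOBEL_X)
print(convolve_tiled(img, SOBEL_X, 2) == full)                  # True
print(convolve_tiled(img, SOBEL_X, 2, use_halo=False) == full)  # False
```

The halo costs a little redundant work per tile, and that overhead grows relative to the tile area as tiles get smaller.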
2 METHODOLOGY AND TEST CASES
The experiments determine, through an exhaustive
analysis, the runtime of the convolution of an image
for different numbers of divisions. In each case, the
convolutions of all the divisions are performed in
parallel, instead of a single convolution of the
whole image.
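One simple way to compare the division cases is to time every tile and treat the slowest tile as the ideal parallel runtime. The Python sketch below does this under clearly stated assumptions: tiles are timed one after another on a single machine, the per-tile workload is a stand-in 3x3 box filter (not AER TOOL), no halo is added, and Condor scheduling and data-transfer overheads are ignored. All names are hypothetical.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]


def box_filter(tile: Image) -> Image:
    """Stand-in workload: 3x3 box sum with zero padding (not AER TOOL)."""
    h, w = len(tile), len(tile[0])
    return [[sum(tile[y + dy][x + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)]
            for y in range(h)]


def split(img: Image, parts: int) -> List[Image]:
    """Cut a square image into `parts` equal square tiles (parts = k*k).

    No halo is added here, to keep the timing sketch short.
    """
    k = math.isqrt(parts)
    t = len(img) // k
    return [[row[tx * t:(tx + 1) * t] for row in img[ty * t:(ty + 1) * t]]
            for ty in range(k) for tx in range(k)]


def runtimes(img: Image, divisions: Sequence[int],
             work: Callable[[Image], Image] = box_filter
             ) -> Dict[int, Tuple[float, float]]:
    """Map each division count to (sequential_s, ideal_parallel_s).

    The ideal parallel time assumes one processor per tile and no
    overhead, so it is the time of the slowest tile.
    """
    results = {}
    for parts in divisions:
        times = []
        for tile in split(img, parts):
            t0 = time.perf_counter()
            work(tile)
            times.append(time.perf_counter() - t0)
        results[parts] = (sum(times), max(times))
    return results


img = [[(x + y) % 256 for x in range(128)] for y in range(128)]
for parts, (seq, par) in runtimes(img, [1, 4, 16, 256, 1024]).items():
    print(f"{parts:5d} tiles: sequential {seq:.4f} s, ideal parallel {par:.6f} s")
```

Measured times on a real cluster will also include job start-up and communication costs, which this model leaves out by design.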
We have used the Cluster of Research Support (CRS),
part of the infrastructure of the University of Cadiz
(UCA), to improve the runtimes of the AER TOOL
simulator. To run this simulator on the CRS, we
propose a new parameterized simulation model adapted
to running tests on parallel processors.
2.1 Supercomputer CRS (Cluster of Research Support)
The CRS is composed of 80 nodes. Each node has two
Intel Xeon 5160 processors at 3 GHz with a 1.33 GHz
Front Side Bus. Each processor is dual-core, so 320
cores are available. A total of 640 GB of RAM,
2.4 TB of scratch storage, and a Gigabit Ethernet
communication architecture with HP ProCurve switches
yield a peak performance of 3.75 TFLOPS (source:
Technical support in supercomputing, University of
Cadiz, http://supercomputacion.uca.es/).
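The core count follows directly from the node configuration. The short Python check below reproduces that arithmetic. The even split of RAM across nodes and the "one job per core" scheduling rule are assumptions, not figures from the CRS documentation. Under that rule, the 1024-division case cannot run in a single round on 320 cores.

```python
import math

NODES = 80
CPUS_PER_NODE = 2    # two Xeon 5160 processors per node
CORES_PER_CPU = 2    # each Xeon 5160 is dual-core
RAM_GB_TOTAL = 640

cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
ram_per_node_gb = RAM_GB_TOTAL / NODES   # assumes memory is spread evenly
ram_per_core_gb = RAM_GB_TOTAL / cores


def waves(jobs: int, slots: int = cores) -> int:
    """Rounds of execution needed if each job occupies one core."""
    return math.ceil(jobs / slots)


print(cores, ram_per_node_gb, ram_per_core_gb)     # 320 8.0 2.0
print([waves(n) for n in (1, 4, 16, 256, 1024)])    # [1, 1, 1, 1, 4]
```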
On the software side, distributed work is managed
with Condor, a job management system specialized in
compute-intensive tasks. The simulation was coded in
MATLAB, using the AER TOOL simulator for MATLAB
described by Perez-Carrasco, Serrano-Gotarredona and
Acha-Piñero (2009).
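For the parallel cases, one natural way to hand the divisions to Condor is a single submit description that queues one job per tile. The Python sketch below writes such a description. The wrapper script name `run_tile.sh` and its arguments (tile index, number of divisions) are hypothetical, since the text does not show how the MATLAB simulation is launched. The `universe`, `executable`, `arguments`, `output`, `error`, `log`, and `queue` commands and the `$(Process)` macro are standard Condor submit-file syntax.

```python
def submit_description(parts: int, executable: str = "run_tile.sh") -> str:
    """Build a Condor submit description that queues one job per tile.

    `run_tile.sh` and its arguments are hypothetical; Condor expands
    $(Process) to 0 .. parts-1, one value per queued job.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = $(Process) {parts}",
        "output     = tile_$(Process).out",
        "error      = tile_$(Process).err",
        "log        = convolution.log",
        f"queue {parts}",
    ]
    return "\n".join(lines) + "\n"


print(submit_description(16))
```

The returned text can be saved to a file and passed to `condor_submit`, which then distributes the queued jobs across the available cluster slots.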
Running this set of tests on a real physical
implementation would be very expensive. The CRS
supercomputer makes it possible to implement the AER
simulation model in parallel with acceptable
runtimes, using the installed software and existing
libraries.
2.2 Test Image and Successive Divisions
For this simulation we designed a grayscale image in
Adobe Photoshop CS, in which the darkest pixels have
values close to 0 and the brightest pixels have
values close to 255. The GIF image is 128x128 pixels
with 256 gray levels.
The idea of dividing the original image and
performing the convolutions in parallel arises from
the aim of exploiting distributed processing systems
to speed up the process. This involves running a
series of tests with different numbers of divisions.
First, we obtained the runtime of the convolution
of the original image without divisions. Second, the
image was divided into 4 parts (64x64 pixels each),
each convolved on a different processor. The
sequence was then repeated with 16 divisions (32x32
pixels each), then with 256 divisions (8x8 pixels
each), and finally with 1024 divisions (4x4 pixels
each). Conceptually, the operation would be as shown
in Figure 1.
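The tile sizes follow from splitting each side of the 128x128 image into √N pieces for N divisions. A small Python helper to check them and list the tile positions; the function names are hypothetical:

```python
import math
from typing import List, Tuple

IMAGE_SIDE = 128
DIVISIONS = [1, 4, 16, 256, 1024]


def tile_side(image_side: int, parts: int) -> int:
    """Side of each square tile when the image is cut into `parts` tiles."""
    k = math.isqrt(parts)  # tiles per side
    if k * k != parts or image_side % k:
        raise ValueError(
            f"{parts} tiles do not evenly divide a {image_side}x{image_side} image")
    return image_side // k


def tile_origins(image_side: int, parts: int) -> List[Tuple[int, int]]:
    """Top-left (row, col) of every tile, in row-major order."""
    t = tile_side(image_side, parts)
    return [(y, x) for y in range(0, image_side, t)
            for x in range(0, image_side, t)]


for parts in DIVISIONS:
    t = tile_side(IMAGE_SIDE, parts)
    print(f"{parts:5d} divisions -> {t}x{t} pixels each")
```

Each origin in `tile_origins` identifies one sub-image, that is, one independent convolution that can be assigned to its own processor.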
2.3 Topology Diagram Implementation
For this research, we developed parametric
simulation software in which the test cases are
specified by variable assignment.
Once the simulation variables are set, the system
runs following the block diagram shown in Figure 2.
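A parametric model of this kind can be sketched as a single record of simulation variables that expands into test cases. In the Python sketch below, `SimulationConfig`, its field names, and the image file name are hypothetical. The sketch covers only the variable-assignment step, not the processing blocks of the diagram.

```python
import math
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class SimulationConfig:
    """Hypothetical set of simulation variables for one batch of tests."""
    image_file: str = "test_image.gif"  # hypothetical file name
    image_side: int = 128
    gray_levels: int = 256
    divisions: List[int] = field(default_factory=lambda: [1, 4, 16, 256, 1024])

    def test_cases(self) -> List[dict]:
        """Expand the variables into one test case per number of divisions."""
        cases = []
        for parts in self.divisions:
            k = math.isqrt(parts)
            if k * k != parts or self.image_side % k:
                raise ValueError(f"invalid number of divisions: {parts}")
            cases.append({"image": self.image_file, "divisions": parts,
                          "tile_side": self.image_side // k})
        return cases


base = SimulationConfig()
quick = replace(base, divisions=[4, 16])  # change one variable, keep the rest
print(quick.test_cases())
```

Keeping the variables in one immutable record means each test batch is fully described by its assignments, so a run can be repeated or varied by changing a single field.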
SIGMAP 2011 - International Conference on Signal Processing and Multimedia Applications