Figure 4: The instructions-block is divided into the source
part and the time-management part. The source part is a
memory that outputs an instruction for each value of PC.
The time-management part increments the PC after a number
of cycles of the clock wire; the number of clock cycles
needed to increase the program counter (PC) by 1 is
specified individually by each instruction.
of the connection from another neuron to this neuron.
With this information, the input spike can be
calculated according to equation (4).
• Four inputs to select the parameters a, b, c, d.
The multiplexer Sel input has 2 outputs and 2·n in-
puts, where n is the number of neurons in the neural
network, so each neuron in the network has two
connections to this multiplexer. One input carries the
information whether the predecessor neuron spikes,
and the other carries the weight of the connection
between the two neurons. When summing up the input
spikes from other neurons, the multiplexer selects a
neuron and reads in its spike, then selects the next
neuron, and so on, accumulating all spikes according
to equation (4). The multiplexer next to mem tmp selects
which value (if any) should be stored in mem tmp.
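The selection-and-sum behavior described above can be sketched in software. The following Python fragment is only an illustration of the multiplexer stepping through predecessor neurons; the function name and the plain weighted-sum form assumed for equation (4) are not taken from the paper.

```python
# Sketch of the Sel-input multiplexer behavior: for each predecessor
# neuron, read its spike flag and its connection weight, and accumulate
# the weighted input (a plain weighted sum is assumed for equation (4)).

def sum_input_spikes(spiked, weights):
    """spiked[i]  -- True if predecessor neuron i fired this cycle.
    weights[i] -- weight of the connection from neuron i."""
    total = 0.0
    for i in range(len(spiked)):   # the multiplexer steps through neurons
        if spiked[i]:              # first input: did neuron i spike?
            total += weights[i]    # second input: connection weight
    return total

print(sum_input_spikes([True, False, True], [0.5, 0.9, 0.25]))  # 0.75
```

In hardware this loop corresponds to the multiplexer walking through its 2·n inputs, two per neuron, over successive clock cycles.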
The instructions-block provides the circuit with
the instructions that control what the circuit does.
Figure 4 provides an overview of the instructions-block.
The first sub-block is the source-block. It consists
of a read-only memory that is initialized with
predefined content before the circuit begins to run.
The 7 bit program counter PC is connected to the
address input of the memory block source. For each
value of PC, the source-block outputs a 64 bit value:
the instruction. The instruction is one of the outputs
of the instructions-block, but it is used in the
time-management block as well. In the whole circuit,
only some parts of this 64 bit value are used to
control the blocks and memories and to drive
the multiplexers.
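Since only some bit ranges of the 64 bit instruction word are used as control signals, decoding amounts to extracting bit fields. The sketch below is hypothetical: the paper does not specify the field layout, so the positions and meanings of the fields shown here are invented for illustration.

```python
# Hypothetical decoding of one 64-bit instruction word. Only some bit
# ranges control blocks, memories, and multiplexers; the field layout
# below is invented for illustration and is NOT taken from the paper.

def bits(word, lo, hi):
    """Extract bits lo..hi (inclusive, LSB = bit 0) from a 64-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

instr = 0x00000000000000A5
cycles = bits(instr, 0, 3)   # e.g. 4-bit divide value for the clock divider
mux_sel = bits(instr, 4, 7)  # e.g. a multiplexer select field
print(cycles, mux_sel)       # 5 10
```

The same word is routed to every block, and each block looks only at its own slice of the 64 bits.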
The time management consists of two functions.
The clock divider divides the clock by the 4 bit wide
number on its second input, so the clock divider's
output end instruction is a clock with a reduced
frequency. The counter-block increases its value after
each cycle of end instruction; it starts at 0 and
counts up. The resulting output signal of the counter
is the program counter (PC), which addresses the
memory source. The output end instruction also causes
the memories mem 0, mem 1 and mem 2 in Figure 3 to
store their next values. Since the end instruction wire
determines when an instruction ends, a whole cycle
of end instruction is often simply called a cycle.
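The interplay of clock divider and counter can be sketched as a small simulation. This is a behavioral model under stated assumptions only: the function name, the list of per-instruction divide values, and the wrap-around of the 7 bit counter at 128 are illustrative.

```python
# Sketch of the time-management behavior: the clock divider emits one
# end_instruction pulse every `div` input clock cycles, and the 7-bit
# program counter (wrapping at 128) increments on each pulse.

def run(divisors, clock_ticks):
    """divisors[pc] -- the 4-bit divide value of the instruction at pc."""
    pc, count = 0, 0
    for _ in range(clock_ticks):
        count += 1
        if count == divisors[pc % len(divisors)]:  # end_instruction pulse
            count = 0
            pc = (pc + 1) % 128                    # 7-bit program counter
    return pc

print(run([3, 2, 4], 9))  # 3 + 2 + 4 ticks consumed -> pc == 3
```

Each instruction thus controls how long it stays active: a long operation simply carries a larger divide value, and the PC only advances once that many clock cycles have elapsed.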
4 CONCLUSIONS
The first attempt was straightforward to implement,
and the underlying function was easy to understand.
For the goal of implementing many neurons on a
relatively small FPGA, however, this model is not
sufficient. The second model described in this paper is
much more efficient, as it makes it possible to
implement a neural-network model roughly 10 times larger
than a traditional implementation. The limiting factor
is the embedded multipliers: in this work, it is
not possible to create the desired macrofunctions
out of logic alone, without the use of these multipliers.
However, the system computes much faster than real-time.
While designing this model, the focus was always on
real-time compatibility, which is essential when the
network has to simulate a group of neurons that must
interact with the outside world.
All the designs in this paper work correctly. First,
the hardware design was logically validated with the
Modelsim software; then a timing analysis was performed
with the Altera TimeQuest software. After these
software validations, the hardware was tested on an
Altera DE2 board containing a Cyclone II FPGA
(EP2C35F672C6), resulting in a computation speed
109 times faster than real-time. The resource usage
of the circuit is very high, since the processing of
floating-point instructions is very resource-intensive,
making a parallel simulation of more than 10 neurons
impossible on the device used (a low-cost Altera FPGA).
Table 2 shows the resource usage of the two models
described in this paper. It is important to emphasize
that this work was implemented on a low-cost,
low-density FPGA.
There are some ways to overcome the hardware
limitations: considering that the processing time
is only 10 % of the real time, each module might be
used 10 times, raising the number of neurons in the
network. Another possibility consists in using
high-bandwidth external memories to store the
synaptic weights, as in the latest FPGA models.
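The first suggestion above is time-multiplexing: one physical module serves several neurons sequentially within one simulation step. A minimal sketch, assuming a reuse factor of 10 and a placeholder `update` function standing in for one neuron-model evaluation:

```python
# Sketch of the time-multiplexing idea: because one module finishes its
# computation in a fraction of the real-time budget, the same module can
# update several neuron states sequentially within one simulation step.
# The reuse factor of 10 and the `update` callable are illustrative.

def step(states, update, reuse_factor=10):
    """Update up to reuse_factor neuron states with one shared module."""
    assert len(states) <= reuse_factor, "module cannot serve more neurons"
    return [update(s) for s in states]  # sequential reuse of one module

# Toy update standing in for one neuron-model evaluation.
print(step([1.0, 2.0, 3.0], lambda v: v * 0.5))  # [0.5, 1.0, 1.5]
```

In hardware, this trades computation speed for capacity: the module's idle time within the real-time budget is spent serving additional neurons instead.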
This work showed an efficient hardware imple-
mentation of a neural model. The current implemen-
tation runs a very accurate simulation of the biologi-
cal reality. One of the results of this project is a neural
model consisting of 10 neurons, 100 times faster
ICFC 2010 - International Conference on Fuzzy Computation