Figure 6: Block diagram of an activation function interpolator (input bits 17..9 and 8..0, interval and gradient memories, a multiplier, an adder, and FFD pipeline registers).
Since the activation function is realized by a lookup table, its shape can easily be changed by loading different coefficients. The circuit's functionality can also be extended by adding another block with a different activation function; for example, the activation function of the output layer can be a step function when a binary result is required.
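For illustration, the behaviour of the interpolator from Figure 6 can be modelled in software as follows. This is only a sketch: the split of an 18-bit input into a table address (bits 17..9) and an interpolation fraction (bits 8..0) follows the figure, while the fixed-point scaling and the table contents are assumptions.

    #include <stdint.h>

    #define ADDR_BITS  9                  /* input bits 17..9: table address         */
    #define FRAC_BITS  9                  /* input bits 8..0: interpolation fraction */
    #define TABLE_SIZE (1 << ADDR_BITS)

    /* Interval memory holds the function value at the start of each
       interval; gradient memory holds the slope across it. The table
       contents and the fixed-point format here are placeholders.      */
    static int16_t interval_mem[TABLE_SIZE];
    static int16_t gradient_mem[TABLE_SIZE];

    /* Piecewise-linear interpolation: y = base[a] + gradient[a] * frac. */
    int16_t activation(uint32_t input)    /* 18-bit input value */
    {
        uint32_t addr = (input >> FRAC_BITS) & (TABLE_SIZE - 1);
        int32_t  frac = (int32_t)(input & ((1u << FRAC_BITS) - 1));

        int32_t base = interval_mem[addr];
        int32_t step = (gradient_mem[addr] * frac) >> FRAC_BITS;
        return (int16_t)(base + step);
    }

In the hardware version, the lookup, multiplication and addition stages are separated by pipeline registers (the FFD blocks in Figure 6), which corresponds to the 4-period latency of the activation function quoted in Section 3.4.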
3.4 Testing of the Complete Circuit
The circuit was described in the VHDL language; the source code is modular, fully synthesizable, and optimized for the Spartan-3 family of chips from Xilinx. Since most of the circuit is arranged into pipelines, it can work at clock frequencies of up to 133 MHz.
The target XC3S-200 chip includes 12 block RAMs and 12 dedicated multipliers. The activation function requires at least one block RAM and one multiplier, and the neuron data memory needs one more block RAM. The remaining resources therefore suffice for 10 computing blocks, each needing one block of on-chip RAM for storing weight coefficients and one multiplier for the computation.
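The resource budget behind the figure of 10 computing blocks can be written out explicitly; the following sketch merely restates the bookkeeping described above:

    #include <stdio.h>

    int main(void)
    {
        /* XC3S-200 resources (Xilinx, 2006): 12 block RAMs, 12 multipliers. */
        int brams = 12, mults = 12;

        brams -= 1; mults -= 1;   /* activation function interpolator */
        brams -= 1;               /* neuron data memory               */

        /* Each computing block needs one BRAM (weights) and one multiplier. */
        int blocks = (brams < mults) ? brams : mults;
        printf("computing blocks: %d\n", blocks);   /* prints 10 */
        return 0;
    }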
The circuit's function was tested on an application recognizing hand-written digits. A network with 88 neurons in the input layer, 40 neurons in the hidden layer and 10 in the output layer was realized. The network model and the training algorithm were implemented on a computer, and the resulting weight coefficients were transferred to the weight memory blocks on the FPGA chip (Masters, 1993).
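For reference, the computation that the circuit reproduces is a standard two-layer forward pass over the 88-40-10 topology. The sketch below uses floating point and a plain sigmoid for readability; the bias handling and weight layout are assumptions, and the FPGA itself uses the tabulated activation function described above.

    #include <math.h>

    #define N_IN  88
    #define N_HID 40
    #define N_OUT 10

    /* One extra weight per neuron is reserved for the bias term
       (an assumed layout; the paper does not detail it).         */
    static float w_hid[N_HID][N_IN + 1];
    static float w_out[N_OUT][N_HID + 1];

    /* Plain sigmoid stands in for the tabulated activation. */
    static float activation_f(float x) { return 1.0f / (1.0f + expf(-x)); }

    static void forward(const float in[N_IN], float out[N_OUT])
    {
        float hid[N_HID];

        for (int j = 0; j < N_HID; j++) {          /* hidden layer */
            float acc = w_hid[j][N_IN];            /* bias */
            for (int i = 0; i < N_IN; i++)
                acc += w_hid[j][i] * in[i];
            hid[j] = activation_f(acc);
        }
        for (int k = 0; k < N_OUT; k++) {          /* output layer */
            float acc = w_out[k][N_HID];           /* bias */
            for (int j = 0; j < N_HID; j++)
                acc += w_out[k][j] * hid[j];
            out[k] = activation_f(acc);
        }
    }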
Calculating the response of the neural network requires 88 × 4 clock periods for the hidden layer (each of the 10 computing blocks evaluates four of the 40 hidden neurons), 40 clock periods for the output layer, plus the 4-period latency of the activation function. Recognizing one character therefore takes 396 clock cycles. At the working frequency of 133 MHz, the circuit can recognize about 336,000 characters per second. Compared with a single-core signal processor, which would need at least 3,920 clock cycles for such a calculation, this is an excellent result.
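These figures can be verified with a short calculation; the sketch below restates the arithmetic of the preceding paragraph (the 3,920-cycle DSP reference corresponds to one multiply-accumulate per synapse, 88 × 40 + 40 × 10):

    #include <stdio.h>

    int main(void)
    {
        int hidden  = 88 * 4;   /* hidden layer: 88 inputs x 4 passes */
        int output  = 40;       /* output layer                       */
        int latency = 4;        /* activation function pipeline       */
        int cycles  = hidden + output + latency;      /* = 396 */

        double f_clk = 133e6;
        printf("cycles per character : %d\n", cycles);
        printf("characters per second: %.0f\n", f_clk / cycles); /* ~335859 */

        /* Single-core DSP reference: one multiply-accumulate per synapse. */
        int dsp_cycles = 88 * 40 + 40 * 10;           /* = 3920 */
        printf("speed-up vs. DSP     : %.1fx\n", (double)dsp_cycles / cycles);
        return 0;
    }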
4 CONCLUSIONS
The neural network function is very demanding in terms of computing power. In FPGA chips, however, it is possible to parallelize the calculations very efficiently. The designed circuit speeds up the neural network computation approximately 10 times compared with a signal processor. Using the biggest chip of the Spartan-3 family, the XC3S-5000, it would be possible to implement up to 100 computing cores in the circuit and thus increase the computing power theoretically up to 100 times.
The entire circuit is very modular and allows neural networks to be realized in various configurations, from very small networks up to networks consisting of thousands of neurons and hundreds of thousands of synapses, while keeping very high computing power.
The Spartan-3 family was chosen for the implementation because of its low price and the good availability of development kits. Its disadvantage is that it cannot process a MAC (multiply-accumulate) operation in one clock cycle. This problem is solved in higher FPGA families: Virtex-5 chips, for example, can process up to 580 giga-MACs per second, with each MAC operation performed within one clock cycle.
ACKNOWLEDGEMENTS
This research has been supported by the Czech Ministry of Education within the long-term Research Plan MSM 0021630503 MIKROSYN "New Trends in Microelectronic Systems and Nanotechnology" and by the Czech Science Foundation under project GA102/08/1116.
REFERENCES
Omondi, A. R., Rajapakse, J. C. (Eds.), 2006. FPGA Implementations of Neural Networks, Springer, Netherlands.
Fausett, L., 1994. Fundamentals of Neural Networks, Prentice Hall, New Jersey.
Masters, T., 1993. Practical Neural Network Recipes in C++, Academic Press, California.
Xilinx, 2006. Spartan-3 FPGA Family: Complete Data Sheet, Xilinx, Inc.
Sahin, S., Becerikli, Y., Yazici, S., 2006. Neural Network Implementation in Hardware Using FPGAs. In 13th International Conference on Neural Information Processing, ICONIP 2006, Springer-Verlag, Germany.