amount of resources that is required to update x and
r will be doubled. However, we can double the size
of the system that the SVU can handle by allowing
an extra clock cycle. The completion time of the
SVU is always fixed, unlike that of the MVU, which
varies with the size of the matrix used in the MVM.
As seen in Figure 2, the three main modules used
in the SVU data-path are a Divider, a Vector ALU
(VecALU), and an Accumulator. The operations
performed by these modules are described next.
Divider: This module calculates α and β,
which are used by the VecALU.
VecALU: This is an arithmetic logic unit (ALU)
that specifically carries out vector-vector and
vector-scalar operations. The residual r, search
direction g, and deformation x are updated here: the
module combines their previous values with α and β
to generate new values. The new value of r is passed
to the Accumulator.
Accumulator: This module sums the elements of
register r2 Reg, each of which is the square of the
corresponding element of r. The result of this
summation is therefore the squared 2-norm of r
(i.e., $r^T r$), which the Divider uses in the
calculation of α and β.
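As a minimal sketch of one pass through the SVU data-path, the Python below applies the standard conjugate-gradient update, assuming the Divider evaluates $\alpha = r^T r / (p^T K p)$ and $\beta = r_{new}^T r_{new} / (r^T r)$. The function names, the plain-list vectors, and the use of p for the direction vector (the vector held in g Reg) are assumptions for illustration; the MVU results $Kp$ and $p^T K p$ are computed in software as stand-ins for the hardware unit, and floating point replaces the fixed-point arithmetic.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def axpy(a, x, y):
    """Return a*x + y elementwise (a vector-scalar VecALU operation)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def matvec(K, v):
    """Dense matrix-vector product; a software stand-in for the MVU."""
    return [dot(row, v) for row in K]


def svu_step(x, r, p, Kp, pKp, rr):
    """One SVU update, given the MVU results Kp and pKp = p^T K p.

    rr is r^T r from the previous Accumulator pass.
    """
    alpha = rr / pKp                # Divider: step length
    x_new = axpy(alpha, p, x)       # VecALU: x + alpha * p
    r_new = axpy(-alpha, Kp, r)     # VecALU: r - alpha * Kp
    r2 = [ri * ri for ri in r_new]  # contents of r2 Reg (squared elements)
    rr_new = sum(r2)                # Accumulator: squared 2-norm of r_new
    beta = rr_new / rr              # Divider: direction update factor
    p_new = axpy(beta, p, r_new)    # VecALU: r_new + beta * p
    return x_new, r_new, p_new, rr_new


def cg_solve(K, b, max_iters, tol=1e-20):
    """Hypothetical driver: conjugate gradient starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)      # r = b - K x, with x = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iters):
        if rr <= tol:  # residual is negligible; stop before dividing by ~0
            break
        Kp = matvec(K, p)   # MVU stand-in
        pKp = dot(p, Kp)    # MVU stand-in
        x, r, p, rr = svu_step(x, r, p, Kp, pKp, rr)
    return x
```

For example, `cg_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], 10)` reaches approximately `[1/11, 7/11]` after two iterations, as expected for a 2×2 symmetric positive-definite system.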
The SVU-Control module controls the flow of
information among the registers and modules in the
SVU's data-path. As seen in Figure 2, three registers,
shown by dashed lines, are used to pass information
between the SVU and the MVU: one holds the
multiplying vector and the other two hold the MVU
results. The multiplying-vector register, g Reg, passes
the direction vector to the MVU, while the result
registers, $p^T K p$ Reg and $Kp$ Reg, receive the
MVU results ($p^T K p$ and $Kp$).
3.2 MVU Design
This MVU is designed specifically for MVMs of
the form $Kp$ and $p^T K p$, which may involve sparse
matrices. The design, shown in Figure 3, requires
only the non-zero elements of the matrix to be stored
in memory. Each non-zero element is stored in
fixed-point format as part of a simple 32-bit
instruction designed for the MVU, whose format is
shown below.
a (1 bit) | b (1 bit) | c (9 bits) | d (21 bits)
a: 1st bit; marks the end of the matrix.
b: 2nd bit; marks the end of a row.
c: 3rd to 11th bits; give the column of the non-zero value.
d: last 21 bits; give the non-zero value.
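The instruction word can be packed and unpacked with shifts and masks. This sketch assumes the "1st bit" is the most significant bit (bit 31), so c occupies bits 29 to 21 and d bits 20 to 0. It also assumes d holds a two's-complement fixed-point number with a hypothetical `FRAC_BITS = 12` fractional bits, since the integer/fractional split is not given here. All names are illustrative.

```python
# Hypothetical layout of the 32-bit MVU instruction word:
#   a (bit 31) | b (bit 30) | c (bits 29..21) | d (bits 20..0)
# The bit ordering and the fixed-point interpretation of d are assumptions.
COL_BITS = 9
VAL_BITS = 21
FRAC_BITS = 12  # hypothetical number of fractional bits in d


def to_fixed(x: float) -> int:
    """Quantize x to a 21-bit two's-complement fixed-point bit pattern."""
    raw = round(x * (1 << FRAC_BITS))
    if not -(1 << (VAL_BITS - 1)) <= raw < (1 << (VAL_BITS - 1)):
        raise ValueError("value out of 21-bit fixed-point range")
    return raw & ((1 << VAL_BITS) - 1)


def from_fixed(bits: int) -> float:
    """Interpret a 21-bit two's-complement pattern as a fixed-point number."""
    if bits & (1 << (VAL_BITS - 1)):  # sign bit set: value is negative
        bits -= 1 << VAL_BITS
    return bits / (1 << FRAC_BITS)


def encode(end_matrix: bool, end_row: bool, col: int, value_bits: int) -> int:
    """Pack the four fields into one 32-bit instruction word."""
    if not 0 <= col < (1 << COL_BITS):
        raise ValueError("column index does not fit in 9 bits")
    if not 0 <= value_bits < (1 << VAL_BITS):
        raise ValueError("value pattern does not fit in 21 bits")
    return (int(end_matrix) << 31) | (int(end_row) << 30) | (col << VAL_BITS) | value_bits


def decode(word: int) -> tuple:
    """Unpack a word into (end_matrix, end_row, col, value_bits)."""
    end_matrix = bool((word >> 31) & 1)
    end_row = bool((word >> 30) & 1)
    col = (word >> VAL_BITS) & ((1 << COL_BITS) - 1)
    value_bits = word & ((1 << VAL_BITS) - 1)
    return end_matrix, end_row, col, value_bits
```

Under this layout, the 9-bit column field can address at most 2^9 = 512 columns.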
Figure 3: MVU Design.
The MVU data-path is pipelined and divided into
three modules: the Instruction Fetch module
(IFetch), the Instruction Decode module (IDecode),
and the Execute module (IExecute).
IFetch: This module fetches the next instruction
from memory and forwards it to IDecode.
The instructions are read sequentially, with the
addresses generated by a sequential counter.
IDecode: The instruction is decoded here using the
format described earlier. This module determines
whether the end of the current row or column (ERC)
or the end of the matrix (EM) has been reached, and
it determines the address of the vector element
needed for the next multiplication.
IExecute: This module performs the conventional
MVM (i.e., it takes the inner product of each row
with the multiplying vector, starting with the first
row) using a set of multipliers and accumulators.
$p^T K p$ and $Kp$ are computed concurrently, and
each result is stored in its corresponding result
register.
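The fetch, decode, and execute steps can be modelled sequentially in software. In this sketch each instruction is an already-unpacked tuple `(end_matrix, end_row, col, value)` with a floating-point value, so bit-level decoding is omitted. We assume the end-of-row flag is set on the last non-zero of each row, the end-of-matrix flag on the last instruction, and that an empty row is stored as a single zero-valued entry. Accumulating $p^T K p$ as each element of $Kp$ completes is one possible way to produce both results concurrently, not necessarily the paper's data-path; `to_program` and `mvu` are hypothetical names.

```python
def to_program(K):
    """Build a hypothetical instruction stream from a dense matrix K."""
    program = []
    last_row = len(K) - 1
    for i, row in enumerate(K):
        # Keep only non-zeros; an empty row gets one zero-valued placeholder.
        nz = [(j, v) for j, v in enumerate(row) if v != 0] or [(0, 0.0)]
        for k, (j, v) in enumerate(nz):
            end_row = k == len(nz) - 1
            end_matrix = end_row and i == last_row
            program.append((end_matrix, end_row, j, v))
    return program


def mvu(program, p):
    """Compute Kp and p^T K p from the instruction stream."""
    Kp = []
    pKp = 0.0
    acc = 0.0  # accumulator for the current row's inner product with p
    row = 0
    pc = 0     # sequential address counter used by IFetch
    while True:
        # IFetch: read the next instruction at the counter's address.
        end_matrix, end_row, col, value = program[pc]
        pc += 1
        # IDecode: col is the address of the vector element to multiply.
        # IExecute: multiply-accumulate into the current row's sum.
        acc += value * p[col]
        if end_row or end_matrix:
            Kp.append(acc)          # element of Kp for this row
            pKp += p[row] * acc     # p^T K p built alongside Kp
            row += 1
            acc = 0.0
        if end_matrix:
            return Kp, pKp
```

Only the non-zero entries of K are visited, which is what makes storing the matrix as an instruction stream worthwhile for sparse systems.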
The MVU-Controller controls the flow of
information among the registers and modules in the
MVU data-path. As discussed earlier, the MVU
result registers and the multiplying-vector register
are used for passing information between the SVU
and the MVU.
4 RESOURCE USAGE AND PERFORMANCE
FPGAs contain three main resources, namely
multipliers, logic elements, and registers. Of these
three, multipliers are the least abundant, which
makes them the bottleneck for applications that
depend heavily on multiplication. For this reason,
we use multiplier usage as the primary measure of
our design's resource usage, as it is the deciding
factor in the maximum size of the system that can
be solved on
BIOSIGNALS 2008 - International Conference on Bio-inspired Systems and Signal Processing
304