[Figure 10 diagram: tiles TRT A, TRT B, CT A and CT B, with buffers HB 1-4 and VB 1-4 and diagonal entries DE 1-4. Signals shown: GLOBAL_ADD; LEFT_ROW_VAL, ROW_LOCK_VAL; TOP_ROW_VAL.]
Legend: HB = horizontal buffer, VB = vertical buffer, DE = diagonal entry, TRT = top row tile, CT = common tile.
Figure 10: Global Connections.
Table 1: Comparison validating the scalability.

Bogdanov’s architecture
size              cells   slices   slices/cell
50×50             2500    5337     2.13
70×70             4900    10,684   2.18

Proposed extended architecture
size              cells   slices   slices/cell
5×5 of 10×10      2500    5184     2.07
7×7 of 10×10      4900    12,593   2.57
10×10 of 5×5      2500    5507     2.21
10×10 of 7×7      4900    13,137   2.68
4 CONCLUSIONS
We present hardware building blocks, in a hardware/software co-design
solution, for solving large systems of linear equations (SLEs) over Galois
fields. For SLEs over GF(2), an important special case, we present efficient
architectures for (a) basis search and inversion (for tile-based Gaussian
elimination) and (b) 32 × 32 bit matrix multiplication. Having prototyped
these as custom instruction extensions to NIOS-II, we argue for their use as
lightweight extensions to custom or commodity processors in relevant
applications. Even when limited by the 50 MHz clock on the DE2-70 FPGA board,
the co-design solution performs at ≈30 GOPS. For large matrix multiplication
over GF(2^8), we present an adaptation of an earlier reported architecture for
64-bit floating-point matrix multiplication. For large SLEs over GF(2), we
also present an extension of Bogdanov’s design, scalable over multiple FPGAs,
along with preliminary validating results indicating over 2.5 trillion GF(2)
operations on a Virtex-5 device.
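
For readers who want a software reference point for the two GF(2) building blocks named above, the following is a minimal Python sketch of 32 × 32 bit matrix multiplication and Gauss-Jordan inversion on bit-packed rows. It is not the hardware design reported in this paper: the function names (gf2_matmul, gf2_inverse), the row-per-integer packing, and the plain first-nonzero pivot search are assumptions chosen for illustration only.

```python
N = 32  # block dimension, matching the 32 x 32 bit matrices in the text


def gf2_matmul(a, b):
    """Multiply two N x N GF(2) matrices given as lists of N row integers.

    Assumed packing (illustrative): bit j of a[i] holds entry (i, j).
    Row i of the product is the XOR of every row b[j] for which
    bit j of a[i] is set, since addition over GF(2) is XOR.
    """
    result = []
    for row in a:
        acc = 0
        j = 0
        while row:
            if row & 1:
                acc ^= b[j]
            row >>= 1
            j += 1
        result.append(acc)
    return result


def gf2_inverse(a):
    """Invert an N x N GF(2) matrix by Gauss-Jordan elimination.

    Returns the inverse in the same packing, or None if the matrix
    is singular. Row swaps and row XORs are mirrored on an identity
    matrix, which ends up holding the inverse.
    """
    a = list(a)                      # work on a copy
    inv = [1 << i for i in range(N)]  # identity matrix
    for col in range(N):
        # Pivot search: first row at or below the diagonal with a 1 in col.
        pivot = next((r for r in range(col, N) if (a[r] >> col) & 1), None)
        if pivot is None:
            return None
        a[col], a[pivot] = a[pivot], a[col]
        inv[col], inv[pivot] = inv[pivot], inv[col]
        # Clear column col in every other row.
        for r in range(N):
            if r != col and (a[r] >> col) & 1:
                a[r] ^= a[col]
                inv[r] ^= inv[col]
    return inv
```

With this packing, multiplying a matrix by the result of gf2_inverse should give back the identity, [1 << i for i in range(N)], which is a convenient self-check when comparing against a hardware implementation.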
ACKNOWLEDGEMENTS
The authors sincerely acknowledge the Naval Research Board (NRB), India
(Project No. NRB-202/SC/10-11) and the Intel India Research Council for the
financial support covering this work.
REFERENCES
(2008). Altera DE2-70 - Development and Education Board. Terasic.
(2008). Nallatech BenOne Board. Nallatech.
Bogdanov, A. and Mertens, M. C. (2006). A Parallel Hardware Architecture for fast Gaussian Elimination over GF(2). In Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM ’06, pages 237–248, Washington, DC, USA. IEEE Computer Society.
Canis, A., Choi, J., Aldham, M., Zhang, V., Kammoona, A., Anderson, J. H., Brown, S., and Czajkowski, T. (2011). LegUp: High-level synthesis for FPGA-based processor/accelerator systems. In Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’11, pages 33–36. ACM.
Ditter, A., Ceska, M., and Luttgen, G. (2012). On Parallel Software Verification Using Boolean Equation Systems. In SPIN, pages 80–97.
Koç, Ç. K. and Arachchige, S. N. (1991). A fast algorithm for Gaussian elimination over GF(2) and its implementation on the GAPP. J. Parallel Distrib. Comput., 13(1):118–122.
Kumar, V. B. Y., Joshi, S., Patkar, S. B., and Narayanan, H. (2010). FPGA Based High Performance Double-Precision Matrix Multiplication. International Journal of Parallel Programming, 38(3-4):322–338.
Parkinson, D. and Wunderlich, M. (1984). A compact algorithm for Gaussian elimination over GF(2) implemented on highly parallel computers. Parallel Comput., 1(1):65–73.
Rupp, A., Eisenbarth, T., Bogdanov, A., and Grieb, O. (2011). Hardware SLE solvers: Efficient building blocks for cryptographic and cryptanalytic applications. Integration, 44(4):290–304.
Wang, C.-L. and Lin, J.-L. (1993). A Systolic Architecture for Computing Inverses and Divisions in Finite Fields GF(2^m). IEEE Trans. Comput., 42(9):1141–1146.
Hardware-software Scalable Architectures for Gaussian Elimination over GF(2) and Higher Galois Fields