Table 1: Implementation results, resource utilization and speed comparison.
Ref             Freq   Resources                           Timing (µs)              FPGA
                (MHz)                                      160   256   512   1024
Our             269.5  9 DSP Slices+558 Slices             0.39  0.75  2.35  8.41   XC5VLX50T
Mentens (2007)  108    66 MULTs+8192 Slices+66 RAM Blocks  0.89  1.28  2.33  4.4    XC2VP30
Mentens (2007)  87     68 MULTs+7944 Slices                0.30  0.46  -     1.62   XC2VP30
Mentens (2007)  152    36 MULTs+6650 Slices                0.34  0.53  -     1.82   XC2VP30
McIvor (2004)   76     64 MULTs+4663 Slices                -     1.22  -     -      XC2VP125
McIvor (2003)   76     11617 Slices                        -     -     -     13.11  XC2V3000
Kelley (2005)   135    32 MULTs+2593 LUTs+5K RAM           -     0.39  -     2.4    XC2V2000
Kelley (2005)   135    8 MULTs+695 LUTs+5K RAM             -     0.68  -     8.3    XC2V2000
Koc (1996)      60     Not Applicable                      -     -     -     799    Pentium-60
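To compare designs independently of clock frequency, the timings in Table 1 can be converted to approximate clock-cycle counts. The sketch below does this for our multiplier, assuming each timing was measured at the listed frequency; the helper name `cycles` is hypothetical and the figures are derived estimates, not measured values.

```python
def cycles(time_us, freq_mhz):
    """Approximate clock cycles: microseconds x MHz = cycles."""
    return time_us * freq_mhz

# Timings (in microseconds) of our multiplier, copied from Table 1.
our_timings_us = {160: 0.39, 256: 0.75, 512: 2.35, 1024: 8.41}

for bits, t in our_timings_us.items():
    # Assumes the timing was taken at the reported 269.5 MHz clock.
    print(f"{bits:5d} bits: ~{cycles(t, 269.5):.0f} cycles")
```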
6 CONCLUSIONS
This paper presented a design methodology for
implementing an improved SOS MMM over GF(p) for
large integers, with a 32-bit word size, on FPGAs,
using DSP slices to achieve an area-speed trade-off.
The proposed SOS Montgomery multiplier was
implemented and tested at 269.5 MHz with 160-, 256-,
512- and 1024-bit integers.
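As a software reference for the arithmetic summarised above, the following is a minimal sketch of word-serial SOS (Separated Operand Scanning) Montgomery multiplication with 32-bit words, following the method described by Koc et al. (1996). It is a behavioural model only and does not reflect the DSP-slice architecture of the proposed design; the names `mont_mul_sos`, `to_words` and `from_words` are hypothetical.

```python
W = 32                      # word size in bits
MASK = (1 << W) - 1

def to_words(x, s):
    """Split integer x into s little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]

def from_words(words):
    """Recombine little-endian W-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(words))

def mont_mul_sos(a, b, n, s):
    """Return a*b*R^-1 mod n, R = 2^(W*s); n odd, a and b below n."""
    aw, bw, nw = to_words(a, s), to_words(b, s), to_words(n, s)
    n0_inv = (-pow(n, -1, 1 << W)) & MASK   # -n^-1 mod 2^W (Python 3.8+)
    t = [0] * (2 * s + 1)
    # Step 1: full product t = a * b, one word of b at a time.
    for i in range(s):
        carry = 0
        for j in range(s):
            acc = t[i + j] + aw[j] * bw[i] + carry
            t[i + j] = acc & MASK
            carry = acc >> W
        t[i + s] = carry
    # Step 2: reduction, adding m*n so the low word becomes zero.
    for i in range(s):
        m = (t[i] * n0_inv) & MASK
        carry = 0
        for j in range(s):
            acc = t[i + j] + m * nw[j] + carry
            t[i + j] = acc & MASK
            carry = acc >> W
        k = i + s
        while carry:                        # propagate the carry upward
            acc = t[k] + carry
            t[k] = acc & MASK
            carry = acc >> W
            k += 1
    # Step 3: drop the low s words (divide by R), conditionally subtract.
    u = from_words(t[s:])
    return u - n if u >= n else u
```

A call such as `mont_mul_sos(a, b, n, 5)` models the 160-bit case, and `s = 32` models the 1024-bit case.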
The fundamental contribution of this work is to
show that efficient Montgomery multipliers can be
designed without compromising scalability,
portability, time performance or area efficiency.
Our multiplier is comparable to known Montgomery
multipliers in terms of area-speed trade-off: in
Table 1 it performs a 1024-bit multiplication in
8.41 µs using 9 DSP slices and 558 slices.
REFERENCES
P., Montgomery, 1985. Modular multiplication without
trial division. Mathematics of Computation. vol. 44,
no. 170, pp.519–521.
Ç., K., Koc, T., Acar, and B., S., Kaliski, 1996. Analyzing
and comparing Montgomery multiplication
algorithms. IEEE Micro. vol. 16, no. 3, pp. 26–33.
C., D., Walter, October 1999. Montgomery exponentiation
needs no final subtraction. Electronics Letters. vol. 35,
no. 21, pp. 1831–1832.
C., D., Walter, 1999. Montgomery’s multiplication
technique: How to make it smaller and faster. In Ç.,
K., Koc and C., Paar, editors, Proceedings of the 1st
International Workshop on Cryptographic Hardware
and Embedded Systems (CHES), Lecture Notes in
Computer Science, Springer-Verlag. no. 1717, pp. 80–93.
Virtex-5 XtremeDSP Design Considerations User Guide,
April 14, 2006. V1.0, UG193, www.xilinx.com.
C., McIvor, M., McLoone, J., V., McCanny, A., Daly, and
W., Marnane, 2003. Fast Montgomery modular
multiplication and RSA cryptographic processor
architectures. In Proceedings of the 37th Annual
Asilomar Conference on Signals, Systems and
Computers. pp. 379–384.
N., Mentens, July 2007. Secure and Efficient
Coprocessor Design for Cryptographic Applications
on FPGAs. PhD thesis. ISBN 978-90-5682-843-1.
K., Kelley and D., Harris, 2005. Parallelized very high
radix scalable Montgomery multipliers. In Conference
Record of the Thirty-Ninth Asilomar Conference on
Signals, Systems and Computers. pp. 1196–1200.
C., McIvor, M., McLoone, and J., V., McCanny, 2004.
FPGA Montgomery multiplier architectures – a
comparison. In Proceedings of the 12th IEEE
Symposium on Field-Programmable Custom
Computing Machines (FCCM), IEEE Computer
Society. pp. 279–282.
K., Manochehri and S., Pourmozafari, 2004. Fast
Montgomery modular multiplication by pipelined CSA
architecture. In Proceedings of the International
Conference on Microelectronics (ICM). pp. 144–147.
D., N., Amanor, V., Bunimov, C., Paar, J., Pelzl, and M.,
Schimmler, 2005. Efficient hardware architectures for
modular multiplication on FPGAs. In Proceedings of
the 15th International Conference on Field
Programmable Logic and Applications (FPL), IEEE.
pp. 539–542.
V., Bunimov, M., Schimmler, and B., Tolg, 2002. A
complexity-effective version of Montgomery’s
algorithm. In Proceedings of the Workshop on
Complexity Effective Designs (WCED).
L., Batina, G., Bruin-Muurling, and S., B., Ors, 2004.
Flexible hardware design for RSA and elliptic curve
cryptosystems. In T., Okamoto, editor, Proceedings of
the RSA Conference – Topics in Cryptography
(CT-RSA), Lecture Notes in Computer Science,
Springer-Verlag. vol. 2964, pp. 250–263.
Xilinx Virtex-4 Handbook. August 2, 2004.
SECRYPT 2008 - International Conference on Security and Cryptography