part of the SOC design area, so the area of the
cryptographic module must be constrained to a certain size.
The smaller the cryptographic module is, the better.
If its resource consumption is too large, cost, which is an
important indicator in SOC design, rises too much,
especially in commercial SOC designs.
From this perspective, the resource consumption of a
cryptographic hardware design is a significant
indicator of implementation quality, and a good design
allows cryptographic algorithms to be used in many
more application areas.
In this paper we improve the architecture of
Huang by adjusting the internal architecture of the
processing elements so that Montgomery multiplication
consumes fewer resources. Moreover, our architecture
also performs the final subtraction of the Montgomery
algorithm without requiring additional clock cycles,
and its area is smaller than that of Huang's
architecture. Based on this novel Montgomery multiplier,
we implemented a new RSA coprocessor which, compared
with previous work (Shieh et al., 2008), saves nearly
50% of slices. To the best of our knowledge, it is the
smallest design reported in the literature at present.
The new module has been used in our other SOC designs,
where it works well and has proved to be a useful
cryptographic module in practice.
The rest of the paper is organized as follows.
Section 2 reviews the preliminaries of RSA and
Montgomery multiplication. Section 3 proposes an
improved architecture for the Montgomery algorithm.
Section 4 presents the architecture of our RSA
coprocessor. Section 5 presents an evaluation, analysis
and comparison of our work with related works in the
literature. The last section concludes the paper.
2 PRELIMINARIES
2.1 Preliminaries: RSA Algorithm
The RSA algorithm is a public-key algorithm used to
build a cryptosystem that offers both public-key
encryption and digital signatures (authentication)
(Kaya-Koc, 1995). It is named after Rivest, Shamir and
Adleman of MIT, who published it in 1978, and its
security rests on the difficulty of factoring large
integers. The basic operation in RSA is modular
exponentiation of large integers. The parameters are
$n$, $p$ and $q$, $e$, and $d$. The modulus $n$ is the
product of two distinct large random primes: $n = pq$.
The public exponent $e$ is a number in the range
$1 < e < \varphi(n)$ such that $\gcd(e, \varphi(n)) = 1$,
where $\varphi(n)$ is the Euler function of $n$, given by
$\varphi(n) = (p - 1)(q - 1)$. The private exponent $d$
is obtained by inverting $e$ modulo $\varphi(n)$, that is,
$d = e^{-1} \bmod \varphi(n)$, which can be computed with
the extended Euclidean algorithm. Encryption computes
$C = M^{e} \pmod{n}$, where $M$ is the plaintext with
$0 \le M < n$. The resulting number $C$ is the
ciphertext, from which the plaintext $M$ is recovered
by computing $M = C^{d} \pmod{n}$. The RSA algorithm is
used in many areas, such as sending encrypted messages
and producing digital signatures for electronic
messages.
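The parameter relations above can be checked on a toy example. The following is a minimal Python sketch, not the coprocessor described in this paper: the function names `toy_rsa_keypair`, `encrypt` and `decrypt` and the small primes are illustrative assumptions, and primes this small offer no security. It relies on Python 3.8+, where the built-in `pow` accepts an exponent of -1 to compute a modular inverse.

```python
from math import gcd


def toy_rsa_keypair(p, q, e):
    """Return (n, e, d) for primes p and q (hypothetical helper, no primality test)."""
    if p == q:
        raise ValueError("p and q must be distinct")
    n = p * q
    phi = (p - 1) * (q - 1)          # Euler function of n = pq
    if not (1 < e < phi) or gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)              # d = e^{-1} mod phi(n), Python 3.8+
    return n, e, d


def encrypt(m, e, n):
    """C = M^e mod n, for a plaintext 0 <= M < n."""
    if not 0 <= m < n:
        raise ValueError("plaintext must satisfy 0 <= M < n")
    return pow(m, e, n)


def decrypt(c, d, n):
    """M = C^d mod n."""
    return pow(c, d, n)


# Toy parameters chosen only for illustration.
n, e, d = toy_rsa_keypair(61, 53, 17)
c = encrypt(65, e, n)
m = decrypt(c, d, n)                 # recovers the plaintext 65
```

Both `encrypt` and `decrypt` reduce to a single modular exponentiation, which is why the rest of this section concentrates on that operation.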
Modular exponentiation is the most important
operation in the RSA algorithm. Kaya-Koc (1995)
reviews hardware implementations of modular
exponentiation and shows that there are two main
methods for computing it: the LR Binary Method and
the RL Binary Method.
Algorithm 1: LR Binary Method.
Input: M, e, n
Output: C := M^e mod n
begin
    if e_{h-1} = 1 then
        C := M;
    else
        C := 1;
    for i = h-2 downto 0 do
        C := C · C (mod n);
        if e_i = 1 then
            C := C · M (mod n);
    return C;
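Algorithm 1 can be sketched in software as follows. This is a minimal Python illustration under stated assumptions, not the hardware design of this paper: the function name `lr_binary_modexp` and the optional width parameter `h` are introduced here for illustration, with `h` defaulting to the bit length of the exponent.

```python
def lr_binary_modexp(m, e, n, h=None):
    """Left-to-right binary modular exponentiation: returns m^e mod n.

    h is the number of exponent bits to scan (hypothetical parameter);
    by default it is the bit length of e, so the top bit is 1.
    """
    if n <= 0 or e < 0:
        raise ValueError("n must be positive and e non-negative")
    if h is None:
        h = max(e.bit_length(), 1)
    if e >> h:
        raise ValueError("e does not fit in h bits")

    # Initialisation from the most significant bit e_{h-1}.
    c = m % n if (e >> (h - 1)) & 1 else 1 % n

    # Scan the remaining bits from e_{h-2} down to e_0.
    for i in range(h - 2, -1, -1):
        c = (c * c) % n              # squaring for every bit
        if (e >> i) & 1:
            c = (c * m) % n          # multiplication only when e_i = 1
    return c
```

For example, `lr_binary_modexp(65, 17, 3233)` agrees with Python's built-in `pow(65, 17, 3233)`. Each loop iteration issues its squaring and its optional multiplication one after the other, which mirrors the sequential use of a single multiplier.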
In Algorithm 1, the exponent e is scanned from
the most significant bit (MSB) to the least
significant bit (LSB). A modular squaring is performed
for every scanned bit, but a modular multiplication is
performed only when the bit is 1. With h denoting the
size of the exponent e in bits, the algorithm
therefore takes about 2h modular multiplications in
the worst case and about 1.5h on average, since the
multiplication step is skipped when $e_i = 0$. In the
LR Binary Method the squaring and multiplying
operations must be performed sequentially, which
implies that a single hardware multiplier suffices for
both. This matters because the cryptographic module
cannot be allowed to consume many hardware resources:
in some designs, such as an SOC, many modules share the
same chip and hardware resources are limited.
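The operation counts quoted above can be made slightly more precise. The following is a rough count under two assumptions that are not stated in the paper: the top bit $e_{h-1}$ is 1, and the remaining $h - 1$ bits are uniformly random. The symbols $w$ (number of 1 bits in $e$), $S$, $P$ and $T$ are introduced here for illustration only.

```latex
% Assumptions: e_{h-1} = 1; w = number of 1 bits in e;
% the h-1 lower bits are independent and uniformly random.
\begin{align*}
S &= h - 1 && \text{squarings, one per loop iteration} \\
P &= w - 1 && \text{multiplications, one per 1 bit below the top bit} \\
T &= S + P = h + w - 2 \\
T_{\max} &= 2(h - 1) && \text{when } w = h \\
\mathrm{E}[T] &= (h - 1) + \tfrac{h - 1}{2} = 1.5\,(h - 1) \\
h = 1024:\quad T_{\max} &= 2046, \qquad \mathrm{E}[T] = 1534.5
\end{align*}
```

For large h these values are close to the 2h and 1.5h figures given in the text.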
The RL Binary Method is another way to perform
modular exponentiation. Compared with the LR
Binary Method, it can speed up the exponentiation
operation; however, it needs another modu-
SECRYPT 2011 - International Conference on Security and Cryptography