SECURING OPENSSL AGAINST MICRO-ARCHITECTURAL ATTACKS

Marc Joye

Thomson R&D France, Technology Group, Corporate Research, Security Laboratory

1 avenue de Belle Fontaine, 35576 Cesson-Sévigné Cedex, France

Michael Tunstall

Department of Electrical & Electronic Engineering, University College Cork, Cork, Ireland

Keywords:

RSA, Modular Exponentiation, Micro-Architectural Attacks, Side-Channel Resistant Implementations.

Abstract:

This paper presents a version of the $2^k$-ary modular exponentiation algorithm that is secure against current

methods of side-channel analysis that can be applied to PCs (the so-called micro-architectural attacks). Some

optimisations to the basic algorithm are also proposed to improve the efﬁciency of an implementation. The

proposed algorithm is compared to the current implementation of OpenSSL, and it is shown that the proposed

algorithm is more robust than the current implementation.

1 INTRODUCTION

Exponentiation algorithms are important for many

public-key cryptographic algorithms, in particular for

computing the modular exponentiation necessary for

RSA (Rivest et al., 1978). It is therefore essential to

ensure that implementations of algorithms requiring

a modular exponentiation are not vulnerable to any

known attacks.

Side-channel attacks can be applied remotely to

a PC, by observing the time taken for a processor

to compute a given function. In addition, they may

observe some micro-architectural features, e.g. the

cache or branch predictor of a processor which is ex-

ecuting the function. This usually requires the exe-

cution of a spy process to observe and manipulate a

processor while it is running. A more detailed de-

scription of different types of side channels that can

be applied to PCs is given in Section 2.

This paper proposes a modified $2^k$-ary modular

exponentiation algorithm (the notation used in this pa-

per is taken from (Knuth, 2001)). The proposed algo-

rithm is resistant to all currently known side channels

available to an attacker targeting a PC implementa-

tion. The security of this algorithm is analysed in

terms of its side-channel resistance to the attack meth-

ods presented in Section 2, and some further optimi-

sations to the basic algorithm are also presented.

The proposed algorithm is compared to the cur-

rent secure implementation of modular exponentia-

tion used in OpenSSL (OpenSSL, 2007). It is demon-

strated that some bits of the private exponent risk be-

ing revealed if an attacker is able to modify the cache

and observe the effect on the output. The proposed

algorithm is shown to be more robust than the current

OpenSSL implementation.

The rest of this paper is organised as follows.

The different side channels that can potentially be ex-

ploited to reveal secret information are described in

Section 2. The proposed exponentiation algorithm is

described in Section 3, and some further optimisa-

tions are presented in Section 4. A comparison of the

proposed algorithm with the implementation used in

the current version of OpenSSL is presented in Sec-

tion 5. This is followed by our conclusions in Sec-

tion 6.

Notation: The base of a value is determined by a trailing subscript, which is applied to the whole word preceding the subscript. For example, $\mathrm{FE}_{16}$ is 254 expressed in base 16, $d = (d_{\ell-1}, d_{\ell-2}, \ldots, d_0)_2$ gives a binary expression for $d$, and $d = (d_{\ell-1}, d_{\ell-2}, \ldots, d_0)_4$ gives an expression where each $d_i$, for $0 \le i < \ell$, represents two bits of $d$.

Joye M. and Tunstall M. (2007).
SECURING OPENSSL AGAINST MICRO-ARCHITECTURAL ATTACKS.
In Proceedings of the Second International Conference on Security and Cryptography, pages 189-196
DOI: 10.5220/0002118801890196
SciTePress

In all the algorithms described in this paper λ represents the Carmichael function, where λ(N) is defined for N as the smallest positive integer m such that $a^m \equiv 1 \pmod{N}$ for every integer a that is coprime to N. In particular, if N = pq is an RSA modulus then λ(N) = lcm(p − 1, q − 1). The notation φ represents Euler's totient function, where φ(N) equals the number of positive integers less than N which are coprime to N. If N = pq is an RSA modulus then φ(N) = (p − 1)(q − 1).
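The two definitions above can be checked numerically on a toy modulus. The following is a minimal sketch; the primes 61 and 53 and the helper names `carmichael_rsa` and `totient_rsa` are illustrative assumptions, not values or functions taken from the paper.

```python
from math import gcd


def carmichael_rsa(p: int, q: int) -> int:
    """lambda(N) = lcm(p - 1, q - 1) for an RSA modulus N = p * q."""
    return (p - 1) * (q - 1) // gcd(p - 1, q - 1)


def totient_rsa(p: int, q: int) -> int:
    """phi(N) = (p - 1)(q - 1) for an RSA modulus N = p * q."""
    return (p - 1) * (q - 1)


# Toy primes chosen for illustration only (far too small for real use).
p, q = 61, 53
N = p * q
lam = carmichael_rsa(p, q)
phi = totient_rsa(p, q)

# phi(N) is a multiple of lambda(N).
assert phi % lam == 0

# a^lambda(N) = 1 (mod N) for every a coprime to N.
assert all(pow(a, lam, N) == 1 for a in range(1, N) if gcd(a, N) == 1)
```

Because φ(N) is a multiple of λ(N), either value can serve as an exponent that maps every unit modulo N to 1.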

2 SIDE-CHANNEL ANALYSIS

There are several different methods of side-channel

analysis that can potentially be applied to an imple-

mentation of the RSA signature scheme. These meth-

ods are summarised below.

2.1 Timing Analysis

The ﬁrst academic publication of side-channel anal-

ysis was an attack that observed the correlation be-

tween guessed bits of a secret and the time required to

compute an algorithm (Kocher, 1996). The principal

target of the timing analysis described was the RSA

signature scheme. This was extended in (Schindler,

2000) to include the RSA signature scheme when

it is calculated using the Chinese Remainder The-

orem (Knuth, 2001) and Montgomery multiplica-

tion (Montgomery, 1985). These attacks were typi-

cally thought of in terms of smart cards, where it is

trivial to observe the execution time of a naïvely implemented process.

It was demonstrated in (Brumley and Boneh,

2003) that timing analysis of the computation of

RSA signatures could be conducted across a network

against complex implementations, such as OpenSSL.

This demonstrated the need to consider the possible

side channels that could be exploited in implementa-

tions of cryptographic algorithms on all platforms.

In this paper it is assumed that the underlying mul-

tiplication algorithm used in the exponentiation algo-

rithm is resistant to timing analysis. For example, if

we consider Montgomery multiplication, which con-

tains a conditional modular subtraction, it is pointed

out in (Hachez and Quisquater, 2000; Walter, 1999a;

Walter, 1999b) that this ﬁnal operation can be omit-

ted.

2.2 Cache-Based Side-Channel Analysis

A cache is a small, fast RAM memory whose role is

to buffer the lines of Non-Volatile Memory (NVM)

or external RAM being fetched. When a data or in-

struction word is to be fetched from the NVM or ex-

ternal RAM, the CPU will ﬁrst check whether this

particular word is already in the cache: if yes (this

is a cache hit), the word is fetched directly from the

cache. If, on the contrary, this particular word is not

cached this is a cache miss. The CPU will then fetch

a whole line (e.g. 32 bytes) within which the targeted

word is found. The data in this cache line can then be

accessed rapidly by the CPU, whereas accessing ex-

ternal resources to fetch data takes signiﬁcantly more

time.

Using the cache as a side channel to attack an im-

plementation of a cryptographic algorithm was ﬁrst

proposed in (Tsunoo et al., 2003). Several attacks

have since been published using cache access events

as a side channel (Bernstein, 2005; Bertoni et al.,

2005; Osvik et al., 2006) to derive a secret key used

in implementations of block ciphers, such as DES and

AES. These examples are predominately a speciﬁc

case of timing analysis, where the total number of

cache misses in an algorithm is used to determine in-

formation on the secret key being used.

Another example of using the cache as a side

channel has been termed trace-driven cache analysis,

and was ﬁrst described in (Page, 2002). This attack

functions by observing what cache lines are used by a

process computing a cryptographic algorithm. This is

possible as the cache is open to inspection and mod-

iﬁcation by all processes being run on a PC. Im-

plementations of attacks that exploit this method of

side-channel analysis against PC implementations of

AES are described in (Acıiçmez and Koç, 2006; Osvik et al., 2006).

2.3 Branch Prediction Analysis

Modern chips for PCs include branch prediction to

improve overall performance. This involves the inclu-

sion of a Branch Target Buffer (BTB) and a Branch

Predictor (BP). The BTB is a buffer of limited size

that acts as a cache for storing the addresses of previ-

ously executed branches. The BP is an algorithm that

attempts to predict what branches will be taken, based

on previous observations. If a conditional branch is

present in an algorithm (e.g. an if command) the BP

will attempt to predict the outcome of this branch and

load the relevant instructions into the CPU. If the pre-

diction is correct this increases performance, since the

relevant instructions are available. However, if the

prediction is incorrect the CPU is obliged to fetch the

instructions for the other branch. In (Acıiçmez et al.,

2007c) it is pointed out that this will lead to a differ-

ence in execution time and can therefore be used to

conduct a timing analysis.

More sophisticated attacks are presented

in (Acıiçmez et al., 2007b; Acıiçmez et al., 2007c)

that modify the BTB to produce effects that can leak

SECRYPT 2007 - International Conference on Security and Cryptography


information more efﬁciently than observing the time

taken to compute an algorithm. Indeed, the most

efﬁcient attack described involves closely observing

the BP during the computation of an RSA signature

by using a spy process that modiﬁes the BTB and

observes the subsequent behaviour. This could

allow an attacker to derive the private key from one

signature generation. An implementation of this type

of attack on a modiﬁed version of the function used

in OpenSSL to generate RSA signatures is described

in (Acıiçmez et al., 2007b).

Again, it is assumed that the underlying multi-

plication algorithm is not vulnerable to this type of

side-channel analysis, i.e. there are no conditional

branches in the multiplication algorithm and each

multiplication involves exactly the same number of

operations for inputs of a given bit length.

3 SIDE-CHANNEL RESISTANT $2^k$-ARY EXPONENTIATION

The algorithm proposed in this paper is a modified
$2^k$-ary exponentiation, as defined in (Knuth, 2001).

This is combined with the techniques used to protect

embedded implementations from Differential Power

Analysis (Kocher et al., 1999), where the input val-

ues are multiplied by small random values to mask

the behaviour of the algorithm during execution. This

algorithm is described in Algorithm 1, where ρ is a

small integer that is used to increase the bit length of

N so that it is the same as that of $M^*$.

The input Λ is either λ(N) or some multiple

thereof. In the case of RSA we can use φ(N) =

(p−1)(q− 1), or even (e· d − 1) (where e is the pub-

lic exponent), which is a multiple of λ(N). Note that

working with (e·d−1) instead of λ(N) does not have

a large impact on the performance of the algorithm,

since e is usually small (typically e will be equal to 3 or $2^{16} + 1$).

The variable R[1] is set to a value equivalent to 1 mod N and will therefore have no effect on the result but will involve a multiplication with an integer modulo $N' = \rho \cdot N$. This means that there is no conditional branching within the exponentiation loop. The multiplication with a given R[i] can be determined by calculating a pointer to the relevant variable, assuming that the variables R[i], for 1 ≤ i ≤ b, are contiguous in memory.

Algorithm 1 also slightly differs from the classical $2^k$-ary exponentiation algorithm as the first operation of the while loop is

$$A \leftarrow A^{2^{k-1}} \bmod R[0],$$

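Algorithm 1: Secure $2^k$-ary exponentiation algorithm.

Input: $M$, $d = (d_{\ell-1}, d_{\ell-2}, \ldots, d_0)_b$ where $b = 2^k$ for some $k \ge 1$, $N$, $\rho$, $\Lambda$, and two random values $r_1$ and $r_2$ (of bit length $|\rho|_2$).
Output: $S = M^d \bmod N$.

$M^* = M + r_1 \cdot N$
$d^* = (d - 1 + r_2 \cdot \Lambda)/2$
$U^* = 1 + r \cdot N$
$N' = \rho \cdot N$
$R[0] \leftarrow N'$
$R[1] \leftarrow U^* \bmod R[0]$
$R[2] \leftarrow M^* \bmod R[0]$
for $j = 3$ to $b$ do
    $R[j] \leftarrow R[j-1] \cdot R[2] \bmod R[0]$
end
$i \leftarrow \lfloor \log_b d^* \rfloor$
$A \leftarrow R[d^*_i]^2 \bmod R[0]$
$i \leftarrow i - 1$
while ($i \ge 0$) do
    $A \leftarrow A^{2^{k-1}} \bmod R[0]$
    $A \leftarrow A \cdot R[d^*_i + 1] \bmod R[0]$
    $A \leftarrow A^2 \bmod R[0]$
    $i \leftarrow i - 1$
end
$A \leftarrow r \cdot A \cdot R[1] \bmod R[0]$
$A \leftarrow A/r$
return $A$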

rather than

$$A \leftarrow A^{2^{k}} \bmod R[0].$$

This can be explained by observing that $S = M^d \bmod N$ can be rewritten as

$$S = (M^2)^{(d-1)/2} \cdot M \bmod N.$$

This allows d to be replaced with (a randomised representation of) (d − 1)/2, when it is multiplied by a small random value at the beginning of the exponentiation. The last modular multiplication can be moved outside the while loop, reducing the amount of computation required within the loop. This assumes that d is always odd, Λ is always even (as is the case for RSA), and the computation of $d^*$ is always possible.
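The structure described above can be sketched in Python. This is an illustrative sketch only, not the authors' implementation: the function name `masked_pow`, the default window size, and the way the random values are drawn are assumptions. The table is indexed so that R[j + 1] holds a value equivalent to $M^j$ modulo N, and the single multiplication by the masked message after the loop is this sketch's reading of the rewriting $S = (M^2)^{(d-1)/2} \cdot M$. Python integers are not constant time, so the sketch shows the arithmetic and not the side-channel behaviour.

```python
import secrets


def masked_pow(M: int, d: int, N: int, Lambda: int, k: int = 2, rho_bits: int = 32) -> int:
    """Sketch of a masked 2^k-ary exponentiation returning M^d mod N."""
    assert d % 2 == 1 and Lambda % 2 == 0
    b = 1 << k
    # Hypothetical choice of random values, all of rho_bits bits.
    rho = secrets.randbits(rho_bits) | (1 << (rho_bits - 1)) | 1
    r1 = secrets.randbits(rho_bits)
    r2 = secrets.randbits(rho_bits) | (1 << (rho_bits - 1))

    N1 = rho * N                           # N' = rho * N
    M_star = M + r1 * N                    # equivalent to M modulo N
    d_star = (d - 1 + r2 * Lambda) // 2    # randomised (d - 1)/2
    U_star = 1 + r1 * N                    # equivalent to 1 modulo N

    # R[0] = N', R[1] = 1, R[2] = M, ..., R[b] = M^(b-1), all modulo N.
    R = [N1, U_star % N1, M_star % N1]
    for j in range(3, b + 1):
        R.append(R[j - 1] * R[2] % N1)

    # Base-b digits of d*, most significant first.
    digits = []
    x = d_star
    while x:
        digits.append(x % b)
        x //= b
    digits.reverse()

    A = pow(R[digits[0] + 1], 2, N1)
    for digit in digits[1:]:
        A = pow(A, 1 << (k - 1), N1)       # k - 1 squarings
        A = A * R[digit + 1] % N1          # lookup by index, no branch on the digit
        A = A * A % N1                     # k-th squaring
    A = A * R[2] % N1                      # one multiplication by the masked M
    return A % N
```

With toy values N = 61 · 53, Λ = φ(N) = 3120 and d = 2753, `masked_pow(65, 2753, 3233, 3120)` agrees with `pow(65, 2753, 3233)` on every run, even though $d^*$ differs each time.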

Each random value used has the effect that each

multiplication is randomised by a value whose effect

is equivalent to a multiplication by 1 mod N and is

therefore easily removed at the end. The bit length of


the random values used is often determined by the

algorithm and/or the architecture used. For example,

in software implementations the natural choice would

be to use random values with the same bit length as

the words manipulated by the processor (or a multiple

thereof).

The initialisation of R[1] and R[2] ensures that

these variables always contain a value whose bit

length is similar to the bit length of $N'$. A value with

a constant bit length will, therefore, always be given

to the underlying multiplication algorithm. This re-

moves the possibility of an attacker provoking a situ-

ation that could allow timing analysis by choosing M

as a small integer (chosen-message attack).

The change in d means that, for a fixed value of d, each execution of the algorithm will behave differently. It is therefore not possible to derive information by observing multiple executions; an attacker is obliged to attempt to derive d from a single execution.

Also note that the last two instructions, $A \leftarrow (r \cdot A \cdot R[1] \bmod R[0])/r$, can also be implemented as $A \leftarrow A \cdot R[1] \bmod N$. This choice of instruction will depend on which instruction is most suitable for a given implementation.

The security of this algorithm against the side-

channel analysis methods described in Section 2 is as

follows.

Timing Analysis: The algorithm will take a constant number of operations to execute, i.e. $\lceil (\log_2 d^*)/k \rceil$ sets of k squaring operations and one multiplication. The only differences in computation time will be caused by the variable bit length of $r_2$. However, there are no data dependent differences in execution time to allow a timing analysis to take place. As described in Section 2, it is assumed that the underlying squaring operation (respectively the multiplication) will always take the same amount of time for inputs of a given bit length. The bit length of the inputs to all the multiplications is identical for all $d^*$ because the initialisation steps mean that each R[i], for 1 ≤ i ≤ b, contains a variable with a bit length similar to $N'$.

Cache-Based Side-Channel Analysis: The result of the calculation of the powers of $M^*$ will be stored in the cache. In a multi-threaded system it would potentially be possible to exploit this, by determining how an implementation behaves with different values of d. This possibility is removed by the masking of the input variables with small random values. In particular, the modification to d means that the cache lines accessed for a given value of M will vary unpredictably from one execution to another.

If an attacker is able to produce a trace of the cache accesses it is potentially possible to determine some information on $d^*$, as each value of $d^*$ will cause the algorithm to access different cache lines. An attacker may therefore be able to determine $d^*$, which will give a value that is equivalent to d when used as an exponent modulo N. A trick that can remove this side channel is used in the current implementation of OpenSSL and is described in Section 5.

Branch Prediction Analysis: As mentioned previously, there is no conditional branching within the main loop of the algorithm, and it will, therefore, not be possible to determine any bits of d, or $d^*$, by observing the behaviour of the branch predictor. The required variable can be accessed by calculating an offset from the beginning of R[0], if R[0] to R[b] are stored in contiguous memory.

It would be reasonable to assume that an attacker can determine at what point the conditional jumps used in the for and while loops occur (Acıiçmez, 2007). As stated above, it is assumed that each squaring operation (respectively the multiplication) will always take the same amount of time to calculate for inputs of a given bit length. The bit length of each R[i], for 1 ≤ i ≤ b, is identical, and an attacker will, therefore, not be able to derive any information by choosing M as a small integer.

This side channel can also be removed by unrolling the loops, either in the source code or by using the compiler. However, this would require the implementation of a different function for each bit length of interest, and that the most significant bit of $r_2$ is set to one so that the bit length of $d^*$ is constant.

4 FURTHER OPTIMISATIONS

Another version of Algorithm 1 is presented in Algorithm 2, and contains some further optimisations that can make an implementation more efficient in terms of speed and memory required. It is possible to combine $N'$ and R[0] (as used in Algorithm 1) in memory to reduce the memory that is required to implement the proposed algorithm. This can be achieved by observing that

$$R[1] \leftarrow 1 + r \cdot N \bmod N'$$

whose purpose is to allow a multiplication by 1 mod N to take place, can also be written as

$$R[1] \leftarrow r \cdot N - 1 \bmod N' \quad (\equiv -1 \pmod{N}).$$

This is because it is always followed by a squaring (namely, $A \leftarrow A^2 \bmod R[0]$).


However, letting $N' = \rho \cdot N$, this requires that the while loop is modified to take into account this change in Algorithm 2. Each R[i], for 1 ≤ i < b, therefore contains $(M^*)^i/2 \bmod N'$ (and R[0] contains a value that is equivalent to $-1/2 \pmod{N}$), and after the multiplication operation the result is corrected by doubling A. Provided that N is odd (which is always the case for RSA moduli), this can be implemented on a processor that manipulates words of ω bits by calculating

$$N'' = 2^{\omega-1} \cdot \rho \cdot N$$

where ρ is a small odd random integer that is used to increase the bit length of $N \cdot 2^{\omega-1}$ and to randomise the value of $-1/2 \pmod{N}$. Indeed, since N and ρ are assumed to be odd, it follows that

$$\left\lfloor \frac{N''}{2^{\omega}} \right\rfloor = \left\lfloor \frac{\rho \cdot N}{2} \right\rfloor = \frac{\rho \cdot N - 1}{2} \equiv -1/2 \pmod{N}$$

and $N'' \bmod 2^{\omega} = 2^{\omega-1}$. In other words, this will create a value for $N''$ where the least significant word is $2^{\omega-1}$ and the remaining upper words represent a randomised value for $-1/2 \pmod{N}$. In order to be resistant to side-channel analysis, the precomputed values of $(M^*)^i/2$, for 1 ≤ i < b, are computed modulo $N' = \rho \cdot N$ and so are represented with the same number of words as $(\rho \cdot N - 1)/2$, which is written in R[0].
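The word-level structure of $N''$ can be checked numerically. The sketch below is illustrative only; the helper names `build_extended_modulus` and `split_low_word`, the toy modulus and the 32-bit word size are assumptions made for the example.

```python
import secrets


def build_extended_modulus(N: int, omega: int = 32, rho_bits: int = 32):
    """Return (rho, N'') with N'' = 2^(omega-1) * rho * N and rho odd."""
    assert N % 2 == 1, "N must be odd"
    rho = secrets.randbits(rho_bits) | 1       # force rho to be odd
    return rho, (1 << (omega - 1)) * rho * N


def split_low_word(value: int, omega: int = 32):
    """Split value into (upper words, least significant omega-bit word)."""
    return value >> omega, value & ((1 << omega) - 1)


N = 61 * 53                                    # toy odd modulus
rho, N2 = build_extended_modulus(N)
upper, low = split_low_word(N2)

assert low == 1 << 31                          # N'' mod 2^omega = 2^(omega-1)
assert upper == (rho * N - 1) // 2             # floor(N'' / 2^omega)
assert (2 * upper + 1) % N == 0                # upper words are -1/2 modulo N
```

The last assertion restates the congruence: doubling the upper words and adding one gives ρ · N, which is zero modulo N.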

As presented, Algorithm 2 assumes a little-endian representation; if all R[i] are stored in contiguous memory, $R[0]^{-}$ denotes the memory location starting one word before R[0]. Nevertheless, it can easily be adapted to accommodate a big-endian representation.

In Algorithm 2 the modulus $N''$ is always an even

number. This excludes the use of Montgomery mul-

tiplication, and will require the use of an alternative,

such as Barrett or Quisquater multiplication (Barrett,

1987; Quisquater, 1992).

5 COMPARISON WITH OPENSSL

The algorithm used in OpenSSL (at the time of writing the most recent release of OpenSSL was version 0.9.8e) for the constant time implementation of a modular exponentiation is the classical $2^k$-ary exponentiation algorithm, and uses Montgomery multiplication. Each $M^i \bmod N$, for $0 \le i < 2^k$, is computed and stored in its Montgomery representation. This uses more memory than the proposed algorithm as the modulus cannot be stored in the same memory.

To make the cache accesses behave in a deterministic manner for all possible values of d, the values of $M^i \bmod N$, for $0 \le i < 2^k$, are mapped so that

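Algorithm 2: Secure $2^k$-ary exponentiation algorithm. (II)

Input: $M$, $d = (d_{\ell-1}, d_{\ell-2}, \ldots, d_0)_b$ where $b = 2^k$ for some $k \ge 1$, $N$ odd, random odd value $\rho$, $\Lambda$, processor word-size in bits $\omega$, 2 random values $r_1$ and $r_2$ (of bit length $|\rho|_2$), and a random value $r_3$ (of bit length $\omega$).
Output: $S = M^d \bmod N$.

$M^* = (M/2 \bmod N) + r_1 \cdot N$
$d^* = (d - 1 + r_2 \cdot \Lambda)/2$
$N' = \rho \cdot N$
$R[0] \leftarrow N'$
$R[1] \leftarrow M^*$
$A \leftarrow R[1] + R[1] \bmod R[0]$
for $j = 3$ to $b$ do
    $R[j] \leftarrow R[j-1] \cdot A \bmod R[0]$
end
$i \leftarrow \lfloor \log_b d^* \rfloor$
$A \leftarrow 2R[d^*_i] + r \cdot R[0]$
$R[0]^{-} \leftarrow 2^{\omega-1} \cdot R[0]$
$A \leftarrow A \bmod R[0]^{-}$
$A \leftarrow A^2 \bmod R[0]^{-}$
$i \leftarrow i - 1$
while ($i \ge 0$) do
    $A \leftarrow A^{2^{k-1}} \bmod R[0]^{-}$
    $A \leftarrow A \cdot R[d^*_i] \bmod R[0]^{-}$
    $A \leftarrow A + A \bmod R[0]^{-}$
    $A \leftarrow A^2 \bmod R[0]^{-}$
    $i \leftarrow i - 1$
end
$A \leftarrow 2r \cdot A \cdot R[1] \bmod R[0]^{-}$
$A \leftarrow A/r$
return $A$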

the choice of any arbitrary $M^i \bmod N$ will access the same cache lines. This is achieved by selecting $2^k$ to be the same as the number of bytes available in each cache line. One cache line can then be used to store one byte of each $M^i \bmod N$, for $0 \le i < 2^k$, i.e. if we consider a cache to be a matrix of bytes, where the number of columns is the cache line size, each $M^i \bmod N$ is stored in column $i + 1$.

No timing analysis can be conducted based on the use of the cache as the same number of cache lines will be accessed for each loop of the algorithm. It also prevents trace-based cache analysis as the same cache lines will be accessed for all possible values of the private exponent. This requires careful implementation, as it is important that the same byte from each $M^i \bmod N$, for $0 \le i < 2^k$, is stored on the same cache line.

If someone were to take the OpenSSL source and compile it on a platform with a non-standard cache line size (the default in OpenSSL is 32 bytes, and the classical $2^5$-ary exponentiation algorithm), without modifying the source, there could be some potential security problems. If, for example, this was implemented on a platform with a cache line size of 16 bytes, then the first cache line would contain the first byte of each $M^i \bmod N$, for $0 \le i < 2^4$, and the second cache line would contain the first byte of $M^i \bmod N$, for $2^4 \le i < 2^5$. This pattern continues for the bytes stored in the following cache lines. If an attacker is able to determine which set of cache lines is used for each multiplication (i.e. odd or even numbered cache lines) some bits of d can be determined. More precisely, an attacker would be able to determine the most significant bit of each window of k bits.
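The effect of a mismatched cache line size can be illustrated with a simplified model of the interleaved layout. This is a sketch under stated assumptions, not OpenSSL's code: the function `lines_touched` and the address formula (byte t of entry i at offset t · 32 + i, with the table aligned to a cache line) are hypothetical simplifications.

```python
def lines_touched(i: int, n_bytes: int, table_size: int, line_size: int) -> set:
    """Cache-line indices read when fetching table entry i.

    Byte t of entry i is assumed to live at offset t * table_size + i,
    i.e. entry i occupies one column of a byte matrix with table_size columns.
    """
    return {(t * table_size + i) // line_size for t in range(n_bytes)}


TABLE = 32      # 2^5 precomputed values
NBYTES = 128    # each value is 1024 bits long

# 32-byte lines: every entry touches exactly the same set of cache lines.
patterns = {frozenset(lines_touched(i, NBYTES, TABLE, 32)) for i in range(TABLE)}
assert len(patterns) == 1

# 16-byte lines: the parity of the lines touched equals the top bit of i.
for i in range(TABLE):
    parities = {line % 2 for line in lines_touched(i, NBYTES, TABLE, 16)}
    assert parities == {i >> 4}
```

In this model an observer who can tell odd from even cache lines learns `i >> 4`, the most significant bit of each 5-bit window, which matches the leak described above.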

This problem can be avoided by using the algorithm proposed in this paper, as an attacker may be able to determine some bits of $d^*$ but this will not provide any information on d. However, in an implementation of the proposed algorithm it is still necessary to use the memory mapping described above, so that the same cache lines are accessed for each $M^i \bmod N$, for $0 \le i < 2^k$. Otherwise a trace-based cache analysis can potentially reveal $d^*$, which is equivalent to d when used as an exponent modulo N.

This paper does not claim that this represents

a security ﬂaw in the current implementation of

OpenSSL. Indeed, the use of 16-byte cache lines

is considered in the source, but requires the cache

line size to be declared. Not all programmers would

be aware of the security issues surrounding micro-

architectural attacks.

The default implementation of RSA in OpenSSL

uses the blinding scheme given in (Chaum, 1985), and

described in Algorithm 3. The proposed algorithm

will provide a more efficient implementation, as Algorithm 3 requires that $t^e \bmod N$ and $t^{-1} \bmod N$ are stored in memory and periodically updated. Moreover, each time a new t is required a modular inverse needs to be calculated, which will increase the time required to compute Algorithm 3.

The proposed algorithm will also provide a more secure implementation, since the exponent is randomised. The appendix describes a theoretical attack that could break the current implementation of OpenSSL, where Algorithm 3 is used, but would not be able to break the proposed algorithm. This is because an attacker is required to derive the entire value of $d^*$ in one attack to break the proposed algorithm. In the current implementation of OpenSSL,

Algorithm 3: Chaum's blinding scheme.

Input: $M$, $d$, $e$ where $e \cdot d \equiv 1 \pmod{\varphi(N)}$, $N$, a random value $t$ where $0 \le t \le N - 1$ and is coprime to $N$.
Output: $S = M^d \bmod N$.

$A \leftarrow M \cdot t^e \bmod N$
$A \leftarrow A^d \bmod N$
$A \leftarrow t^{-1} \cdot A \bmod N$
return $A$

the repeated use of the same value of d could allow

information on different bits of d to be derived from

separate attacks.
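The blinding steps of Algorithm 3 can be sketched as follows. This is a minimal illustration with a hypothetical function name `blinded_sign` and toy parameters; unlike the stored and periodically updated pair described above, it draws a fresh t on every call, and it makes no attempt at constant-time behaviour.

```python
import secrets
from math import gcd


def blinded_sign(M: int, d: int, e: int, N: int) -> int:
    """Compute M^d mod N with the message blinded by a random t."""
    while True:
        t = secrets.randbelow(N - 1) + 1       # t in 1 .. N - 1
        if gcd(t, N) == 1:                     # t must be invertible modulo N
            break
    A = M * pow(t, e, N) % N                   # A <- M * t^e mod N
    A = pow(A, d, N)                           # A <- A^d mod N, equal to M^d * t
    return pow(t, -1, N) * A % N               # A <- t^-1 * A mod N (Python 3.8+)


# Toy RSA key for illustration only: N = 61 * 53, e * d = 1 (mod phi(N)).
N, e, d = 3233, 17, 2753
assert blinded_sign(65, d, e, N) == pow(65, d, N)
```

Note that the exponent d used inside `pow(A, d, N)` is identical on every call, which is the property the surrounding text contrasts with the randomised $d^*$.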

6 CONCLUSION

This paper presents a side-channel resistant version of

the $2^k$-ary exponentiation algorithm for calculating a

modular exponentiation. This algorithm is presented

in Algorithm 1, and an optimised version is presented

in Algorithm 2.

In summary the advantages of the proposed algo-

rithm over the default settings of the implementation

used in OpenSSL are:

1. The proposed algorithm requires less memory than the current implementation of OpenSSL as the modulus N can be stored in the same memory as $M^i \bmod N$. It is also not necessary to store a pair $t^e \bmod N$ and $t^{-1} \bmod N$ in memory, as smaller random values can be used that only have mild constraints. Moreover, it is shown in (Acıiçmez et al., 2007a) that the calculation of the modular inverse necessary for this blinding method could be vulnerable to side-channel analysis.

2. If the source is compiled by a naïve programmer

there is less chance of a bug compromising the

security of the exponentiation algorithm. An ex-

ample of this is given in Section 5.

3. The proposed algorithm is more secure against

other attacks than the current implementation of

OpenSSL. A theoretical attack is described in

the appendix that could compromise the security

of the current implementation of OpenSSL, even

when the current blinding scheme is considered.

The proposed algorithm cannot be attacked in this

manner.


REFERENCES

Acıiçmez, O. (2007). Private communication.

Acıiçmez, O. and Koç, C. K. (2006). Trace-driven cache attacks on AES. Cryptology ePrint Archive, Report 2006/138. http://eprint.iacr.org/2006/138/

Acıiçmez, O., Gueron, S., and Seifert, J.-P. (2007a). New branch prediction vulnerabilities in OpenSSL and necessary software countermeasures. Cryptology ePrint Archive, Report 2007/039. http://eprint.iacr.org/

Acıiçmez, O., Koç, C. K., and Seifert, J.-P. (2007b). On the power of simple branch prediction analysis. Cryptology ePrint Archive, Report 2006/351. http://eprint.iacr.org/

Acıiçmez, O., Koç, C. K., and Seifert, J.-P. (2007c). Predicting secret keys via branch prediction. In Topics in Cryptology – CT-RSA 2007, volume 4377 of Lecture Notes in Computer Science, pages 225–242. Springer-Verlag.

Bao, F., Deng, R. H., Han, Y., Jeng, A., Narasimhalu, A. D., and Ngair, T. (1997). Breaking public key cryptosystems on tamper resistant devices in the presence of transient faults. In Security Protocols, volume 1361 of Lecture Notes in Computer Science, pages 115–124. Springer-Verlag.

Barrett, P. (1987). Implementing the Rivest-Shamir-Adleman public-key encryption algorithm on a standard digital processor. In Advances in Cryptology – CRYPTO '87, volume 267 of Lecture Notes in Computer Science, pages 311–323. Springer-Verlag.

Bernstein, D. J. (2005). Cache timing attacks on AES. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf

Bertoni, G., Zaccaria, V., Breveglieri, L., Monchiero, M., and Palermo, G. (2005). AES power attack based on induced cache miss and countermeasures. In International Symposium on Information Technology: Coding and Computing – ITCC 2005, pages 586–591. IEEE Computer Society.

Brumley, D. and Boneh, D. (2003). Remote timing attacks are practical. In 12th USENIX Security Symposium, pages 1–14.

Chaum, D. (1985). Security without identification: transaction systems to make big brother obsolete. Communications of the ACM, 28(10):1030–1044.

Hachez, G. and Quisquater, J.-J. (2000). Montgomery exponentiation with no final subtractions: Improved results. In Cryptographic Hardware and Embedded Systems – CHES 2000, volume 1965 of Lecture Notes in Computer Science, pages 293–301. Springer-Verlag.

Joye, M., Quisquater, J.-J., Bao, F., and Deng, R. H. (1997). RSA-type signatures in the presence of transient faults. In Cryptography and Coding, volume 1355 of Lecture Notes in Computer Science, pages 155–160. Springer-Verlag.

Knuth, D. (2001). The Art of Computer Programming, volume 2, Seminumerical Algorithms. Addison-Wesley, third edition.

Kocher, P. (1996). Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Advances in Cryptology – CRYPTO '96, volume 1109 of Lecture Notes in Computer Science, pages 104–113. Springer-Verlag.

Kocher, P., Jaffe, J., and Jun, B. (1999). Differential power analysis. In Advances in Cryptology – CRYPTO '99, volume 1666 of Lecture Notes in Computer Science, pages 388–397. Springer-Verlag.

Montgomery, P. (1985). Modular multiplication without trial division. Mathematics of Computation, 44:519–521.

OpenSSL (2007). Open source toolkit for SSL/TLS. http://www.openssl.org

Osvik, D. A., Shamir, A., and Tromer, E. (2006). Cache attacks and countermeasures: the case of AES. In Topics in Cryptology – CT-RSA 2006, volume 3860 of Lecture Notes in Computer Science, pages 1–20. Springer-Verlag.

Page, D. (2002). Theoretical use of cache memory as a cryptanalytic side-channel. Cryptology ePrint Archive, Report 2002/169. http://eprint.iacr.org/2002/169/

Quisquater, J.-J. (1992). Encoding system according to the so-called RSA method, by means of a microcontroller and arrangement implementing this system. U.S. Patent Number 5,166,978. Also presented at the rump session of EUROCRYPT '90.

Rivest, R., Shamir, A., and Adleman, L. M. (1978). Method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126.

Schindler, W. (2000). A timing attack against RSA with the Chinese remainder theorem. In Cryptographic Hardware and Embedded Systems – CHES 2000, volume 1965 of Lecture Notes in Computer Science, pages 109–124. Springer-Verlag.

Tsunoo, Y., Saito, T., Suzaki, T., Shigeri, M., and Miyauchi, H. (2003). Cryptanalysis of DES implemented on computers with cache. In Cryptographic Hardware and Embedded Systems – CHES 2003, volume 2779 of Lecture Notes in Computer Science, pages 62–76. Springer-Verlag.

Walter, C. D. (1999a). Montgomery exponentiation needs no final subtractions. Electronics Letters, 35(21):1831–1832.

Walter, C. D. (1999b). Montgomery's multiplication technique: How to make it smaller and faster. In Cryptographic Hardware and Embedded Systems – CHES '99, volume 1717 of Lecture Notes in Computer Science, pages 80–93. Springer-Verlag.

APPENDIX

In this appendix a theoretical attack on the current version of OpenSSL is described. The attack assumes that an attacking process, which can read and modify arbitrary addresses in RAM, is running concurrently with the exponentiation algorithm. If this process is able to modify the values of $M^i \bmod N$, for $0 \le i < 2^k$, before they are used to calculate a modular exponentiation an attack can be envisaged based on (Bao et al., 1997; Joye et al., 1997).

An attacker can, arbitrarily, choose some $M^i \bmod N$, for $1 \le i < 2^k$, and overwrite this value in memory with $M^0 \bmod N$. This has the effect of replacing all b-digits whose value is i with zero (note that $b = 2^k$). An attacker can then seek to determine how many digits were changed from i to zero.

If, for example, the j-th and k-th b-digits of d are changed from i to zero, then the expected signature $S'$ from a message M will satisfy the following equation:

$$S'^{\,e}/M \equiv (M^e)^{-(i \cdot b^j)} \cdot (M^e)^{-(i \cdot b^k)} \pmod{N} \qquad (\dagger)$$

where e is the public exponent. A more complex

equivalence can be determined where an attacker has

set a chosen M

mod N to M

mod N, since more dig-

its will be changed than are considered in the above

example.

If, for a chosen i, each instance where the b-digit

is equal to i is replaced with zero, this could, poten-

tially, allow an attacker to determine where in d each

b-digit is equal to i. This could be achieved by calculating the result of $S'^{\,e}/M \bmod N$ for all of the possible combinations of changed digits.
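The relation between a faulty signature and the positions of the zeroed digits can be simulated on a toy key. The sketch below uses a plain $2^k$-ary exponentiation with a precomputed table as a simplified stand-in for the attacked implementation; the function names `faulty_window_pow` and `relation_holds` and all parameters are hypothetical, and M is assumed to be coprime to N.

```python
def faulty_window_pow(M: int, d: int, N: int, k: int, zeroed: int):
    """2^k-ary exponentiation where table entry `zeroed` was overwritten with 1."""
    b = 1 << k
    table = [pow(M, i, N) for i in range(b)]
    table[zeroed] = 1                          # attacker replaces M^i with M^0
    digits = []                                # b-ary digits of d, least significant first
    x = d
    while x:
        digits.append(x % b)
        x //= b
    A = 1
    for digit in reversed(digits):
        A = pow(A, b, N) * table[digit] % N
    positions = [j for j, digit in enumerate(digits) if digit == zeroed]
    return A, positions


def relation_holds(M: int, d: int, e: int, N: int, k: int, i: int) -> bool:
    """Check S'^e / M = product of (M^e)^(-(i * b^j)) over the zeroed positions j."""
    b = 1 << k
    S_faulty, positions = faulty_window_pow(M, d, N, k, i)
    lhs = pow(S_faulty, e, N) * pow(M, -1, N) % N
    Me = pow(M, e, N)
    rhs = 1
    for j in positions:
        rhs = rhs * pow(Me, -(i * b**j), N) % N   # negative exponents need Python 3.8+
    return lhs == rhs


# Toy RSA key for illustration only.
assert all(relation_holds(65, 2753, 17, 3233, 2, i) for i in (1, 2, 3))
```

An attacker who does not know the positions would instead test candidate sets of positions against the left-hand side, which only needs public values.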

For example, consider RSA signature generation using a 1024-bit modulus calculated using the $2^5$-ary modular exponentiation algorithm (as currently used in OpenSSL). There will be $\lceil 1024/5 \rceil = 205$ loops in the modular exponentiation algorithm. If, for an arbitrary i (for 1 ≤ i < b), $M^i \bmod N$ is changed to $M^0 \bmod N$, this will, statistically, be expected to affect $\lceil 1024/5 \rceil / 2^5 = 6.4$ loops, i.e. on average 6.4 b-digits, that are normally equal to i, will be set to zero. In order to determine which groups of five bits have changed, an equation similar to Equation (†) can be determined for each of the $\binom{205}{7} = 2^{41.3}$ possible combinations that cover the expected number of groups of five bits that have changed.
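The figures in this example can be reproduced with a few lines of arithmetic. The helper name `attack_search_space` is hypothetical, and rounding the expected 6.4 affected digits up to 7 is an assumption inferred from the quoted value of $2^{41.3}$.

```python
from math import ceil, comb, log2


def attack_search_space(modulus_bits: int = 1024, k: int = 5):
    """Return (loops, expected digits hit, log2 of the number of combinations)."""
    loops = ceil(modulus_bits / k)             # number of k-bit windows
    expected = loops / 2**k                    # windows expected to equal a given i
    combos = comb(loops, ceil(expected))       # ways to place the affected windows
    return loops, expected, log2(combos)


loops, expected, bits = attack_search_space()
assert loops == 205
assert round(expected, 1) == 6.4
assert round(bits, 1) == 41.3
```

`math.comb` requires Python 3.8 or later.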

This is likely to be computationally infeasible because of the number of possible changes in d, each of which requires the generation of the result of $S'^{\,e}/M \bmod N$. However, this expected number of signatures can be significantly reduced if an attacker is able to divide this process into stages, i.e. make a change half way through the modular exponentiation and derive some information, and then repeat the attack and make a change before the exponentiation to complete the attack for a given value of i.

This attack is still valid if the blinding scheme described in Algorithm 3 is used, as an attacker can overwrite some arbitrary $M^i \cdot t^{e \cdot i} \bmod N$ with a value equivalent to 1 mod N. No knowledge of t is required since d and N are not modified during the blinding scheme.

This problem can be avoided by using the algo-

rithm proposed in this paper. The attack is still valid,

but an attacker will only be able to determine some

bits of one instance of $d^*$ and this will not provide

any information on d.
