Runtime Attestation for IAAS Clouds

Jesse Elwell¹, Angelo Sapello¹, Alexander Poylisher¹, Giovanni Di Crescenzo¹, Abhrajit Ghosh¹, Ayumu Kubota² and Takashi Matsunaka²

¹Vencore Labs, Basking Ridge, N.J., U.S.A.
²KDDI Research, Saitama, Japan
Keywords: Security, Virtualization, Cloud Infrastructure, Infrastructure-as-a-Service (IAAS).
Abstract:
We present the RIC (Runtime Attestation for IaaS Clouds) system, which uses timing-based attestation to
verify the integrity of a running Xen Hypervisor as well as the guest virtual machines running on top of it. As
part of the RIC system we present a novel attestation technique which includes not only the guest operating
system’s static code and read-only data sections but also the guest OS’s dynamically loadable kernel modules.
These attestations are conducted periodically at run-time to provide a stronger guarantee of correctness than
that offered by load-time verification techniques. A system such as RIC can be used in cloud computing
scenarios to verify the environment in which the cloud services ultimately run. Furthermore we offer a method
to decrease the performance impact that this process has on the virtual machines that run the cloud services
since these services often have very strict performance and availability requirements. This scheme effectively
extends the root of trust on the cloud machines from the Xen hypervisor upward to include the guest OS that
runs within each virtual machine. This work represents an important step towards secure cloud computing
platforms which can help cloud providers offer new services that require higher levels of security than are
possible in cloud data centers today.
1 INTRODUCTION
In recent years computer systems have seen a sub-
stantial increase in the number of attacks performed
against them. One reason for this increase is that the ubiquitous use of these systems for activities such as banking and e-commerce, handling business and/or military secrets, and processing medical records makes a successful compromise quite lucrative for an attacker. According to (Syman-
tec, 2015) there was a 23% increase in the number of
breaches in 2014 when compared to 2013.
Recently there has been a growing interest in a
technique known as attestation to help protect sys-
tems against modification by an attacker. Attestation
is a process in which software that runs on a system
is measured (verified) to provide assurance of its in-
tegrity to other entities. In the most common case
the software that is critical to the secure operation of
a given system, for example the operating system, is
measured and its integrity is attested to another sys-
tem.
While the communication portions of different attestation systems are similar, there are different
types of verification. Among the most popular are
hardware-based verification which is typically sup-
ported by the Trusted Platform Module (TPM) (tpm,
2007) and timing-based verification which is imple-
mented in software (Seshadri et al., 2005). Hardware-
based verification leverages the fact that a dedicated
hardware agent exists to perform measurements (e.g.
the TPM) that attackers cannot tamper with to influ-
ence the measurement process. Timing-based veri-
fication, as its name suggests, relies on the amount
of time taken to perform a measurement to detect at-
tacks that attempt to change the measurement pro-
cess. In this paper we are primarily concerned with
timing-based verification although some of the ideas
presented here may also be applicable to hardware-
based verification. The relative merits of the two ap-
proaches are discussed in (Seshadri et al., 2005) and
(Ghosh et al., 2014).
Verification can be performed at different times to
handle different types of attacks. For example the
hardware-based verification supported by the TPM
performs verification once at load-time and is capable
of detecting modifications made to the binary either
before load time (e.g. on disk) or during the load-
ing process. When compared to load-time verifica-
tion, run-time verification offers additional protection
against attacks that modify software while it runs, for
example if an attacker adds a malicious kernel mod-
ule to a running OS. Run-time verification however
requires some extra care to handle some of the dif-
ferences that can arise even when running the exact
same binary (kernel) on two different machines. For
example dynamic (run-time) linking can place load-
able kernel modules at different addresses, leading to
differing jump/branch targets and references to exter-
nal variables. The RIC system solves this problem
allowing the system to perform periodic run-time ver-
ification of the Xen hypervisor and the guest operat-
ing system’s kernel and kernel modules in a manner
that accounts for these address changes.
Attestation in any form can add a non-trivial
amount of overhead to a running system. For load-
time attestation this overhead comes from the fact
that measurement must take place before the software
can be run. Run-time attestation on the other hand
can require the software being measured to be paused
prior to and during measurement. For long running
programs the overhead of performing a single load-
time attestation is negligible, however run-time veri-
fication is more costly even for such programs since
they are measured periodically throughout their life-
time. The principal component of this overhead usu-
ally arises from the need to measure possibly large
areas of memory.
Existing work has presented the basic primitives
necessary to perform timing-based verification (Se-
shadri et al., 2005) and to apply timing-based veri-
fication to a running Xen hypervisor’s kernel for the
purpose of securing virtual machines (VMs) running
cloud services and operating above Xen (Ghosh et al.,
2014). In this paper we assume the XSWAT system
proposed in (Ghosh et al., 2014) as our baseline sys-
tem and present techniques to allow such systems to
measure more of the critical software involved in sup-
porting cloud services including the guest VM kernels
and their corresponding modules. An overview of the
RIC system is depicted in Figure 1. This effectively
extends the system’s root of trust to include the guest
operating system’s kernel and modules. Due to the
increased overhead from measuring the guest VM’s
kernel and modules in addition to the Xen hypervisor
during each attestation we also present a technique for
parallelizing the hashing component of attestation to
minimize the performance impact that this technique
has on the software that runs on the system.
Owing to a lack of Ethernet support in Xen,
XSWAT was devised to attest over a serial link. The
use of serial communication links for attestation on a
large scale is expected to be infeasible owing to the
additional costs involved. To investigate the feasibil-
ity of performing attestation over available Ethernet
infrastructure, a specialized PCI Ethernet card driver
was developed for RIC and its performance studied.
The rest of the paper is organized as follows: Sec-
tion 2 introduces the threat model and assumptions
used throughout this work, Section 3 contains details
about profiling the operating systems of guest VMs
in preparation for run-time attestation which is de-
scribed in Section 4, Section 5 explains our work in
speeding up hashing using parallelism, Section 6 de-
scribes the Ethernet driver developed for RIC, Sec-
tion 7 presents the experimental results collected from
the RIC system, Section 8 discusses related work and
Section 9 offers concluding remarks. Finally, a formal
proof of the security of the parallel hashing scheme
presented in Section 5 is offered in Appendix 1.
2 THREAT MODEL & ASSUMPTIONS
The threat model used in this work is similar to that of the XSWAT system with a notable excep-
tion. While the XSWAT system assumes that an at-
tacker has network access to and can compromise
guest OSes (including the privileged Domain 0), the
system discussed in this work is designed to detect
these guest OS compromises. As such we assume
only that the attacker has network access to the cloud
machine. RIC is designed to detect attacks that com-
promise a guest VM’s OS. This in turn helps to protect
the hypervisor by detecting attacks such as VM es-
cape (Wojtczuk, 2008) that rely on modifying a guest
VM’s OS to launch an attack against the hypervisor.
Effectively, the RIC system limits an attacker’s abil-
ity to place attack code in any software layer other
than the guest VM’s user-space. A malicious guest OS
can still be used to attack the hypervisor, however to
avoid detection the attacker cannot rely on modifying
the guest VM’s OS kernel to do so. For example, the
attacker could utilize a Return Oriented Programming
(ROP) (Shacham, 2007) attack against the guest OS
to execute code with OS privilege to launch an attack
against the hypervisor. Attacks such as ROP attacks
launched against the guest VMs or the hypervisor are
not covered in our threat model; detection and/or mitigation of such attacks is left for future work.
The proposed system also defends against multi-
processor attacks albeit in a different way than the
XSWAT system. Rather than halting all but the boot-
strap processor during an attestation request, the previously unused processors are now utilized to speed up the hash computation. This means that those processors continue to be unavailable for use by an attacker, which is guaranteed and verified as part of the attestation protocol described in Section 4.

Figure 1: RIC Attestation System Overview.
As in (Seshadri et al., 2005; Ghosh et al., 2014)
we assume that the attacker does not have physical
access to the cloud machine and thus cannot tamper
with the hardware for example by replacing the CPU
with a faster one. We believe that this is a reason-
able assumption to make since cloud providers pre-
sumably have physical security measures in place in
the interest of protecting their reputation as a secure
cloud provider.
The process of profiling VMs, described in Sec-
tion 3, presents a race condition in that an attesta-
tion request that completes successfully attests that
the VM kernel’s code and read-only data have not
been modified since it was profiled. However if an
attacker is able to modify the VM’s kernel code or
read-only data before the VM is profiled then subse-
quent verification requests will succeed even though
the attack code may already be in place. Therefore
we assume that the VMs are profiled before the at-
tacker gains access to them, for example by profiling
VMs before the cloud machine is connected to a pub-
lic network.
Attacks that only require access to the user-level
application that offers a cloud service are not covered
in our threat model. An example of such an attack
would be an SQL injection attack launched against a
website that uses an SQL database as a backend to
store data.
3 PROFILING A VIRTUAL MACHINE’S OPERATING SYSTEM
In this section we provide a detailed explanation of
the improvements made to the XSWAT system that
allow a guest VM’s OS kernel and modules to be mea-
sured in addition to the Xen hypervisor’s kernel.
In order to measure (i.e. hash) a running OS, a
strategy is needed to hash the various parts of that
OS. Similarly to the measurement of the hypervisor,
the sections of the OS that can be reliably measured
are the code and read-only data sections. Measuring
the code and read-only sections of the kernel proper
is complicated by the existence of structures such
as SMP locks and jump tables. Additionally, kernel modules, which are dynamically inserted into a kernel at run-time, run with the same privilege as the ker-
nel proper. This means that if the modules are left
un-measured, an attacker could use a malicious ker-
nel module to perform an attack. Similar to the kernel
proper, each module has its own code and read-only
data sections, which means that we can measure those
sections to attest the correctness of the module.
When dealing with the modules an additional complication arises: kernel modules must be relocatable so that they can be flexibly inserted
into the kernel’s address space alongside other ker-
nel modules. Due to this fact, kernel modules must
be compiled in a position independent manner. This
means that branch/jump targets and references to ex-
ternal symbols must be left in a state that allows them
to be adjusted when they are loaded (insertion time
for kernel modules). The challenge here is that ideally
we would like to measure the module’s code and read-
only data sections, including branch/jump targets and
external references, in such a way that the measure-
ment is reproducible on the verifier machine.
To solve this problem we design and make use of
a VM profiler that generates a signature of a given
VM’s kernel and modules that is used during run-time
to attest the correctness of the VM’s OS. The follow-
ing subsections explain in detail how the VM profiler
works.
3.1 Obtaining VM Data
The profiler can work in an online mode or an of-
fline mode. In either case it needs the state of the
VM while it is running with all modules it might en-
counter during normal operation loaded. In online
mode the profiler uses the name of the VM to request
pages of memory from the VM via the Xen hyper-
visor. In offline mode the profiler uses a core-dump
of the VM obtained by the user to obtain information
about the VM. This is the safer mode of operation
since it avoids any potential race conditions. Since it
cannot be core-dumped, Domain-0 can only be pro-
filed in online mode.
3.2 Profiling the Static Sections of the Kernel
The profiler first needs to profile the static portions of the VM’s kernel, that is, the code and read-only data of the kernel. Since the Linux kernel can load itself at different addresses, and the location and size of its code and read-only data sections can vary, the profiler needs information about the layout
of the kernel. This is obtained via the System.map file
produced during the compilation of the kernel. Now
knowing the start and end of the code and read-only
data sections of the kernel, the profiler can request
the appropriate pages of memory from the running
VM (or core file), make adjustments for SMP locks
and the jump table (see Section 3.3.3) and produce a
SHA-1 hash of these sections. As an optimization for
signature matching, a SHA-1 hash is also produced
of the first page of the read-only data section of the
kernel to act as a thumb print to select potential sig-
nature candidates during attestation. This first page
happens to contain version numbers and build infor-
mation about the kernel making it very likely to be a
distinguishing characteristic of the VM. Our choice of
the SHA-1 hashing algorithm is explained further in
Section 5.
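For concreteness, the following sketch shows one way the section boundaries could be read out of System.map. The boundary symbols (_text, _etext, __start_rodata, __end_rodata) are the conventional Linux ones; the code is illustrative and is not the RIC profiler's actual implementation.

    #include <stdio.h>
    #include <string.h>

    /* Sketch: pull the kernel's code and read-only data boundaries out of
     * a System.map file.  Assumes the usual "address type symbol" line
     * format and the conventional boundary symbols; both can differ
     * between kernel builds. */
    struct kernel_layout {
        unsigned long text_start, text_end;
        unsigned long ro_start, ro_end;
    };

    static int parse_system_map(const char *path, struct kernel_layout *kl)
    {
        FILE *f = fopen(path, "r");
        unsigned long addr;
        char type, sym[256];

        if (!f)
            return -1;
        memset(kl, 0, sizeof(*kl));
        while (fscanf(f, "%lx %c %255s", &addr, &type, sym) == 3) {
            if (strcmp(sym, "_text") == 0)
                kl->text_start = addr;
            else if (strcmp(sym, "_etext") == 0)
                kl->text_end = addr;
            else if (strcmp(sym, "__start_rodata") == 0)
                kl->ro_start = addr;
            else if (strcmp(sym, "__end_rodata") == 0)
                kl->ro_end = addr;
        }
        fclose(f);
        return 0;
    }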
3.3 Profiling the Modules
Once the static portions of the kernel have been pro-
filed, the profiler moves on to generating a profile of
the kernel modules. In order to access modules and
module information the profiler needs additional in-
formation about the kernel layout. In particular it
needs to obtain the head pointer of the module list
whose location is specified in the System.map
file. In addition to this pointer, the layout of the
module structure is needed. Unfortunately this is not
maintained in any distributed file and must be ob-
tained from the kernel source. For most kernels a
generic source can be downloaded from kernel.org
since the module structure rarely changes. Once
found the source will need to be configured in the
same way as the loaded kernel. The profiler will auto-
matically perform this step if the kernel configuration
file is provided. Since parsing kernel source can be
quite complicated and the kernel build system is al-
ready designed to do this, the profiler compiles a tiny,
simple program called dump locs against the kernel
source to get the required information.
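The paper does not list dump_locs itself; the following is a minimal sketch of what such a helper could look like if built as a trivial kernel module against the configured source. The two fields printed (list and name) exist in mainline struct module; the full set of offsets that RIC actually extracts is an assumption here.

    /* Sketch of a dump_locs-style helper: print field offsets within
     * struct module for the running kernel's configuration.  Which
     * offsets RIC really needs is not specified in the paper. */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/stddef.h>

    static int __init dump_locs_init(void)
    {
        pr_info("sizeof(struct module)         = %zu\n", sizeof(struct module));
        pr_info("offsetof(struct module, list) = %zu\n", offsetof(struct module, list));
        pr_info("offsetof(struct module, name) = %zu\n", offsetof(struct module, name));
        return 0;
    }

    static void __exit dump_locs_exit(void) { }

    module_init(dump_locs_init);
    module_exit(dump_locs_exit);
    MODULE_LICENSE("GPL");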
Once the module list and module information are obtained, a further complication exists. Namely, each time a VM boots it may load modules at different locations and in a different order. To support this, modules must use relocatable code. The module file contains a relocation table that tells the kernel module loader which parts of the static kernel, other modules, and even the module itself the module will need to reference. Further, these references may be ab-
solute or relative and 32-bits or 64-bits. Absolute ref-
erences to itself or other modules and relative refer-
ences to the kernel or other modules may change on
each boot. Therefore, the profiler needs information
about these relocations to properly profile the mod-
ules. This means the profiler must have access to the
module files from the VM’s drive.
3.3.1 Understanding Relocatable Code and the Relocation Table
The relocation table consists of entries of the follow-
ing format:
Section | Location | Type | Symbol Reference

The section field is the section of the module where the relocation needs to go (i.e. ".text", ".rodata", etc.). The location field is the offset
within the section where the relocation should be
placed. The type field is the type of relocation to per-
form and is one of: 32-bit signed absolute reference,
64-bit absolute reference, 32-bit relative reference or
64-bit relative reference. Finally the symbol reference
field is a description of the symbol to which the relo-
cation should reference specified as a symbol name
and offset (such as printk-0x00000004).
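As an illustration, a parsed relocation-table entry of the format above might be held in a structure such as the following; the type and field names are ours and are not taken from the RIC implementation.

    /* Sketch: in-memory form of one relocation-table entry.  The four
     * relocation kinds mirror those listed in the text. */
    enum reloc_kind {
        RELOC_ABS32,   /* 32-bit signed absolute reference */
        RELOC_ABS64,   /* 64-bit absolute reference        */
        RELOC_REL32,   /* 32-bit relative reference        */
        RELOC_REL64    /* 64-bit relative reference        */
    };

    struct reloc_entry {
        const char     *section;  /* e.g. ".text", ".rodata"               */
        unsigned long   location; /* offset within that section            */
        enum reloc_kind kind;     /* type of relocation to perform         */
        const char     *symbol;   /* referenced symbol name, e.g. "printk" */
        long            addend;   /* offset applied to the symbol, e.g. -4 */
    };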
Each symbol exported by the kernel and other
modules is maintained in a list in the kernel and has a
unique name. Both exported and local symbols in the
module itself are listed in the module file. The Linux
module loader uses these lists to calculate the correct
reference type to the requested symbol and place the
reference in the module at the desired location.
Table 1: Relocation Offsets.

           Kernel            Self              Other Module
Relative   +self.base_addr   no adjustment     +self.base_addr - mod.base_addr
Absolute   no adjustment     -self.base_addr   -mod.base_addr
3.3.2 Dealing with Relocations
The issue with relocations is that with each reboot of
the VM a simple hash of the module’s code and read-
only data sections would change. To overcome this
there are a couple of options. One would be to sim-
ply zero out all the relocatable references. Although
fairly straightforward, this presents a potential secu-
rity hole as these relocations are often calls to code in
other modules and the static kernel. By zeroing out
the relocations, an attacker could modify a module to
reference its own code and the VM attestation code
would not be able to detect this change. Therefore,
we use a more complex solution of adjusting all re-
locations in the module in such a way that it would
appear that the module and any module it references
were loaded at address 0. In this way the informa-
tion about which module was being referenced and
the offset of the reference is preserved.
Rather than attempting to look up relocations we
take advantage of the fact that the module loader has
already done this for us. We know the base address
of each module (from the module list parsing) and we
know where the kernel starts. To determine what a re-
location maps to we simply find the relocation in the
VM’s loaded module code, adjust for the relative posi-
tion in the case of relative relocations, and determine
whose memory space the relocation points to. We
store which module (or kernel) this relocation points
to then use Table 1 to adjust it.
As the table shows, some relocations require no ad-
justment. Therefore, as an optimization we delete
these relocations from our list so they need not be
considered at attestation time.
The information stored to the profile has the fol-
lowing format:
Location | Size | Type | Reference Number
Location is the offset into the module’s memory at
which the relocation is performed. Size indicates how
many bytes the relocation spans (typically 4 for 32-bit
or 8 for 64-bit, but other values are possible as indicated in the next section). Type indicates whether this
is absolute or relative (or a special case of zeroing out
as described in Section 3.3.3). Reference Number is a
number indicating which module the relocation points
to (possibly itself or a special value for the kernel).
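A minimal sketch of this adjustment and of the stored profile record is shown below. It assumes the relocation has already been resolved to a target (the static kernel, the module itself, or another module) and simply applies the offsets of Table 1; all names and field widths are illustrative.

    #include <stdint.h>

    enum ref_target { REF_KERNEL, REF_SELF, REF_OTHER };

    /* Sketch of one stored profile record (Location, Size, Type,
     * Reference Number) as described above. */
    struct profile_reloc {
        uint32_t location;  /* offset into the module's memory          */
        uint8_t  size;      /* bytes spanned, typically 4 or 8          */
        uint8_t  relative;  /* 1 = relative, 0 = absolute (or zeroed)   */
        uint16_t ref_num;   /* which module is referenced (0 = kernel)  */
    };

    /* Adjustment from Table 1: make every module appear to be loaded at
     * address 0.  self_base is the base of the module being profiled,
     * target_base the base of whatever the relocation resolves to. */
    static int64_t reloc_adjustment(enum ref_target tgt, int relative,
                                    uint64_t self_base, uint64_t target_base)
    {
        if (relative) {                          /* "Relative" row */
            if (tgt == REF_KERNEL) return (int64_t)self_base;
            if (tgt == REF_SELF)   return 0;
            return (int64_t)self_base - (int64_t)target_base;
        } else {                                 /* "Absolute" row */
            if (tgt == REF_KERNEL) return 0;
            if (tgt == REF_SELF)   return -(int64_t)self_base;
            return -(int64_t)target_base;
        }
    }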
It may seem that an attacker could exploit this
method by pointing the relocation to the same offset in a different module (most likely its own) since
the base address of the referenced module is removed.
However, since the adjustment is made both at profil-
ing time and attestation time and at attestation time
the base address removed is based on the module ref-
erence number, pointing to a different module will
still give a different result and cause the attack to be
detected. Furthermore, the addition of an unprofiled
module will raise a flag in the attestation code and
cause attestation to fail. Therefore even though we
make it appear as though all modules were loaded at
address 0 to get a consistent hash, attestation is still
just as rigorous as it would be if we left the reloca-
tions alone.
As a more concrete example see Figure 2. In this
figure module A is loaded at address 0x0000 and has a
reference at offset 0x0800 to code at offset 0x0200 in
module B which is loaded at address 0x1000. An at-
tacker has loaded module B’ to address 0x2000 which
has attack code located at offset 0x0200. The attacker
hopes that by removing the base address his new re-
location will look identical to the valid one. However,
since the attestation code removes the base address of
module B, while the valid result should be -0x0600,
the attacked relocation gives a result of +0x0A00 and
the hash of module A will be incorrect.

Figure 2: Failed Exploitation of Module Relocation.
3.3.3 Other Issues with Relocations and Solutions
One final set of issues comes with adjusting reloca-
tions on loaded modules. One is that the referenced
code or data may have been unloaded already. An-
other is that the relocation may have been overwritten
by code patching that occurred during module loading
and further relocation may have moved with a patch.
Relative Text Relocations: Relative relocations
are always relative to the address of the relocation.
However, in the x86 architecture relative jumps and
calls are relative to the next program counter address.
For this reason almost all relative relocations in the
text section of a module are of the form symbol-4.
An edge case exists when the relocation is to the first
possible address in the target in which case the pro-
filer would, without special consideration, incorrectly
identify which module is being pointed to and iden-
tify the previous memory space instead. To overcome
this we add 4 to all text relocations. This may seem to
create a different edge case, but in practice the orig-
inal problem is observed often, while the new edge
case does not appear to happen.
References to Unloaded Sections: Once a mod-
ule has finished loading and has been initialized, the
kernel unloads all .init.* sections. As a result any
relocations to this code or data will be broken and
cause the profiler to misidentify the relocation. Since
these relocations are not used after module initializa-
tion (this must be true or the module would be using a
broken pointer), we simply look at the symbol name
in the relocation table and delete (zero out) any relo-
cation to a symbol starting with .init.
Alternate Instructions: To allow the kernel and modules to be loaded on a number of different x86 processors while at the same time allowing that same code to take advantage of advanced processor features, the Linux kernel uses a system of alternative instruction patching. That is, the main code section contains the most generic implementation that is guaranteed to work on all x86-64 processors, while a separate section contains more processor-specific replacement code. These replacements, along with a table specifying which processor features to look for when deciding whether to apply them, allow the kernel to patch a module at load-time to be as efficient as possible. This isn’t a problem for the static
kernel since the replacements are always the same for
the same hardware. However, for modules there may
have been relocations in the replaced code and in the
replacement code. Again, rather than try to identify
the processor features and do the patching ourselves,
we take advantage of the fact that the kernel has al-
ready done this. We take each piece of potential re-
placement code in the module file and adjust the ref-
erences for the possible end location of this replace-
ment. Then we compare the replacement code to the
code found in the loaded module’s text section. If it
matches, then the replacement must have been per-
formed and we should delete any relocations in the
original code range and remap any relocations from
the replacement code into this range.
Paravirtualized Instructions: Similar to the alternative instruction patching, more recent Linux kernels contain native bare-metal hardware code in the main text section and can patch that code with calls to a hypervisor if paravirtualization is being utilized. However, unlike the alternative instructions, where the replacement is packaged with the module, the paravirtualized operation (PV-op) patches are packaged in the kernel itself and the module only indicates which
PV-op should go where. Due to the complexity of this
system we check to see if the original code is present
and if not mark the code for zeroization at attestation
time.
SMP Locks: On multi-core processors the lock
opcode prefix causes a performance penalty even if
only a single core is currently running. Therefore,
as an optimization the Linux kernel and its modules
specify a table of where these lock prefixes exist in the
code. If the system is in single processor/core mode,
then these lock prefixes are replaced with no-op in-
structions. This is especially concerning to our envi-
ronment as VMs may be started with a variable num-
ber of VCPUs. To deal with this, we simply verify
that the bytes whose addresses are listed in the SMP
locks table are either a lock prefix (0xf0) or a no-op
instruction (0x90) and then zero out the byte.
Jump Table: Originally intended to optimize the
removal of debug code when the kernel is not in debug
mode, the kernel jump table specifies the locations of
conditional branch points where the condition vari-
able changes very infrequently. Rather than using a
conditional branch instruction in the code, the com-
piler writes an unconditional jump or no-op instruc-
tions based on the initial value of the condition vari-
able. The condition variable is changed using a func-
tion call which identifies the affected branch points
and rewrites the instruction(s). We deal with this in a
way similar to SMP locks. We first verify that the tar-
geted code is either a valid no-op sequence or a jump
to the correct location. Once verified it is zeroed out
to ensure a consistent hash result.
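The verify-then-zero step for SMP lock sites can be sketched as follows; jump-table sites are treated in the same spirit (verify a valid no-op sequence or jump to the correct location, then zero). The function is illustrative and not the RIC code.

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch: normalize SMP-lock sites before hashing.  Each offset in
     * the module's SMP-locks table must currently hold either a LOCK
     * prefix (0xf0) or a NOP (0x90); anything else is treated as
     * tampering and fails the attestation. */
    static int normalize_smp_locks(uint8_t *image, size_t image_len,
                                   const uint32_t *lock_offsets, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint32_t off = lock_offsets[i];
            if (off >= image_len)
                return -1;                       /* malformed table        */
            if (image[off] != 0xf0 && image[off] != 0x90)
                return -1;                       /* unexpected byte        */
            image[off] = 0x00;                   /* zero for a stable hash */
        }
        return 0;
    }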
4 RUN-TIME ATTESTATION
In this section we provide details of how the VM pro-
files described in the previous section are incorpo-
rated into the run-time attestation protocol. The basic
idea of the attestation protocol is similar to that of the
Pioneer (Seshadri et al., 2005) and XSWAT (Ghosh
et al., 2014) systems. The attestation protocol is
shown in Figure 3 and it works as follows.
Figure 3: RIC Attestation Protocol.

First the VM profiles are sent to both the verifier
and the cloud machine. On the verifier machine a pro-
gram is used to convert each individual VM’s profile
into a signature after which the signatures of all the
VMs (including Dom-0) that will run on the cloud
machine are stored in a file. On the cloud machine
each profile undergoes this conversion to a signature
as part of the process of loading it into the memory
space of the Xen hypervisor. In addition to the VM
profiles, in order for the verifier to compute a known
correct hash of the hypervisor the verifier requires the
xen-syms file that is generated during compilation of
Xen.
Attestations are initiated by the trusted verifier
machine shown on the left side of Figure 3 who sends
a challenge to the cloud machine which is to be at-
tested. This challenge consists of a random nonce to
be incorporated into the self-checksum performed by
the cloud machine which is described below. The ver-
ification program that is run on the verifier machine
is supplied with a maximum wait time and chooses a
uniform random value between one and this specified
maximum number of seconds to wait between attesta-
tion requests. This process assures that an attacker cannot predict when attestations will be performed, thus
making it difficult to hide evidence of an attack be-
forehand.
The cloud machine begins by first disabling all
maskable interrupts as well as replacing the non-
maskable interrupt (NMI) handler with a handler that
contains only a return instruction. This is done to
ensure that an attacker has no means to hijack one
of the cores during attestations and essentially leaves
the attestation code in complete control of the cloud
machine for the duration of the attestation. Next,
the cloud machine performs a self-checksum over
the verification function which contains the critical
code necessary to carry out the rest of the attestation.
In addition to the verification function, other critical
pieces are also included in the checksum, including
the newly replaced NMI handler. This ensures that if
an attacker attempts to place the attack code in an NMI
handler and later trigger an NMI, this change will be
detected by the resulting checksum mismatch. The
verification function includes the checksum code it-
self, the send function used to communicate with the
verifier, and the hash function that is used to measure
the rest of the software. Furthermore the challenge
that was sent to the cloud machine from the verifier is
also included in the checksum to eliminate the possi-
bility of replay attacks. The self-checksum provides
a root of trust in software on the cloud machine which
is extended further upward through the software stack
by the rest of the attestation process. Once the check-
sum has been completed by the cloud machine the re-
sult is sent to the verifier using the send function. The
verifier, having the challenge and a correct copy of the verification function, computes the same checksum to
compare against the result it receives from the cloud
machine.
The final portion of the attestation protocol is per-
forming the hashes that attest the correctness of the
hypervisor and guest OSes. Since the hashes do not
change for a specific Xen hypervisor binary and the
corresponding VMs that will run on top of it, the ver-
ifier’s verify program can compute all of the neces-
sary hashes once during initialization and store them
for later comparisons. The cloud machine on the other
hand must compute a set of fresh hashes during each
attestation. The first item to be hashed is the hypervi-
sor and includes its code section, read-only data sec-
tion and additionally the VM signatures which reside
in Xen’s address space. Once this hash is finished,
it is sent via the send function to the verifier. De-
pending on the performance budget allocated to attes-
tation and the number of running VMs either all of
the VMs or a randomly selected subset of them are to
be hashed next. For each selected VM, the guest’s OS
kernel code and read-only data sections are hashed.
Then for each module that is currently running inside
that guest VM, the relocation adjustment described in
the previous section is performed and then a hash is
computed over the module’s memory space. It should
be noted that these relocations and the resulting hash-
ing are performed on a fresh copy of the VM mod-
ule content to avoid having to undo all of the mod-
ifications once the hash is finished. Once the attes-
tation request has been completed the cloud machine
cleans up by restoring the normal NMI handler and
re-enabling maskable interrupts. Finally, the trusted
verifier can compare the checksum and hashes that it
received from the cloud machine to the copies that it
has computed locally to detect any mismatches and
thus report them to the cloud owners.
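The cloud-machine side of one attestation request can be summarized by the following outline; every function and type named here is a placeholder for a step described above rather than an actual RIC entry point.

    #include <stdint.h>
    #include <stddef.h>

    struct vm;                                   /* opaque guest handle    */

    /* Placeholder steps (declarations only in this sketch). */
    void     disable_maskable_interrupts(void);
    void     enable_maskable_interrupts(void);
    void     install_minimal_nmi_handler(void);  /* handler is just "ret"  */
    void     restore_nmi_handler(void);
    uint64_t self_checksum(uint64_t nonce);      /* covers the verification
                                                    function and NMI handler */
    void     send_to_verifier(const void *buf, size_t len);
    void     hash_hypervisor(uint8_t out[20]);   /* code, rodata, VM sigs  */
    void     hash_vm(const struct vm *vm, uint8_t out[20]); /* kernel and
                                                    modules, hashed on a fresh
                                                    copy with the relocation
                                                    adjustments applied       */

    void handle_attestation_request(uint64_t nonce,
                                    const struct vm *const *vms, size_t nvms)
    {
        uint8_t  digest[20];
        uint64_t csum;

        disable_maskable_interrupts();
        install_minimal_nmi_handler();

        csum = self_checksum(nonce);             /* root of trust in software */
        send_to_verifier(&csum, sizeof(csum));

        hash_hypervisor(digest);
        send_to_verifier(digest, sizeof(digest));

        for (size_t i = 0; i < nvms; i++) {      /* all VMs or a random subset */
            hash_vm(vms[i], digest);
            send_to_verifier(digest, sizeof(digest));
        }

        restore_nmi_handler();
        enable_maskable_interrupts();
    }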
5 PARALLELIZING HASH COMPUTATIONS
In this section we provide the details of our work in
speeding up the RIC attestation process by paralleliz-
ing the hash computations that are utilized. Due to
the fact that the guest OS is now hashed in addition to
the Xen hypervisor the amount of time spent hashing
during each attestation request represents the major-
ity of the time taken to complete each attestation. The
original XSWAT system prevents an attacker from utilizing extra processors by halting all but the bootstrap processor (which performs all the computations sequentially) during requests, so all but one of the processors in a multi-core system are unused. Making
use of these otherwise idle processors comes with no
loss of security so long as the attacker is not allowed
to use those processors to hide evidence of an attack
during attestation, which is guaranteed by the attesta-
tion protocol as discussed in Section 4.
5.1 Chunk-based Hashing
We call the general idea of our hashing scheme chunk-
based hashing. Our prototype uses the SHA-1 hash-
ing algorithm, which is inherently a sequential hash-
ing algorithm due to its "chained" nature. The output
of each hash block is used as an input to the hash of
the next block. This sequential process is depicted at
the top of Figure 4. In this work we chose the SHA-1
hashing algorithm due to export control restrictions.
However chunk-based hashing can be adapted to use
other hash functions such as SHA-2 or SHA-3 which
are considered to be more secure. The specific choice
of a hashing algorithm is not central to the ideas pre-
sented in this work and a cloud provider wishing to
implement this system should choose a hashing al-
gorithm whose strength (i.e. its collision-resistance
property) fits their needs.
Chunk-based hashing in contrast to the chained
serial hashing works as follows. Each block of the
input data is treated as a separate block of data and
hashed individually. Assuming that the input data
contains N blocks (64 bytes each for SHA-1) this pro-
duces N output hashes (20 bytes each). These out-
put hashes, which we refer to as intermediate hashes,
are then concatenated and treated as the input to the
next level of hashing. Similarly to the top level, the
intermediate hashes are divided into M blocks (with
M < N) and again each is hashed individually pro-
ducing M output hashes. This process continues until
a level consists of a single block of input data and thus
results in a single hash. An example of this process
that shows a single level of this hashing scheme is de-
picted at the bottom of Figure 4. This hashing scheme
readily allows parallelization of the hash computa-
tions since a separate hash is computed for each input
block, which does not depend on any other block(s).
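One reduction pass of this scheme can be sketched as follows, with hash_chunk() standing in for SHA-1 over a single chunk and 20-byte digests as in SHA-1; the serial loop is shown only for clarity, since it is exactly these independent per-chunk hashes that the workers compute in parallel.

    #include <stdint.h>
    #include <stddef.h>

    #define DIGEST_LEN 20

    /* Stands in for SHA-1 over one chunk of input. */
    void hash_chunk(const uint8_t *data, size_t len, uint8_t out[DIGEST_LEN]);

    /* One level of chunk-based hashing: hash 'len' bytes in chunks of
     * 'chunk' bytes, writing one digest per chunk into 'out' (which the
     * caller sizes for ceil(len/chunk) digests).  The concatenated
     * digests become the next level's input; the process repeats until a
     * single digest remains. */
    static size_t hash_level(const uint8_t *in, size_t len, size_t chunk,
                             uint8_t *out)
    {
        size_t ndigests = 0;

        for (size_t off = 0; off < len; off += chunk) {
            size_t n = (len - off < chunk) ? len - off : chunk;
            hash_chunk(in + off, n, out + ndigests * DIGEST_LEN);
            ndigests++;
        }
        return ndigests;
    }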
Assuming the use of a cryptographically secure
hash function this construct is no less secure than
hashing sequentially. In this model the attacker only
has the ability to modify the initial input data and is
not able to directly manipulate the intermediate hash
results. Intuitively if the attacker wants to cause a collision in the final hash, which we will call the Nth-level hash, they can cause a collision in any of the N − 1 levels below the Nth-level hash. In theory this
presents more opportunity for the attacker to cause
a collision in the final hash, however as previously
mentioned the attacker can only modify the initial in-
put data. This means that the attacker needs to make
some modification to the initial input data that results
in either an immediate collision in the first level of
hashes or an output value that will cause a collision in
the next level(s) of hashes. Given a cryptographically
secure hashing algorithm which includes a collision
resistance property, finding even an immediate colli-
sion in the first level of hashing is difficult. Further-
more since the attacker only has indirect control over
the inputs to the rest of the levels of the hash causing
a collision in these levels is also quite difficult. For a
more rigorous proof of the security of this approach
see Appendix 1.

Figure 4: Sequential vs Parallel Hashing.
5.2 Coordinating Hashes
The chunk-based hashing design presented in the pre-
vious subsection is coordinated using a lock-less work
queue. The work queue is managed by a single mas-
ter thread and hashes are carried out by a number of
worker threads that watch the queue for items to hash,
hash them, and write the resulting hashes into a buffer.
The work queue consists of two arrays, one to hold
work queue items and one to hold the results of the in-
termediate hashes. Each work queue item represents
a single hash to be performed and includes all of the
necessary data required to perform the hash such as
the location of the input data, where to put the hash
output, the size of the input etc. The necessary ca-
pacity of the work queue can be computed during its
initialization based on the maximum amount of data
that it needs to be able to process. This means that the
arrays can be allocated once during initialization time
rather than at the beginning of each hash calculation.
It is necessary to keep track of the number of
workers currently working on each item to avoid use-
after-free errors that arise from freeing buffers that
hold input data while hashes of that data are being
computed. Our implementation uses the atomic in-
crement and decrement instructions offered by the
x86-64 architecture to update the worker count safely
across multiple processors.
After the master thread sets up a work queue item, it can be in one of the following three states: the UNCLAIMED state, which means that it is ready to be picked up and handled by one or more workers; the CLAIMED state, which a worker sets when it picks up an item (also incrementing the worker count); and finally the COMPLETE state, to which an item transitions when a worker completes the hash it denotes, signaling that the corresponding result is ready.
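A simplified sketch of a work queue item and the lock-less claim step is shown below; the GCC/Clang atomic builtin stands in for the raw x86-64 atomic increment mentioned above, and the field and state names are illustrative.

    #include <stdint.h>
    #include <stddef.h>

    enum item_state { UNCLAIMED, CLAIMED, COMPLETE };

    struct work_item {
        const uint8_t *input;     /* data to hash                        */
        size_t         len;
        uint8_t       *out;       /* where the resulting digest goes     */
        int            state;     /* enum item_state                     */
        int            nworkers;  /* workers currently on this item      */
    };

    /* Returns nonzero if this worker should process the item.  The state
     * is read and written without locking, exactly as described above:
     * two workers may occasionally claim the same item, which only
     * results in redundant (identical) work. */
    static int try_claim(struct work_item *it, int allow_redundant)
    {
        int s = it->state;                        /* plain, racy read     */

        if (s == UNCLAIMED || (allow_redundant && s == CLAIMED)) {
            it->state = CLAIMED;                  /* plain, racy write    */
            __atomic_add_fetch(&it->nworkers, 1, __ATOMIC_ACQ_REL);
            return 1;
        }
        return 0;
    }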
To avoid all of the workers repeatedly picking up
the same queue item to hash, workers start their search
for an item to hash based on the processor ID of
the processor on which they execute. If the worker
checks all of the items and does not find one in the
UNCLAIMED state and it has been configured to do so
(at compile time) it begins its search again this time
searching for items in the CLAIMED state. This offers
some possible redundancy in the computation of each
item’s hash. There are a number of reasons why redun-
dancy might provide an advantage. The first would be
that it offers fault tolerance. If for any reason a worker
is unable to complete the hash it will be completed by
another worker. This may also provide a performance
boost in the presence of heterogeneous cores, for ex-
ample ARM’s big.LITTLE technology (Greenhalgh,
2011) or Dynamic Voltage and Frequency Scaling
(DVFS) (Semeraro et al., 2002; Le Sueur and Heiser,
2010) where a slower core may pick up an item first
but a faster core that becomes available later may be
able to finish it faster. Due to the lock-less design of
the work queue, even though it is not explicitly al-
lowed, more than one worker can process the same
hash item due to race conditions in setting and read-
ing a queue item’s state. While this does happen it
is rather infrequent. It should be noted that the ab-
sence of locking does not have an effect on the cor-
rectness of the output and corresponding change of
state to COMPLETE when an item is finished since the
result being written and the state change are identical
for any workers who happen to write them simultane-
ously.
5.3 Improvements and Optimizations
The first improvement to the above design addresses
two issues and this improvement is where the name
chunk-based hashing comes from. The first issue is
the amount of extra data that needs to be hashed due
to this scheme. This corresponds directly to the num-
ber of intermediate hashes. For a 64MB file the in-
crease in the amount of data that needs to be hashed
is roughly 1.5X for a total of 100MB (64MB of input
data and 32MB of intermediate hashes). In addition
to the fact that more data needs to be hashed over-
all, each call to the SHA-1 hashing function comes
with various setup and tear down overheads. If this
setup and tear down cost is incurred for each individ-
ual block of input data and each block of intermediate
hashes the overall performance degrades quite signif-
icantly. Indeed if each queue item handles only a sin-
gle SHA-1 block a performance loss of about 2000X
is observed for a 64MB file when compared to the
standard sequential version.
Our solution to this problem is, rather than having each queue item handle a single 64-byte block, to assign each item a chunk consisting of multiple blocks. Figures 5 and 6 show the reduction in added
data and speedup attained compared to the sequential
hashing, respectively, as the chunk size is increased.

Figure 5: Data Size Increase vs Chunk Size.

Figure 6: Speedup vs Chunk Size.
The optimal number of blocks in each chunk is a func-
tion of the size of the input data and the number of
processors available. To this end the work queue dy-
namically determines the optimal chunk size at each
level given these two values using Equation 1. The
size of each chunk is then computed using this value
and the size of the input data.
chunks = data_size / (2 × NUM_CPUs)    (1)
The second improvement arises from the observa-
tion that if the master thread only performs queue op-
erations, it spends a significant amount of time wait-
ing for the workers to finish their work before it can
insert more items into the queue or declare the hash
complete. Addressing this problem is relatively sim-
ple and consists of allowing the master thread to help compute hashes between queue management operations.
The final optimization that we implemented is
similar in nature to an optimization commonly made
to the quick sort sorting algorithm. It is common in
quick sort implementations to use a different sorting
algorithm (such as insertion sort) when the partitions
become smaller than a certain size. In this case, when
the input data is sufficiently small, the regular se-
quential hashing technique outperforms the parallel version. We experimentally determined the point at which the parallel version actually outperforms the
regular implementation for the specific testing ma-
chine that we used to be 16KB of input data. This
information is used in two cases: 1.) if the initial data
passed to our algorithm is less than 16KB the work
queue isn’t used at all and the data is simply hashed
sequentially and 2.) when the intermediate hashes at
a given level are smaller than 16KB the master thread
signals to the workers that the hash is finished and
computes the final hash sequentially.
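This fallback can be sketched as follows; the 16KB threshold is the crossover measured on the specific test machine used here, and the two hashing routines are placeholders.

    #include <stdint.h>
    #include <stddef.h>

    #define PARALLEL_THRESHOLD (16 * 1024)   /* machine-specific crossover */

    /* Placeholders for the sequential SHA-1 and the chunk-based scheme. */
    void sha1(const uint8_t *data, size_t len, uint8_t out[20]);
    void chunked_parallel_hash(const uint8_t *data, size_t len, uint8_t out[20]);

    void ric_hash(const uint8_t *data, size_t len, uint8_t out[20])
    {
        if (len < PARALLEL_THRESHOLD)
            sha1(data, len, out);                /* small input: sequential */
        else
            chunked_parallel_hash(data, len, out);
        /* The same threshold applies per level: once the intermediate
         * hashes shrink below it, the master thread finishes the final
         * hash sequentially. */
    }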
6 ETHERNET DRIVER
XSWAT was devised to perform attestation over a se-
rial link due to a lack of Ethernet support in Xen.
While use of serial communication links may be in-
feasible on a large scale, most cloud operator environ-
ments make use of Ethernet based management net-
works that could be leveraged for attestation. Towards
this end, a specialized PCI Ethernet driver was de-
veloped for the Xen hypervisor and incorporated into
RIC. The driver handles hardware interrupts from the
Ethernet card, and, on the upstream path, either iden-
tifies a frame as a RIC frame and handles it, or passes
the frame to the VM (usually Xen’s Domain 0) con-
taining the appropriate Ethernet drivers. No down-
stream interference from the VMs is expected since
all VMs are stopped when attestation is in progress.
Linux code for the Ethernet driver was adapted for
this purpose.
An incoming attestation request generates an in-
terrupt that is caught by the RIC driver which con-
firms that the source of the interrupt is in fact the Net-
work Interface Card (NIC). If it is further determined
that the interrupt signaled a receive (RX) event, Xen’s
Domain 0 VM is paused. The NIC registers are read
for the address of the packet in Domain 0 memory,
the packet is read from this address and is matched
against the byte signature for an attestation request.
If there is no match, Domain 0 is unpaused; otherwise the attestation code is triggered.
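The receive path just described can be outlined as follows; each function is a placeholder for a step in the text, the real driver being Linux NIC code adapted to run inside Xen.

    #include <stdint.h>
    #include <stddef.h>

    /* Placeholders for the steps described in the text. */
    int      irq_from_nic(void);                 /* did the NIC raise this IRQ?   */
    int      nic_rx_event(void);                 /* was it a receive event?       */
    void     pause_domain0(void);
    void     unpause_domain0(void);
    uint64_t nic_rx_packet_addr(void);           /* packet address in Dom0 memory */
    size_t   read_packet(uint64_t addr, uint8_t *buf, size_t max);
    int      is_attestation_request(const uint8_t *pkt, size_t len);
    void     run_attestation(const uint8_t *pkt, size_t len);

    void ric_nic_interrupt(void)
    {
        uint8_t pkt[1514];
        size_t  len;

        if (!irq_from_nic() || !nic_rx_event())
            return;                              /* not ours, or not RX     */

        pause_domain0();                         /* packet lives in Dom0    */
        len = read_packet(nic_rx_packet_addr(), pkt, sizeof(pkt));

        if (is_attestation_request(pkt, len))
            run_attestation(pkt, len);           /* Dom0 stays paused       */
        else
            unpause_domain0();
    }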
Once the attestation request processing is com-
pleted at the cloud machine, RIC then initiates the
process of creating an outbound attestation response.
Attestation response data is inserted into an outbound
packet which needs to be inserted into DMA accessi-
ble memory. Since it is hard to allocate DMA mem-
ory from Xen, an existing packet in DMA memory is
saved along with the state of the Transmission (TX)
ring buffer and is then overwritten with the outbound
attestation response. Once the response packet has
been transmitted using DMA, the saved packet is re-
stored, the TX ring buffer is restored to its previous
state and Domain 0 is unpaused.
Both inbound attestation request processing and
outbound attestation response processing are de-
signed to have low overheads to reduce the possibility
of introducing any jitter to the checksum process. The
evaluation results presented in Section 7 show that
no significant jitter is introduced by the RIC Ethernet
driver.
7 EXPERIMENTAL RESULTS
In this section we present the results of our experi-
ments in which we quantify the overhead introduced
by hashing VMs in addition to the Xen hypervisor and
the savings provided by utilizing the parallel hash-
ing technique to perform these hashes. The machines
used to capture our performance results are Dell R620
Xeon systems with 12 cores. Each core supports two hyperthreads, resulting in a total of 24 hyperthreads. The cloud machine and the verifier are connected using Ethernet via two switches, one for man-
agement tasks (including the attestation traffic) and
one for non-management tasks. The testbed setup is
shown in Figure 7. A serial link between the verifier
and the cloud machine was also used to run attestation
transactions for the purpose of comparison against at-
testation performance over Ethernet.

Figure 7: RIC Testbed Setup.
7.1 Parallel Hashing Results
To measure the performance gained by using the par-
allel hashing scheme without any of the additional
overheads from performing relocations etc. we ran a
user-space implementation of the parallel hashing al-
gorithm. The user-space implementation reads a file
filled with random data into memory, launches the
workers as pthreads (in the parallel case), and then
measures the time that it takes to hash the contents of
the file. For these tests we utilized all of the logical
processors (hyperthreads) by using 23 worker threads
and the single master thread. These measurements do
not include the initialization of the work queue since
that action takes place only once during boot of the
Xen hypervisor and is reused from one hash to an-
other. The work queue however does need to be reset
after each hash and this overhead is included in the
timings presented. Figure 8 shows the amount of time
it takes the sequential and parallel hashing schemes
to complete a hash of the given size for increasing
data sizes. Figure 9 shows the speedup achieved by
the parallel version when compared to the sequen-
tial version. Each data point in both graphs repre-
sents the average of 100 tests. As can be seen from
the graphs more data being hashed results in a larger
speedup for the parallel version which levels off at
around the 10X speedup mark. This 10X speedup
mark is first attained with hashes around the 1-2MB
mark in size. We believe that the reason that the maxi-
mum speedup achieved is not closer to 24X (the num-
ber of hyperthreads) is that the hyperthreads are do-
ing identical work and thus there is a large amount of
contention for micro-architectural resources that are
shared amongst the hyperthreads that share the same
physical core. To confirm this we disabled hyper-
threading in the BIOS options and reran the experi-
ments. The results were very similar to the results
with hyperthreading enabled.
We also analyzed how using the parallel hash-
ing technique affects the RIC attestation process as
a whole. Particularly we were interested in whether
or not the conditions of the hashes in the attestation
protocol are suitable to achieve the 10X speedup that
we observed for the larger data sizes above.
As can be seen in Table 2 the checksum time was
not affected by the changes to the hashing algorithm
as expected. The hypervisor hash experiences an op-
timal speedup of about 10X. The VM hash does not
attain quite as much speedup as the hypervisor for a
few reasons. First, due to the way timings are taken,
the VM hash times also include the work of perform-
ing relocations as described in Section 3. In addition
to this the VM hashing is less ideal than the hyper-
visor hash in that the hypervisor hash consists of a
small number of larger hashes, while the VM contains
a larger number of hashes, some of which are as small as 4KB.

Table 2: XSWAT Results.

Activity                    Sequential   Portion of   Parallel     Portion of   Speedup
                            Time (ms)    Request      Time (ms)    Request
1024K Checksum Iterations   13.41        30.81%       13.16        72.55%       1.0
Xen Hypervisor Hash         4.32         9.92%        0.44         2.43%        9.8
VM Hash                     25.80        59.27%       4.54         25.03%       5.7
Total                       43.53                     18.14                     2.4
7.2 Ethernet Attestation Results
The performance of attestation over Ethernet was
compared against that of attestation over a serial com-
munication link in the context of a memory copy at-
tack. In a memory copy attack, an adversary main-
tains a copy of the unmodified Attestation System in
memory while using a malicious version of the code
to compute the self checksum. While servicing an at-
testation request, the malicious code uses the unmodi-
fied code to compute the correct self checksum for the
attestation system while hiding any evidence of cloud
software environment tampering. We refer the reader
to (Seshadri et al., 2005) for additional details.
The goal of this set of experiments was to de-
termine whether conducting attestation over Ether-
net would impact (a) the jitter of the self checksum
process and therefore (b) cause false negatives in the
face of a memory copy attack. In each of the exper-
iments described in this section, 100 attestation re-
quests were successfully completed to compute each
data point. The average interval between successive
attestation requests was set to 1 second. The num-
ber of checksum loop iterations was varied between
1,024,000 and 2,048,000 and exactly one randomly
selected VM was hashed during the attestation pro-
cess. The set of candidate VMs for hashing included
Domain 0 and benchmark applications were run on
each VM other than the Domain 0 VM. Each VM was
allocated 2GB RAM and assigned 1 CPU except for
Domain 0 which was assigned 2 CPUs. A total of 11
VMs were run on the cloud machine.
Table 3 shows checksum and attestation times
over Ethernet and Serial interfaces. In all cases,
the time taken over Serial is somewhat higher than that over Ethernet, due to the serial link being slower. In all cases the checksum jitter is comparable across the two interface types.

Table 3: Ethernet Self Checksum Results.

Self Checksum Type   Iterations   Mean Time (ms)   Std. Deviation
Ethernet             1024K        13.56            0.08
                     2048K        25.56            0.09
Serial               1024K        13.85            0.08
                     2048K        25.84            0.09
Memory Copy Attack   1024K        15.03            0.20
                     2048K        29.03            0.23
The table also shows the time taken to execute a
memory copy attack for a given number of checksum
loop iterations. There is a clear separation between
the checksum time and the time taken to conduct a
memory copy attack indicating that Ethernet based
attestation is fairly robust against these attacks. Self
checksum performance was also assessed in the pres-
ence of cross traffic representative of network man-
agement applications. This cross traffic was generated
by using an SNMP v2 client running on the snmp-
client machine shown in Figure 7 sending traffic to
an SNMP server running in Domain 0. Even with this
cross traffic, self checksum times remained stable pro-
viding further evidence of the robustness of Ethernet
based attestation as supported by RIC.
8 RELATED WORK
The RIC system is similar to the Checkmate (Ko-
vah et al., 2012), Pioneer (Seshadri et al., 2005),
MT-SRoT (Yan et al., 2011), and HyperSentry (Azab
et al., 2010) systems and represents a direct exten-
sion of the XSWAT (Ghosh et al., 2014) system. RIC,
while similar in spirit to these works, offers important
improvements to both the security and performance
of these systems. While each of these works uses
timing-based attestation techniques to create a root of
trust in software, this is the first of these works to ex-
tend that root of trust beyond the most privileged soft-
ware layer upwards towards the user-space applica-
tions. This has important implications that are of par-
ticular interest to cloud providers. Using this technol-
ogy a cloud provider can be assured that none of the
hypervisors or operating systems on which their cloud
services ultimately run have been compromised.
The hash parallelization scheme presented in
this paper is somewhat similar in nature to
Merkle trees (Merkle, 1982), hash lists, and hash
chains (Lamport, 1981). The general principle behind
all of these constructs is to, perhaps repeatedly, apply
a cryptographic hash function first to the input data
and then to the output(s) that are created from this
step. Each construct aims to solve a different problem, whether it be improving security or efficiency. Most similar to our work are Merkle trees, which were originally designed as a scheme for generating digital signatures but can be applied to other problems as well. In fact, one of the most common uses of Merkle trees today is to authenticate large data structures or files efficiently. For example, Merkle trees are used in peer-to-peer networks to authenticate individual pieces of a file (or group of files) being downloaded from a peer without requiring the entire file. They make these networks more efficient: if a single piece of a file is corrupted, whether due to transmission errors or an attempted forgery, the Merkle tree allows the client to identify the specific corrupted block and download only that block again rather than re-downloading the entire file. Our solution, on the other hand, is not concerned with identifying which piece of the measured data has been corrupted; instead it focuses on speeding up the computation of the large hashes that the system must perform. To accomplish this, our technique targets cloud environments, which typically run on multi-core platforms, and explicitly takes advantage of these cores to compute hashes in parallel. Recently, other hashing constructions have been proposed (Maurer and Tessaro, 2007; Haitner et al., 2015) that are more similar in spirit to chunk-based hashing and could potentially require fewer rounds of parallel hashes, resulting in a faster hash computation. Due to the modular design of the RIC system, the chunk-based hashing algorithm could be replaced by one of these constructs to further improve its performance, and this possibility is being investigated.
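As a concrete illustration of the parallelization described above, the sketch below shows how one round of per-block hashing could be fanned out across worker cores using Python's standard library. RIC's implementation runs in the Xen/Domain 0 environment and is not written in Python, so this conveys only the scheduling idea; the names hash_block and parallel_round are illustrative.

    # Hedged sketch: one round of chunk-based hashing with the per-block
    # hashes computed concurrently. Digest order is preserved by pool.map,
    # so the concatenated result is deterministic. Run under an
    # "if __name__ == '__main__':" guard on platforms that spawn workers.
    import hashlib
    from concurrent.futures import ProcessPoolExecutor

    def hash_block(block: bytes) -> bytes:
        return hashlib.sha256(block).digest()

    def parallel_round(blocks: list[bytes], workers: int = 12) -> bytes:
        """Hash every block concurrently and concatenate the digests in order."""
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return b"".join(pool.map(hash_block, blocks))

Repeating such rounds until a single block remains yields the chunk-based hash defined formally in the appendix.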
9 CONCLUSION
In this paper we have described a method to profile and attest the correctness of a guest VM's operating system, including the integrity of jump and branch targets as well as relocations in dynamically loadable kernel modules. To combat the added overhead of hashing the VM's guest OS, we also propose a parallel hashing technique that can increase the performance of hashing large areas of memory by 10X on a 12-core machine.
REFERENCES
(2007). TPM Main Specification. Accessed online
at: http://www.trustedcomputinggroup.org/resources/
tpm main specification.
Azab, A. M., Ning, P., Wang, Z., Jiang, X., Zhang, X.,
and Skalsky, N. C. (2010). Hypersentry: enabling
stealthy in-context measurement of hypervisor in-
tegrity. In Proceedings of the 17th ACM conference on
Computer and communications security, pages 38–49.
ACM.
Damgård, I. B. (1990). A design principle for hash func-
tions. In Advances in Cryptology–CRYPTO 1989 Pro-
ceedings, pages 416–427. Springer.
Ghosh, A., Sapello, A., Poylisher, A., Chiang, C. J., Kub-
ota, A., and Matsunaka, T. (2014). On the Feasibility
of Deploying Software Attestation in Cloud Environ-
ments. In Cloud Computing (CLOUD), 2014 IEEE 7th
International Conference on, pages 128–135. IEEE.
Greenhalgh, P. (2011). big.LITTLE processing with ARM
Cortex-A15 & Cortex-A7. ARM white paper, pages 1–8.
Haitner, I., Ishai, Y., Omri, E., and Shaltiel, R. (2015). Par-
allel hashing via list recoverability. In Advances in
Cryptology–CRYPTO 2015, pages 173–190. Springer.
Kovah, X., Kallenberg, C., Weathers, C., Herzog, A., Al-
bin, M., and Butterworth, J. (2012). New results
for timing-based attestation. In Security and Pri-
vacy (SP), 2012 IEEE Symposium on, pages 239–253.
IEEE.
Lamport, L. (1981). Password authentication with inse-
cure communication. Communications of the ACM,
24(11):770–772.
Le Sueur, E. and Heiser, G. (2010). Dynamic voltage and
frequency scaling: The laws of diminishing returns.
In Proceedings of the 2010 international conference
on Power aware computing and systems, pages 1–8.
USENIX Association.
Maurer, U. and Tessaro, S. (2007). Domain extension of
public random functions: Beyond the birthday bar-
rier. In Advances in Cryptology–CRYPTO 2007, pages
187–204. Springer.
Merkle, R. C. (1982). Method of providing digital signa-
tures. US Patent 4,309,569.
Merkle, R. C. (1990). A certified digital signature. In
Advances in Cryptology–CRYPTO 1989 Proceedings,
pages 218–238. Springer.
Semeraro, G., Magklis, G., Balasubramonian, R., Al-
bonesi, D. H., Dwarkadas, S., and Scott, M. L.
(2002). Energy-efficient processor design using mul-
tiple clock domains with dynamic voltage and fre-
quency scaling. In High-Performance Computer Ar-
chitecture, 2002. Proceedings. Eighth International
Symposium on, pages 29–40. IEEE.
Seshadri, A., Luk, M., Shi, E., Perrig, A., van Doorn, L.,
and Khosla, P. (2005). Pioneer: verifying code in-
tegrity and enforcing untampered code execution on
legacy systems. ACM SIGOPS Operating Systems Re-
view, 39(5):1–16.
Shacham, H. (2007). The geometry of innocent flesh on the
bone: Return-into-libc without function calls (on the
x86). In Proceedings of the 14th ACM Conference on
Computer and Communications Security, pages 552–
561. ACM.
Symantec (2015). 2015 Internet Security
Threat Report. Accessed online at:
https://www4.symantec.com/mktginfo/whitepaper/
ISTR/21347932 GA-internet-security-threat-report-
volume-20-2015-social v2.pdf.
Wojtczuk, R. (2008). Subverting the Xen hypervisor. Black
Hat USA, 2008.
Yan, Q., Han, J., Li, Y., Deng, R. H., and Li, T. (2011).
A software-based root-of-trust primitive on multicore
platforms. In Proceedings of the 6th ACM Symposium
on Information, Computer and Communications Se-
curity, pages 334–343. ACM.
APPENDIX 1
In this section we formally define the chunk-based hashing construction and prove that it is cryptographically secure, under the assumption that the underlying hash function used as a component is itself secure. The notion of security we use is collision-resistance (also known as collision-intractability), as is typical in the cryptography literature.
Informally speaking, a function H is collision-resistant if it is hard for an efficient adversary, who is given the full code of H, to find two inputs x, x′ such that x ≠ x′ and H(x) = H(x′), except with extremely small probability. Let H : {0,1}^b → {0,1}^L denote a collision-resistant hash function, for integers b > L > 0 (e.g., if H = SHA-256, then L = 256). Based
on H, we define a chunk-based hash function cbH by
repeated parallel applications of H to the current in-
put sequence, until the latter is compressed to a single
block. The current input sequence is first set to the input to the chunk-based hash function, and from then on it is set to the output of the parallel applications of H, after concatenation and padding. As defined so far, this function can be proved to be collision-resistant if H is collision-resistant and if the function is only evaluated on inputs of the same, predefined length. To efficiently extend this
function to one that remains collision-resistant even
when it is evaluated on inputs of different lengths,
function cbH pads the computed single block with the
input length, and then uses this string as the input for
one last hashing step using H. The computed block
will be the output of hash function cbH. Padding the
sequence of input blocks with the input length has
been used for similar reasons in Merkle-Damgård's fully-sequential paradigm for extension of the domain of collision-resistant hash functions (Merkle, 1990; Damgård, 1990).
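For reference, the collision-resistance requirement stated informally above can be written in the standard advantage form; the symbol ε below is a generic small bound and A is any efficient adversary, a textbook formulation rather than notation taken from this paper.

    % Collision resistance of H : {0,1}^b -> {0,1}^L, stated as an advantage bound.
    \[
      \Pr\bigl[(x, x') \leftarrow \mathcal{A}(H) \,:\, x \neq x' \,\wedge\, H(x) = H(x')\bigr] \le \varepsilon
    \]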
Let N be an integer > 0, and consider the chunk-based hash function cbH : {0,1}^(N·b) → {0,1}^L defined as follows. On input a sequence of N blocks x = x[1], ..., x[N], each of length b, cbH works as shown in Algorithm 1.
Algorithm 1: Chunk-Based Hashing Algorithm.
1: Set i = 1, N(1) = N
2: repeat
3:   Let y[j] = H(x[j]), for j = 1, ..., N(i)
4:   Concatenate y[1], ..., y[N(i)] into M(i) blocks t[1], ..., t[M(i)], each of length b, possibly padding the last block to length b
5:   Set x[j] = t[j], for j = 1, ..., M(i)
6:   Set i = i + 1
7:   Set N(i) = M(i − 1)
8: until N(i) = 1
9: Set b-bit blocks u = x[1] and v = N·b, and set z = H(u|v)
10: Output: z
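The following is a minimal executable sketch of Algorithm 1, assuming H = SHA-256 (so L = 256 bits) and a caller-chosen block length b in bytes. The names cbh and block_len are illustrative; the per-block hashes inside each round (the parallelizable step) are computed sequentially here for simplicity, and arbitrary-length inputs are accepted even though Algorithm 1 assumes the input is already an integral number of b-bit blocks.

    # Hedged sketch of the chunk-based hash cbH of Algorithm 1 with H = SHA-256.
    import hashlib

    def H(block: bytes) -> bytes:
        """The underlying collision-resistant hash function (here SHA-256)."""
        return hashlib.sha256(block).digest()

    def cbh(data: bytes, block_len: int = 4096) -> bytes:
        """Repeatedly hash all blocks, concatenate the digests and re-block
        until one block remains, then hash that block together with the
        original input length (steps 9-10 of Algorithm 1)."""
        input_len_bits = len(data) * 8
        # Step 1: split the input into blocks of block_len bytes.
        blocks = [data[i:i + block_len] for i in range(0, len(data), block_len)] or [b""]
        while True:
            # Step 3: apply H to every block (the parallelizable step).
            digests = b"".join(H(b) for b in blocks)
            # Steps 4-5: concatenate the outputs and re-block, padding the last block.
            blocks = [digests[i:i + block_len]
                      for i in range(0, len(digests), block_len)]
            blocks[-1] = blocks[-1].ljust(block_len, b"\x00")
            # Step 8: stop once a single block remains.
            if len(blocks) == 1:
                break
        # Steps 9-10: bind the remaining block to the input length and output z.
        return H(blocks[0] + input_len_bits.to_bytes(8, "big"))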
We now claim and prove the main result for func-
tion cbH.
Theorem 1. If H is collision-resistant, then so is cbH.
To prove this theorem, we prove the equivalent contrapositive version: if cbH is not collision-resistant, then neither is H. This is done by showing that if an efficient adversary algorithm cbA finds a collision in cbH, then we can construct an efficient adversary algorithm A that finds a collision in H with the same probability. Let (x, x′) be the collision found by cbA in cbH, where x = x[1], ..., x[N] and x′ = x′[1], ..., x′[N′]. We distinguish two cases: N ≠ N′ and N = N′.
Case N ≠ N′: Note that z = H(x[1] | N·b) and z = H(x′[1] | N′·b), where x[1] and x′[1] here denote the values computed at the end of step 8 of cbH on input, respectively, x and x′. Because N ≠ N′, the pair (x[1] | N·b, x′[1] | N′·b) is a collision for H.
Case N = N′: Note that z = H(x[1] | N·b) and z = H(x′[1] | N·b), where x[1] and x′[1] are the values computed at the end of step 8 of cbH on input, respectively, x and x′. If x[1] ≠ x′[1], the pair (x[1] | N·b, x′[1] | N·b) is already a collision for H. Otherwise, let i(max) denote the highest index such that x[j] ≠ x′[j] for some j in {1, ..., M(i(max))}, where x[1], ..., x[M(i(max))] and x′[1], ..., x′[M(i(max))] here denote the values computed at the i(max)-th execution of the repeat loop in step 2 of cbH on input, respectively, x and x′. By the maximality of i(max), the blocks computed at the following iteration coincide, which implies that H(x[j]) = H(x′[j]) for every j in {1, ..., M(i(max))}; since x[j] ≠ x′[j] for at least one such j, the pair (x[j], x′[j]) is a collision for H.