EVALUATING AND MODELING VIRTUALIZATION
PERFORMANCE OVERHEAD FOR CLOUD ENVIRONMENTS
Nikolaus Huber, Marcel von Quast
Software Design and Quality, Karlsruhe Institute of Technology, Am Fasanengarten 5, Karlsruhe, Germany
Michael Hauck
Forschungszentrum Informatik (FZI), Karlsruhe, Germany
Samuel Kounev
Software Design and Quality, Karlsruhe Institute of Technology, Am Fasanengarten 5, Karlsruhe, Germany
Keywords:
Virtualization, Modeling, Benchmarking, Performance.
Abstract:
Due to trends like Cloud Computing and Green IT, virtualization technologies are gaining increasing impor-
tance. They promise energy and cost savings by sharing physical resources, thus making resource usage more
efficient. However, resource sharing and other factors have direct effects on system performance, which are
not yet well-understood. Hence, performance prediction and performance management of services deployed
in virtualized environments like public and private Clouds is a challenging task. Because of the large vari-
ety of virtualization solutions, a generic approach to predict the performance overhead of services running
on virtualization platforms is highly desirable. In this paper, we present experimental results on two popular
state-of-the-art virtualization platforms, Citrix XenServer 5.5 and VMware ESX 4.0, as representatives of the
two major hypervisor architectures. Based on these results, we propose a basic, generic performance predic-
tion model for the two different types of hypervisor architectures. The target is to predict the performance
overhead for executing services on virtualized platforms.
1 INTRODUCTION
In recent years, due to trends like Cloud Comput-
ing, Green IT and server consolidation, virtualiza-
tion technologies are gaining increasing importance.
Formerly used to multiplex scarce resources such as
mainframes (Rosenblum and Garfinkel, 2005), nowa-
days virtualization is again used to run multiple vir-
tual servers on a single shared infrastructure, thus in-
creasing resource utilization, flexibility and central-
ized administration. Because this technology also al-
lows sharing server resources on-demand, it promises
cost savings and creates new business opportunities
by providing new delivery models, e.g., Infrastructure
as a Service or Software as a Service.
According to the International Data Corpora-
tion (IDC), 18% of all new servers shipped in the
fourth quarter of 2009 were virtualized (IDC, 2010)
and the server virtualization market is expected to
grow 30% a year through 2013 (IT world, The IDG
Network, 2008). However, the adoption of server
virtualization comes at the cost of increased system
complexity and dynamics. The increased complex-
ity is caused by the introduction of virtual resources
and the resulting gap between logical and physical re-
source allocations. The increased dynamics is caused
by the lack of direct control over the underlying physi-
cal hardware and by the complex interactions between
the applications and workloads sharing the physical
infrastructure introducing new challenges in systems
management.
Hosting enterprise services on virtualized plat-
forms like Cloud environments requires an efficient
performance management strategy at the application
level. Service-Level Agreements (SLAs), e.g., perfor-
mance guarantees such as service response time ob-
jectives, need to be respected. On the other hand, the
target is to utilize server resources efficiently in or-
der to save administration and energy costs. Thus,
providers of virtualized platforms are faced with
questions such as: What performance would a new
service deployed on the virtualized infrastructure ex-
hibit and how much resources should be allocated to
it? How should the system configuration be adapted
to avoid performance problems arising from chang-
ing customer workloads? In turn, customers using
virtualized resources are interested in a service’s per-
formance behavior when, e.g., moving it to a Cloud
Computing environment or when migrating it from
one platform to another.
Answering such questions for distributed, non-
virtualized execution environments is already a com-
plex task (Menascé et al., 1994). In virtualized en-
vironments, this task is even more complicated be-
cause resources are shared. Moreover, since changes
in the usage profiles of services may affect the en-
tire infrastructure, capacity planning has to be per-
formed continuously during operation. Proactive per-
formance management, i.e., avoiding penalties by act-
ing before performance SLAs are violated, requires
predictions of the application-level performance un-
der varying service workloads. Given that compu-
tation details are abstracted by an increasingly deep
virtualization layer, the following research questions
arise: i) What is the performance overhead when vir-
tualizing execution environments? ii) Which are the
most relevant factors that affect the performance of
a virtual machine? iii) What are the differences in
performance overhead on different virtualization plat-
forms? iv) Can the performance-influencing factors
be abstracted in a generic performance model?
Previous work on performance evaluation of vir-
tualization platforms focuses mainly on comparisons
of specific virtualization solutions and techniques,
e.g., container-based virtualization versus full virtu-
alization (Barham et al., 2003; Padala et al., 2007;
Soltesz et al., 2007; Quétier et al., 2007). Other work
like (Apparao et al., 2008; Tickoo et al., 2009; Iyer
et al., 2009) investigates core and cache contention ef-
fects. (Koh et al., 2007) predict the performance interference of virtualized workloads by running benchmarks
manually. (Huber et al., 2010) propose an approach for the systematic and automated experimental analysis of the performance-influencing factors of virtualization platforms, applied to Citrix XenServer 5.5.
In this paper, we use the automated experimen-
tal analysis approach from (Huber et al., 2010) as
a basis. We extend this approach and evaluate its
applicability to VMware ESX 4.0, another industry-
standard platform with a different hypervisor archi-
tecture (Salsburg, 2007). The main goal is to build
a generic model which enables the prediction of per-
formance overheads on different virtualization plat-
forms. To this end, we evaluate various performance-
influencing factors like scheduling parameters, differ-
ent workload types and their mutual influences, and
scalability and overcommitment scenarios on the two
different types of hypervisor architectures. At the
same time, we evaluate the portability of the automated experimental analysis approach to other platforms.
Finally, we summarize the results of both case studies
and formulate a basic generic model of the influences
of various parameters on the performance of virtual-
ized applications. This model shall provide the means
for estimating the performance of a native application
when migrated to a virtualized platform or between
platforms of different hypervisor architectures. In ad-
dition, this model can be used for capacity planning,
e.g., by Cloud providers, to estimate the number of
virtual machines (VMs) which can be hosted.
The contributions of this paper are: i) an in-depth
experimental analysis of the state-of-the-art Citrix XenServer 5.5 virtualization platform covering performance-influencing factors like scheduling parameters, mutual influences of workload types, etc.,
ii) an evaluation of these results on VMware ESX 4.0,
another representative virtualization platform with a
different hypervisor architecture, iii) an experience
report on the migration of virtual machines and the
automated administration of virtualization platforms,
iv) a basic model capturing the general performance-
influencing factors we have identified.
The remainder of this paper is organized as fol-
lows. Section 2 provides an overview of the auto-
mated experimental analysis we use. Additionally,
it presents further experimental results on the Citrix
XenServer 5.5. An evaluation of our results based on
repeated experiments on VMware ESX 4.0 is given in
Section 3. In Section 4, we present our performance
prediction model. Section 5 discusses related work,
followed by a conclusion and an outlook on future
work in Section 6.
2 AUTOMATED EXPERIMENTAL
ANALYSIS
Because virtualization introduces dynamics and in-
creases flexibility, a variety of additional factors can
influence the performance of virtualized systems.
Therefore we need to automate the experiments and
performance analysis as much as possible. In this
section we give a brief summary of the generic ap-
proach to automated experimental analysis of virtual-
ized platforms presented in (Huber et al., 2010). We
briefly explain the experimental setup as well as the
general process that is followed. Furthermore, we ex-
plain the experiment types in more detail because they
are used as a basis for our evaluation. We also sum-
marize the previous results of (Huber et al., 2010) and
enrich them with more fine-grained results and
analyses in Section 2.5.
2.1 Experimental Setup
The experimental setup basically consists of a Mas-
terVM and a controller. From a static point of view,
the MasterVM serves as a template for creating multi-
ple VM clones executing a benchmark of choice (see
Section 2.4). It contains all desired benchmarks to-
gether with a set of scripts to control the benchmark
execution (e.g., to schedule benchmark runs). A sec-
ond major part is the controller which runs on a ma-
chine separated from the system under test. From a
dynamic point of view, the controller clones, deletes,
starts, and stops VMs via the virtualization layer’s
API. Furthermore, it is responsible for collecting, pro-
cessing and visualizing the results. It also adjusts the
configuration (e.g., the amount of virtual CPUs) of the
MasterVM and the created clones as required by the
considered type of experiment.
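To make this setup more concrete, the following sketch outlines the controller's main loop in Python-like form. The API object and its methods (clone_vm, start_vm, run_benchmark, etc.) are hypothetical placeholders standing in for the respective virtualization layer's management API and the control scripts shipped with the MasterVM; they are not actual library calls.

# Sketch of the controller loop (hypothetical API names, for illustration only).
def run_experiment(api, master_vm, num_vms, benchmark, repetitions=200):
    """Clone the MasterVM, run the benchmark on all clones, and collect results."""
    clones = [api.clone_vm(master_vm, name=f"clone-{i}") for i in range(num_vms)]
    results = {}
    try:
        for vm in clones:
            api.start_vm(vm)
        for vm in clones:
            # The MasterVM template already contains the benchmark and control
            # scripts; the controller only triggers them and fetches the ratings.
            results[vm] = api.run_benchmark(vm, benchmark, repetitions)
    finally:
        for vm in clones:
            api.stop_vm(vm)
            api.delete_vm(vm)
    return results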
2.2 Experiment Types
Several types of experiments are executed, targeted
at the following categories of influencing factors:
(a) virtualization type, (b) resource management con-
figuration, and (c) workload profile.
For category (a), an initial set of experiments is
executed to quantify the performance overhead of the
virtualization platform. The number of VMs and
other resource management-related factors like core
affinity or CPU scheduling parameters are part of cat-
egory (b). The influence of these factors is investi-
gated in two different scenarios, focused on scalabil-
ity (in terms of number of co-located VMs), and over-
commitment (in terms of allocating more resources
than are actually available). For scalability, one in-
creases the number of VMs until all available physical
resources are used. For overcommitment, the number
of VMs is increased beyond the amount of available
resources. Finally, for category (c) a set of bench-
marks is executed focusing on the different types of
workloads. For a more detailed description of the ex-
periment types as well as a benchmark evaluation, we
refer to (Huber et al., 2010).
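As a rough illustration (not part of the original tooling), the scalability and overcommitment experiments of category (b) can be expressed as two loops over the controller routine sketched in Section 2.1: one increasing the number of VMs up to the number of physical cores, and one provisioning virtual CPUs beyond that limit.

# Sketch of the scalability and overcommitment series (hypothetical helpers;
# run_experiment is the controller routine sketched in Section 2.1).
def scalability_and_overcommitment(api, master_vm, physical_cores, max_factor=3):
    ratings = {}
    # Scalability: add VMs until all physical cores are in use.
    for n in range(1, physical_cores + 1):
        ratings[n] = run_experiment(api, master_vm, n, benchmark="cpu_mark")
    # Overcommitment: provision more virtual CPUs than physically available.
    for factor in range(2, max_factor + 1):
        n = factor * physical_cores
        ratings[n] = run_experiment(api, master_vm, n, benchmark="cpu_mark")
    return ratings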
2.3 Experimental Environment
We conducted our experimental analysis in two differ-
ent hardware environments described below. In each
considered scenario, Windows Server 2003 was the
native and guest OS hosting the benchmark applica-
tion, unless stated otherwise.
Environment 1. This environment is a standard
desktop HP Compaq dc5750 machine with an
Athlon64 dual-core 4600+, 2.4 GHz. It has 4 GB
DDR2-5300 of main memory, a 250 GB SATA HDD
and a 10/100/1000-BaseT-Ethernet connection. The
purpose of this environment was to conduct initial ex-
periments for evaluating the overhead of the virtual-
ization layer. This hardware was also used to run ex-
periments on a single core of the CPU by deactivating
the second core in the OS.
Environment 2. To evaluate the performance when
scaling the number of VMs, a SunFire X4440 x64
Server was used. It has 4*2.4 GHz AMD Opteron
6 core processors with 3MB L2, 6MB L3 cache each,
128 GB DDR2-667 main memory, 8*300 GB of se-
rial attached SCSI storage and 4*10/100/1000-BaseT-
Ethernet connections.
2.4 Benchmark Selection
Basically, any type of benchmark can be used in
the automated experimental analysis. Only the
scripts to start and stop the benchmark and to ex-
tract the results must be provided. For CPU and
memory-intensive workloads, two alternative bench-
marks have been discussed in (Huber et al., 2010):
Passmark PerformanceTest v7.0 (http://www.passmark.com/products/pt.htm; a benchmark used by VMware (VMware, 2007)) and SPEC CPU2006 (http://www.spec.org/cpu2006/; an industry-standard CPU benchmark). Both bench-
marks have a similar structure consisting of sub-
benchmarks to calculate an overall metric. Bench-
mark evaluation results in (Huber et al., 2010) showed
that both Passmark and SPEC CPU show similar re-
sults in terms of virtualization overhead. However, a
SPEC CPU benchmark run can take several hours or even longer to complete. Since Passmark runs are much shorter, the authors use Passmark in their experiments
and repeat each benchmark run 200 times to obtain
a more confident overall rating and to gain a picture
of the variability of the results. In addition to Pass-
mark, the Iperf benchmark (http://iperf.sourceforge.net/) is used to measure the
network performance. It is based on a client-server
model and supports the throughput measurement of
TCP and UDP data connections between both end-
points.
2.5 Experiment Results
In (Huber et al., 2010), several benchmarks (CPU,
memory, network I/O) were automatically executed
to analyze the performance of native and virtualized
systems. The results of a case study with Citrix
XenServer 5.5 showed that the performance overhead
for CPU virtualization is below 5% due to the hard-
ware support. However, memory and network I/O virtualization overheads amount to up to 40% and 30%, respectively. Further experiments examined the perfor-
mance overhead in scalability and overcommitment
scenarios. The results for both the scalability and
overcommitment experiments showed that the perfor-
mance behavior of Citrix XenServer 5.5 meets the ex-
pectations and scales very well even for more than
100 VMs. Moreover, the measurements showed that
performance loss can be reduced if one assigns the
virtual CPUs to physical cores (called core pinning or
core affinity). For a detailed discussion of the previ-
ous results, we refer to (Huber et al., 2010).
These previous findings are now extended by our
more recent in-depth measurement results. First, we compare the performance overhead of virtualization for different workload types. Second, we present performance overheads in scale-up and overcommitment scenarios. Finally, we investigate the impact of network I/O in more detail.
Overhead of Virtualization. In these experiments,
we investigate the performance degradation for CPU,
memory and I/O in more detail by looking at the
sub-benchmark results of each benchmark metric.
The new measurement results of Figure 1 depict the
fine-grained sub-benchmark results for the Passmark
CPU Mark results normalized to the native execu-
tion. The results demonstrate that floating point op-
erations are more expensive (up to 20% performance
drop for the Physics sub-benchmark) than the other
sub-benchmark results. However, the overall perfor-
mance drops are still in the range of 3% to 5%.
Figure 1: Sub-benchmark results of the Passmark CPU Mark metric (Total, Integer Math, Floating Point Math, Find Prime Numbers, SSE, Compression, Encryption, Physics, String Sorting), normalized, for native and virtualized execution with one and two CPU cores.
Looking at the fine-grained memory benchmark
results depicted in Figure 2, one can see that for
the memory-intensive workloads the main cause for
the overall performance drop stems from the alloca-
tion of large memory areas. For the Large RAM sub-
benchmark, performance overhead is almost 97%.
The reason was that, to replicate our VM template in the CPU overcommitment scenarios, we could only assign 256 MB of main memory to each VM, because memory overcommitment is currently not supported by Citrix XenServer 5.5. However, when increasing this size to 3 GB of main memory in a separate, independent experiment with one VM at a time, the performance overhead for large memory accesses is only 65% instead of 97%, which also improves the over-
all memory benchmark results slightly. Hence, in-
creasing memory allocation can significantly improve
performance for memory-intensive workloads, espe-
cially if expensive swapping can be avoided.
Figure 2: Sub-benchmark results of the Passmark Memory Mark metric (Total, Allocate Small Block, Read Cached, Read Uncached, Write, Large RAM), normalized, for native and virtualized execution with one and two CPU cores.
Finally, Figure 3 shows our results for disk I/O intensive workloads. With the Passmark Disk Mark benchmark, we measured a performance overhead of up to 28%. A more detailed look at the benchmark results shows that most of the performance overhead is caused by sequential read requests, which achieve only 60% of the native performance, whereas for write requests the performance overhead does not exceed 20%.
The performance increase for the random seek
benchmark can be explained by the structure of
the virtual block device, a concept used in Citrix
XenServer 5.5 for block oriented read and write, min-
imizing administration overhead and thus decreasing
access times.
Scalability and Overcommitment. Previous ex-
perimental results showed that the performance over-
head can be reduced for CPU workload when us-
ing core affinity. The following more detailed mea-
surements give more insights and explain why core
Figure 3: Sub-benchmark results for local disk access of the Passmark Disk Mark metric (Total, Sequential Read, Sequential Write, Random Seek read/write), normalized, for native and virtualized execution with one and two CPU cores.
Figure 4: CPU benchmark results (CPU Mark rating, VMs 1-24) for 24 VMs executed (a) without and (b) with core affinity.
affinity improves performance. Figure 4(a) shows
the boxplot of 24 VMs simultaneously executing the
CPU benchmark without core affinity. There are
clearly two categories of VMs, one category (VMs 1 and 11-19) performing significantly differently from the other (VMs 2-10 and 20-24). This behavior is not observed with
core affinity (see Figure 4(b)), when each VM is exe-
cuted on a separate physical core. This indicates that
XenServer’s scheduler does not distribute the VMs on
the cores equally. In general, it demonstrates that mul-
ticore environments still are a challenge for hypervi-
sor schedulers. In Section 3, we investigate if this
effect is observable on other hypervisor architectures,
too.
With core affinity enabled, the performance drops linearly with the number of VMs in the scalability scenario and inversely proportionally to the overcommitment factor in the overcommitment scenario. For example, assume c is the number of physical cores. When provisioning x · c virtual CPUs, performance roughly drops to 1/x in each of our experimental environments (single core, dual core, 24 cores); e.g., provisioning twice as many virtual CPUs as there are physical cores roughly halves the per-VM benchmark rating.
Network I/O. We conducted further experiments with the network I/O benchmark Iperf to gain more insight into the performance overhead of XenServer's credit-based scheduler and its performance isolation. More precisely, the goal of these experiments was to demonstrate how the additional overhead introduced by the hypervisor to handle I/O requests is distributed among the running VMs. To this end, we executed four VMs, two VMs running CPU Mark and two VMs with Iperf. We pinned them pairwise on two physical cores, i.e., core c_0 executed one pair of a CPU VM and an Iperf VM, and a different available core c_x the other pair. The CPU benchmark was executed on both VMs simultaneously, and the network I/O benchmark was started separately on one VM. This symmetric setup allows us to compare the results of VMs executed on c_0 (where Dom0 is executed) with the results on the different core c_x.
Figure 5: Network I/O performance effects: CPU benchmark results (CPU Mark rating per measurement) of two VMs executed on c_0 (o) and c_x (+) when network I/O is received by the VM running on c_x.
Table 1: Mutual performance degradation for different workload types on Citrix XenServer 5.5.
VM_A:  CPU     CPU     Mem     CPU     Mem     Disk    CPU     Mem     Disk
VM_B:  CPU     Mem     Mem     Disk    Disk    Disk    Net     Net     Net
r_A:   46.71%  50.64%  50.33%  23.35%  24.82%  31.16%  52.88%  52.85%   3.07%
r_B:   52.44%  45.93%  49.04%   1.49%  -0.09%  45.99%  40.46%  42.18%  33.31%

One would expect that there is no performance effect on the VMs running on c_0 when the Iperf VM on c_x executes network I/O. However, the results show
that the performance of the VM running the CPU benchmark on c_0 drops by up to 13% when the VM on c_x is executing the network I/O benchmark. Figure 5 depicts the CPU benchmark results of the VMs executed on c_0 (o) and c_x (+). The second VM on c_x receives the network load, hence the benchmark rating of the VM sharing this core drops significantly. But the benchmark rating of the VM on c_0 also drops, although its paired VM is idle. Because VMs executed on cores other than c_0 did not exhibit this behavior, this indicates that Dom0 mainly uses c_0 to handle I/O. This causes a slight performance drop for VMs simultaneously executed on c_0, i.e., about 1% on average. However, this drop could increase further if additional machines on other cores were to receive network load. We did not execute these scalability experiments, as this would have required additional network interfaces to generate network load on further Iperf VMs. However, this is an interesting question for future work, as is the question whether this effect can be observed on other hypervisors, which is discussed in the following section.
Mutual Influences of Workload Types. The goal of these experiments is to identify the mutual influences of VMs sharing their resources and serving different workload types. To this end, we pinned two VMs VM_A and VM_B on the same physical core other than c_0 to avoid interferences with Dom0. Then, we ran an experiment for each possible combination of benchmark types. As a result, we calculate the relative performance drop as r = 1 - (r_i / r_s), where r_i is the interference result and r_s is the result measured when executing the benchmark on an isolated VM.
Table 1 summarizes the results for all combinations
of workload types. Note that we did not run network
vs. network experiments because a different hardware
environment with additional hardware would have
been required.
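As a small numerical illustration of this metric (with made-up numbers, not values from Table 1): a benchmark achieving a rating of 800 in isolation and 400 under interference has a relative performance drop of r = 1 - (400/800) = 50%. A minimal helper in Python:

def relative_drop(interference_result, isolated_result):
    """Relative performance drop r = 1 - (r_i / r_s) of a benchmark rating."""
    return 1.0 - (interference_result / isolated_result)

# Illustrative values only, not measurements from the paper:
print(relative_drop(400.0, 800.0))  # 0.5, i.e., a 50% drop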
The results show that there are no significant mu-
tual influences of CPU and memory intensive work-
loads. The performance drop for both benchmarks is roughly equal, and the drop also fits the expectation that each VM receives only half of its performance compared to isolated execution. Explanations are the similarity of both workload types in terms of the used resources (memory benchmarks require CPU as well) and the hardware support for CPU virtualization. An interesting observation is that the Disk benchmark is not influenced by other workload types, except when executed against another disk benchmark. This indicates that on Citrix XenServer 5.5, disk intensive workloads do not compete for resources with CPU and memory intensive workloads. This can be explained by the same reason as for the virtualization overhead of the Disk Mark result: the concept used in Citrix XenServer 5.5 for block-oriented read and write minimizes administration overhead. With this concept, disk workload can be passed through without requiring major hypervisor intervention.
3 EVALUATION
To evaluate the validity of the conclusions from our
analysis of XenServer on other hypervisor architec-
tures, we conducted the same experiments on an-
other popular industry standard platform, VMware
ESX 4.0, which has a different type of hypervisor ar-
chitecture. This section compares the experiment re-
sults of both platforms. We also provide a short ex-
perience report on the portability of the automated
experimental analysis approach, which is of general
interest when migrating from one virtualization plat-
form to another as well as for automatic administra-
tion of virtualization platforms.
3.1 Platform Comparison
The following discussion and comparison of the results of our measurements on VMware ESX 4.0 follows a similar structure to Section 2.5. However, because
of unavailable driver support for the HP Compaq ma-
chine (see Section 3.2), we were only able to install
VMware ESX 4.0 and to conduct our experiments on
the SunFire machine.
Overhead of Virtualization. After repeating the experiments on VMware ESX 4.0, we calculate the relative delta between the two platforms as (VMware ESX 4.0 − Citrix XenServer 5.5) / VMware ESX 4.0. The results in Table 2 show almost identical results for the CPU and memory benchmarks because both virtualization platforms use the hardware virtualization support. However, for the I/O benchmarks, VMware ESX 4.0 performs better. The reason is that in Citrix XenServer 5.5, all I/O workload is handled by the separate driver domain Dom0, which is less efficient than the monolithic architecture of VMware ESX 4.0. Hence, it is important to distinguish these architectural differences when generalizing the results for the I/O performance overhead.
Table 2: Relative deviation of CPU, memory, disk I/O and
network I/O benchmark results.
Benchmark rel. Delta
CPU Mark 0.15%
Memory Mark 0.19%
Disk Mark 19.14%
Iperf, outgoing 13.91%
Iperf, incoming 15.94%
Scalability and Overcommitment. Concerning
the performance behavior of VMware ESX 4.0
when scaling up and overcommitting, respectively,
we observe a similar trend to the one on Citrix
XenServer 5.5. Figure 6 shows this trend for the over-
commitment scenario. As one can see, both platforms
behave similarly. The results for scalability are simi-
lar with VMware ESX 4.0 performing slightly better.
Another observation was that on VMware ESX 4.0,
using core affinity did not result in any performance
improvements. This indicates an improved hypervi-
sor scheduling strategy which takes care of multicore
environments and the cache and core effects observed
in Section 2.5.
Figure 6: Performance behavior for CPU overcommitment for Citrix XenServer 5.5 and VMware ESX 4.0 (normalized CPU Mark rating over overcommitment factors 1 to 3, compared with the 1/x hypothesis).
Network I/O. Analyzing the results of the network
I/O experiments repeated on VMware ESX 4.0 shows
some further advantages of the monolithic architec-
ture and that the concept of a separate management
VM (Dom0) has a slight performance drawback. For example, we did not observe the Dom0 effect discussed in Section 2.5 (Network I/O). Hence, on
VMware ESX 4.0, the additional overhead for I/O vir-
tualization is distributed more evenly than in Citrix
XenServer 5.5.
Mutual Influences of Workload Types. We re-
peated the same experiments to determine the mutual
influences of workload types on VMware ESX 4.0.
Table 3 lists the results. For CPU and memory in-
tensive workloads, the observations are comparable
to the ones for Citrix XenServer 5.5: both workload
types have a similar effect on each other caused by
their similarities.
However, there is a big difference from Citrix XenServer 5.5 for disk intensive workloads. For VMware ESX 4.0, we observe a high performance degradation of the disk workload independent of the other workload type. For example, if the disk benchmark is executed together with the CPU or memory benchmark, the disk benchmark results drop by almost 50%, whereas the CPU and memory benchmark results suffer only 10% and 20% performance loss, respectively. One explanation is that VMware's virtual disk concept is different from Xen's and that in this concept both VMs compete for CPU time assigned by the hypervisor, thus confirming the differences between the two hypervisor architectures. However, CPU and memory workloads suffer less performance degradation when running against disk workload than on Citrix XenServer 5.5.
3.2 Portability of the Automated
Analysis
The first problem when applying the automated ex-
perimental analysis approach to VMware ESX 4.0
was that we could not install it on the HP Compaq
machine because of the lack of driver support. Sup-
porting only commodity hardware keeps the mono-
lithic hypervisor’s footprint low. Because of Cit-
rix XenServer 5.5’s architecture, further drivers can
be easily implemented in the Dom0 domain while
still keeping the hypervisors footprint small. Hence,
we could not repeat the measurements for VMware
ESX 4.0 on the HP Compaq machine. The next chal-
lenge we faced was that the VMs of VMware ESX 4.0
are usually intended to be managed via graphical ex-
ternal tools, which hinders an automated approach.
Fortunately, a special command line interface, which must be activated separately, can be used for automation.

Table 3: Mutual performance degradation for different workload types on VMware ESX 4.0.
VM_A:  CPU     CPU     Mem     CPU     Mem     Disk    CPU     Mem     Disk
VM_B:  CPU     Mem     Mem     Disk    Disk    Disk    Net     Net     Net
r_A:   47.03%  46.64%  49.23%  10.02%  17.21%  44.53%   9.95%  35.32%  14.87%
r_B:   48.21%  40.29%  51.34%  49.56%  45.53%  44.82%  65.02%  54.56%  32.74%
As both platforms in theory support the Open Virtualization Format (OVF) for virtual machines, porting the MasterVM should be easy. Although the standardized XML schema for OVF is in fact implemented on both platforms, they use different XML tag semantics to describe the VM geometry (such as its partitions). Hence, the export of a VM from Citrix XenServer 5.5 to VMware ESX 4.0 works in theory, but in practice only with additional tools and workarounds, involving manual XML editing.
Once migrated, one can reuse the concept of au-
tomated experimental analysis and experiment types,
but one has to adapt the scripts to the new API. For ex-
ample, the credit-based scheduler parameter capacity
is named differently on VMware ESX 4.0.
In summary, the concept itself is portable but re-
quires some manual adjustments. This mainly stems
from the fact that currently there is no standardized
and working virtual machine migration mechanism
across different virtualization platforms.
3.3 Summary
By migrating our automated analysis to VMware ESX 4.0 and repeating it there, we were able to confirm the results for CPU and memory intensive workloads as well as the observed trends in the scalability and overcommitment scenarios. However, the experiments also showed that there are differences when handling I/O intensive workloads. In these scenarios, VMware ESX 4.0's performance behavior and performance isolation are better. However, this product has high licensing costs. Moreover, it is intended for graphical administration, which makes automated administration more difficult.
4 MODELING THE
PERFORMANCE-INFLUENCING
FACTORS
Having analyzed two major representative virtualiza-
tion platforms, we now structure the performance-
influencing factors and capture them in a basic math-
ematical performance model allowing one to predict
the performance impacts of virtualized environments.
4.1 Categorizing the
Performance-influencing Factors
This section categorizes the performance-influencing
factors of the presented virtualization platforms. The
goal is to provide a compact hierarchical model of
performance-relevant properties and their dependen-
cies. We capture those factors that have to be con-
sidered for performance predictions at the applica-
tion level, i.e., that have a considerable impact on the
virtualization platform’s performance, and we struc-
ture them in a so-called feature model (Czarnecki and
Eisenecker, 2000). In our context, a feature corre-
sponds to a performance-relevant property or a con-
figuration option of a virtualization platform. The
goal of the feature model is to capture the options that
have an influence on the performance of the virtual-
ization platform in a hierarchical structure. The fea-
ture model should also consider external influencing
factors such as workload profile or type of hardware.
The model we propose is depicted in Figure 7.
The first performance-influencing factor is the vir-
tualization type. Different techniques might cause
different performance overhead, e.g., full virtualiza-
tion performs better than other alternatives because of
the hardware support. In our feature model, we distin-
guish between the three types of virtualization: i) full
virtualization, ii) para-virtualization and iii) binary
translation. Furthermore, our experiments showed
that another important performance-influencing fac-
tor is the hypervisor’s architecture. For example, a
monolithic architecture exhibited better performance
isolation.
Several influencing factors are grouped under re-
source management configuration. First, the CPU
scheduling configuration has a significant influence
on the virtualization platform’s performance and is
influenced by several factors. The first factor CPU
allocation reflects the number of virtual CPUs allo-
cated to a VM. Most of the performance loss of CPU
intensive workloads comes from core and cache interferences (Apparao et al., 2008). Hence, the second factor
is core affinity, specifying if virtual CPUs of VMs are
assigned to dedicated physical cores (core-pinning).
Figure 7: Major performance-influencing factors of virtualization platforms, structured as a feature model of the Virtualization Platform: Virtualization Type (full virtualization, para-virtualization, or binary translation), VMM Architecture (Dom0 or monolithic), Resource Management Configuration (CPU scheduling with CPU allocation (e.g., vcpu=4), core affinity (e.g., mask=1,2), and CPU priority (e.g., cap=50); memory allocation; number of VMs; resource overcommitment), and Workload Profile (CPU, memory, and I/O, the latter split into disk and network).
The third factor reflects the capability of assigning
different CPU priorities to the VMs. For example,
the Xen hypervisor’s cap parameter or VMware’s lim-
its and fixed reservations parameters are CPU pri-
ority configurations. In addition, the level of re-
source overcommitment influences the performance
due to contention effects caused by resource shar-
ing. Finally, the memory allocation and the number
of VMs influence the resource management config-
uration, too. Managing virtual memory requires an
additional management layer in the hypervisor. The
number of VMs has a direct effect on how the avail-
able resources are shared among all VMs.
Last but not least, an important influencing fac-
tor is the workload profile executed on the virtual-
ization platform. Virtualizing different types of re-
sources causes different performance overheads. For
example, CPU virtualization is supported very well
whereas I/O and memory virtualization currently suf-
fer from significant performance overheads. In our
model we distinguish CPU, memory and I/O intensive
workloads. In the case of I/O workload, we further
distinguish between disk and network intensive I/O
workloads. Of course, one can also imagine a work-
load mixture as a combination of the basic workload
types.
4.2 Performance Model
Based on the results of Section 2.5 and Section 3
we now propose a basic mathematical performance
prediction model (e.g., based on linear regression).
We focus on the performance-influencing factors for
which similar results were observed on the two virtu-
alization platforms considered. These are the over-
head for CPU and memory virtualization, the per-
formance behavior in scalability scenarios and the
performance behavior when overcommitting CPU re-
sources. Our model is intended to reflect the perfor-
mance influences of the factors presented in the previ-
ous section. It can be used to predict the performance
overhead for services to be deployed in, e.g., Cloud
Computing or virtualized environments in general.
In our experiments, performance is measured as
the amount of benchmark operations processed per
unit of time, i.e., the throughput of the system. This
is not directly transferable to system utilization or re-
sponse times, as the benchmarks always try to fully
utilize the available resources. Therefore, in the fol-
lowing, we refer to throughput as the system perfor-
mance. We calculate a performance overhead factor o which can be used to predict the performance as p_virtualized = o · p_native, where o can be replaced by a formula of one of the following sections.
Overhead of Virtualization. The following basic equations allow one to predict the overhead introduced when migrating a native system to a virtualized platform. These equations assume that there are no influences by other virtual machines, which we consider further below.
For CPU and memory virtualization, we calculate the overhead factors o_cpu and o_mem as 1 − (relative deviation / 100), using the measured relative deviation values. One can use our automated approach to determine these factors for any other virtualization platform to derive more specific overhead factors.
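As a minimal sketch (in Python, with an illustrative deviation value rather than a platform-specific measurement) of how these factors can be applied:

def overhead_factor(relative_deviation_percent):
    """Overhead factor o = 1 - (relative deviation / 100), e.g., for CPU or memory."""
    return 1.0 - relative_deviation_percent / 100.0

def predict_virtualized_performance(p_native, o):
    """Predicted throughput on the virtualized platform: p_virtualized = o * p_native."""
    return o * p_native

# Illustrative example: a 5% CPU deviation (cf. Section 2.5) and a native rating of 1000.
o_cpu = overhead_factor(5.0)                           # 0.95
print(predict_virtualized_performance(1000.0, o_cpu))  # 950.0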
For I/O overhead, we recommend measuring the performance overhead for each specific virtualization platform using our automated approach, because the evaluation showed that there are significant differences between virtualization platforms and their respective implementations.
Scalability. To model the performance influence of scaling up CPU resources, we use a linear equation. The performance overhead is defined as o_scal = a + b · c_virt, where c_virt is the number of virtual cores. The coefficients a and b are given in Table 4. We distinguish between scenarios without core affinity and scenarios where the virtual CPUs are pinned to the physical cores in an equal distribution. These equations give an approximation of the performance degradation when scaling up which is independent of the virtualization platform. However, this approximation is only valid up to the number of available physical cores. The overcommitment scenario is modeled below. Moreover, the coefficients of determination show that the linear trend fits very well, except for the CPU with affinity scenario.
Table 4: Coefficients a, b of the linear equations for CPU and memory performance when scaling up, and the corresponding coefficient of determination R^2.
Scenario               a      b        R^2
CPU                    1.008  -0.0055  0.9957
Memory                 1.007  -0.0179  0.9924
CPU (w. affinity)      1.003  -0.0018  0.7851
Memory (w. affinity)   1.002  -0.0120  0.9842
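A small sketch showing how the linear model can be evaluated with the coefficients from Table 4; it is only meaningful up to the number of available physical cores:

# Coefficients (a, b) from Table 4 for the scale-up model o_scal = a + b * c_virt.
SCALABILITY_COEFFICIENTS = {
    "cpu": (1.008, -0.0055),
    "memory": (1.007, -0.0179),
    "cpu_affinity": (1.003, -0.0018),
    "memory_affinity": (1.002, -0.0120),
}

def scalability_overhead(scenario, virtual_cores):
    """Overhead factor o_scal = a + b * c_virt for the scale-up scenario."""
    a, b = SCALABILITY_COEFFICIENTS[scenario]
    return a + b * virtual_cores

# Example: 8 virtual cores, CPU workload with core affinity.
print(scalability_overhead("cpu_affinity", 8))  # approx. 0.989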
Overcommitment. When considering a scenario with overcommitted CPU resources, we can approximate the performance overhead as o_overc = 1/x, where x is the overcommitment factor. The overcommitment factor is determined by x = c_virt / c_phy, the ratio of the provisioned virtual cores c_virt to the available physical cores c_phy. Note that for CPU overcommitment this dependency between the performance overhead and the overcommitment factor is independent of the virtualization platform and the number of executed VMs. Our experiments on two leading industry-standard virtualization platforms demonstrated that the performance overhead simply depends on the ratio of virtual and physical cores. This dependency is valid at the core level, i.e., if you pin two VMs with one virtual core each on a single physical core, you experience the same performance drop.
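A corresponding sketch for the overcommitment case; the predicted factor follows the 1/x trend shown in Figure 6:

def overcommitment_overhead(virtual_cores, physical_cores):
    """Overhead factor o_overc = 1/x with overcommitment factor x = c_virt / c_phy."""
    x = virtual_cores / physical_cores
    return 1.0 / x if x > 1.0 else 1.0  # no overcommitment penalty for x <= 1

# Example: 48 virtual cores on the 24-core SunFire machine (x = 2).
print(overcommitment_overhead(48, 24))  # 0.5, i.e., each VM achieves roughly half
                                        # of its non-overcommitted rating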
5 RELATED WORK
There are two groups of existing work related to
the work presented in this paper. The first group
deals with benchmarking and performance analysis
of virtualization platforms and solutions. The second
group is related to this work in terms of modeling the
performance-influencing factors.
(Barham et al., 2003) present the Xen hypervisor
and compare its performance to a native system, VMware Workstation 3.2, and User-Mode Linux at a high level of abstraction. They show that the performance is practically equivalent to a native Linux system and state that the Xen hypervisor is very scalable. (Quétier et al., 2007; Soltesz et al., 2007; Padala
et al., 2007) follow similar approaches by bench-
marking, analyzing and comparing the properties of
Linux-VServer 1.29, Xen 2.0, User-Mode Linux ker-
nel 2.6.7, VMware Workstation 3.2, and OpenVZ,
another container-based virtualization solution. (Ap-
parao et al., 2008) analyze the performance charac-
teristic of a server consolidation workload. Their re-
sults show that most of the performance loss of CPU
intensive workloads is caused by cache and core in-
terferences. However, since the publication of these
results, the considered virtualization platforms have
changed a lot (e.g., hardware support was introduced)
which renders the results outdated. Hence, the results
of these works must be revised especially to evaluate
the influences of, e.g., hardware support. Moreover,
the previous work mentioned above does not come up
with a model of the performance-influencing factors
nor does it propose a systematic approach to quan-
tify their impact automatically. Such a generic frame-
work to conduct performance analyses is presented in
(Westermann et al., 2010). This framework allows
adding adapters to benchmark, monitor, and analyze
the performance of a system. The framework has
been applied to the performance analysis of message-
oriented middleware, however, the adapters currently
do not support the analysis of performance properties
of virtualization platforms or Cloud Computing envi-
ronments.
The second area of related work is the model-
ing of virtualization platforms or shared resources.
(Tickoo et al., 2009) identify the challenges of mod-
eling the contention of the visible and invisible re-
sources and the hypervisor. In their subsequent work
based on (Apparao et al., 2008; Tickoo et al., 2009),
(Iyer et al., 2009) measure and model the influences
of VM shared resources. They show the importance
of shared resource contention on virtual machine per-
formance and model cache and core effects, but no
other performance-influencing factors. Another in-
teresting approach to determine the performance in-
terference effects of virtualization based on bench-
marking is (Koh et al., 2007). They explicitly con-
sider different types of workloads (applications) and
develop performance prediction mechanisms for dif-
ferent combinations of workloads. Unfortunately, this
approach is neither automated nor evaluated on differ-
ent platforms.
6 CONCLUSIONS AND
OUTLOOK
In this paper, we conducted fine-grained experiments
and in-depth analyses of the Citrix XenServer 5.5
based on the results of (Huber et al., 2010). We mi-
grated this approach to VMware ESX 4.0 and eval-
uated the validity of the previous findings. In sum-
mary, the results showed that the performance behavior of CPU and memory virtualization is similar on both systems, as is the behavior for CPU scalability and overcommitment.
However, the results also indicated a deviation when
it comes to I/O virtualization and scheduling. In these
cases, VMware ESX 4.0 provides better performance
and performance isolation than Citrix XenServer 5.5.
We evaluated the portability of the automated experi-
mental analysis approach. Finally, we presented a basic model that allows one to predict the performance when
migrating applications from native systems to virtual-
ized environments, for scaling up and overcommitting
CPU resources, or for migrating to a different virtual-
ization platform. As a next step, we plan to study the
performance overhead for mixed workload types and
their mutual performance influence in more detail. In
addition, we will use our model as a basis for future
work in the Descartes research project (Descartes Re-
search Group, 2010; Kounev et al., 2010). For exam-
ple, we will integrate our results in a meta model for
performance prediction of services deployed in dy-
namic virtualized environments, e.g., Cloud Comput-
ing.
ACKNOWLEDGEMENTS
This work was funded by the German Research Foun-
dation (DFG) under grant No. 3445/6-1.
REFERENCES
Apparao, P., Iyer, R., Zhang, X., Newell, D., and
Adelmeyer, T. (2008). Characterization & Analysis
of a Server Consolidation Benchmark. In VEE ’08.
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T.,
Ho, A., Neugebauer, R., Pratt, I., and Warfield, A.
(2003). Xen and the Art of Virtualization. In Sym-
posium on Operating Systems Principles.
Czarnecki, K. and Eisenecker, U. W. (2000). Generative
Programming. Addison-Wesley.
Descartes Research Group (2010). http://www.descartes-
research.net.
Huber, N., von Quast, M., Brosig, F., and Kounev, S.
(2010). Analysis of the Performance-Influencing Fac-
tors of Virtualization Platforms. In DOA’10.
IDC (2010). Virtualization Market Accelerates Out of the
Recession as Users Adopt ”Virtualize First” Men-
tality. http://www.idc.com/getdoc.jsp?containerId=prUS22316610.
IT world, The IDG Network (2008). Gartner’s data
on energy consumption, virtualization, cloud.
http://www.itworld.com/green-it/59328/gartners-
data-energy-consumption-virtualization-cloud.
Iyer, R., Illikkal, R., Tickoo, O., Zhao, L., Apparao, P., and
Newell, D. (2009). VM3: Measuring, modeling and
managing VM shared resources. Computer Networks,
53(17):2873 – 2887.
Koh, Y., Knauerhase, R. C., Brett, P., Bowman, M., Wen,
Z., and Pu, C. (2007). An analysis of performance in-
terference effects in virtual environments. In ISPASS.
Kounev, S., Brosig, F., Huber, N., and Reussner, R. (2010).
Towards self-aware performance and resource man-
agement in modern service-oriented systems. In Int.
Conf. on Services Computing.
Menascé, D. A., Almeida, V. A. F., and Dowdy, L. W.
(1994). Capacity Planning and Performance Model-
ing - From Mainframes to Client-Server Systems.
Padala, P., Zhu, X., Wang, Z., Singhal, S., and Shin, K. G.
(2007). Performance evaluation of virtualization tech-
nologies for server consolidation. HP Labs Tec. Re-
port.
Quétier, B., Néri, V., and Cappello, F. (2007). Scalability Comparison of Four Host Virtualization Tools. Journal of Grid Computing, 5(1):83–98.
Rosenblum, M. and Garfinkel, T. (2005). Virtual machine
monitors: current technology and future trends. Com-
puter, 38(5):39–47.
Salsburg, M. (2007). Beyond the Hypervisor Hype. In
CMG-CONFERENCE, volume 2, page 487.
Soltesz, S., Pötzl, H., Fiuczynski, M. E., Bavier, A., and
Peterson, L. (2007). Container-based operating sys-
tem virtualization: a scalable, high-performance al-
ternative to hypervisors. SIGOPS Oper. Syst. Rev.,
41(3):275–287.
Tickoo, O., Iyer, R., Illikkal, R., and Newell, D. (2009).
Modeling virtual machine performance: Challenges
and approaches. In HotMetrics.
VMware (2007). A performance comparison of hy-
pervisors. http://www.vmware.com/pdf/hypervisor_performance.pdf.
Westermann, D., Happe, J., Hauck, M., and Heupel, C.
(2010). The Performance Cockpit Approach: A
Framework for Systematic Performance Evaluations.
In SEAA’10.