Improving Cloud Survivability through Dependency
based Virtual Machine Placement
Min Li¹, Yulong Zhang¹, Kun Bai², Wanyu Zang¹, Meng Yu¹ and Xubin He³
¹ Computer Science, Virginia Commonwealth University, Richmond, U.S.A.
² IBM T.J. Watson Research Center, Cambridge, U.S.A.
³ Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, U.S.A.
Keywords:
Cloud Computing, Virtual Machine Placement, Security, Survivability.
Abstract:
Cloud computing is becoming more and more popular in computing infrastructure, and it also introduces new security problems. For example, a physical server shared by many virtual machines can be taken over by an attacker if the virtual machine monitor is compromised through one of the virtual machines. Thus, collocating with vulnerable virtual machines, or “bad neighbours”, on the same physical server introduces additional security risks. Moreover, the connections between virtual machines, such as the network connection between a web server and its back end database server, are natural paths of attacks. Therefore, both virtual machine placement and the connections among virtual machines in the cloud have a great impact on the overall security of the cloud. In this paper, we quantify the security risks of cloud environments based on virtual machine vulnerabilities and placement schemes. Based on our security evaluation, we develop techniques to generate virtual machine placements that minimize the security risks while considering the connections among virtual machines. According to the experimental results, our approach can greatly improve the survivability of most virtual machines and of the whole cloud. The computing costs and deployment costs of our techniques are also practical.
1 INTRODUCTION
Cloud computing is becoming predominant in computing infrastructure since it provides flexibility and cost-effectiveness hardly found in traditional computing platforms. The key techniques of cloud computing are resource sharing and dynamic resource allocation. In an Infrastructure as a Service (IaaS) cloud like Amazon EC2, multiple virtual machines (VMs) share a physical server. Thus, the security of a VM depends not only on the security of the operating system and applications it runs, but also on the security of the Virtual Machine Monitor (VMM, or hypervisor) running below the VMs.
There are many attacks developed against cloud
environments. In this paper, we are interested in two
types of attacks since they are related to how VMs
are placed in a cloud. Type I attacks, such as (CVE-2007-4993, 2007; CVE-2007-5497, 2007), exploit the vulnerabilities of hypervisors, e.g., Xen and KVM. Once they succeed, attackers can compromise the physical server running the hypervisor and all VMs running above the hypervisor. Alternatively, Type II attacks compromise other VMs on the same physical server by mounting side channel attacks (Ristenpart et al., 2009; Hlavacs et al., 2011), instead of compromising the hypervisor.

Figure 1: The State Transition Graph of Attacks. $S_0$: One VM compromised on a new server. $S_1$: Hypervisor compromised. $S_2$: Dependency information collected. $S_3$: New target server selected. Transition labels in the figure: Traditional Attack; Compromise hypervisor (Type I attack); Side Channel Attack (Type II attack); Analyze other VMs above the hypervisor; Select a target server through dependencies.
Li M., Zhang Y., Bai K., Zang W., Yu M. and He X.
Improving Cloud Survivability through Dependency based Virtual Machine Placement.
DOI: 10.5220/0004076003210326
In Proceedings of the International Conference on Security and Cryptography (SECRYPT-2012), pages 321-326
ISBN: 978-989-8565-24-2
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)

As shown in Figure 1, attackers can utilize both types of attacks to compromise as many VMs as possible in the cloud. In the first step, the attacker can compromise one guest VM (Dom U) or the management VM (Dom 0) through vulnerabilities in the operating system or services. Subsequently, the attacker can launch side channel attacks to collect information about other VMs on the same physical server. The information can also be collected using Type I attacks, as shown in Figure 1. Because network connections, e.g., from a web server to a back end database server, may leak security information such as authentication information on the database server, the attacks can cause cascading effects, or domino effects, in the cloud. The connections among VMs also become paths of attacks.
For example, the attack “Hey, You, Get Off of My Cloud” (HYG) (Ristenpart et al., 2009) is one kind of Type II attack. The initial stage of the HYG attack is to locate the target VM. Once the target is located, the attacker tries to launch a VM on the same physical server. It is a placement-based attack, and its success depends on the placement strategy, or configuration policy, of the cloud.
In this paper, we present an approach to virtual machine placement based on the security evaluation of VMs and the dependencies among them. Our technique not only increases the survivability of the cloud but is also compatible with performance requirements. In the security evaluation, we use a Discrete Time Markov Chain (DTMC) to analyze the possibility of each VM being compromised. In the performance evaluation, we calculate both the migration cost and the distance of placement. Based on the performance and security analysis, we generate a new VM placement.
To the best of our knowledge, this paper is the first effort to develop the following mechanisms and techniques to enhance cloud security through changing cloud placement. The contributions of this paper are summarised as follows:
• We propose a systematic approach to evaluating the security risks of a cloud placement plan.
• Based on our security evaluation, we propose an innovative and practical approach to generate a safe placement plan.
• Our placement generation is compatible with performance requirements since it is also based on dependencies among VMs.
• The experimental results confirm that our placement plan can significantly improve the survivability of VMs in the cloud.
In Section 2, we review related work on VM placement and indicate our unique contributions. Section 3 defines the attack model and describes our placement goals. In Section 4, we explain the architecture of our implementation, the dependency based VM placement strategy, and other implementation details. In Section 5, we show how well our technique performs on real data.
2 RELATED WORK
A good VM placement plan can immensely improve performance. For example, Sindelar et al. (2011) demonstrated that memory sharing based placement can save the memory resources of a server. In their proposed placement scheme, VMs with the most shared pages are collocated on the same physical server.
To improve efficiency, Yusoh and Tang (2010) developed a genetic algorithm to create a placement plan that reduces the Estimated Total Execution Time (ETET). Lucas Simarro et al. (2011) provided a scheduling model to optimize virtual cluster placement across cloud offers. Cloud prices and user demand are considered in the model. Their experimental results on real data show that a dynamic placement plan reduces users’ costs more than a fixed one.
Unfortunately, none of the above work considered security. Our previous work (Zhang et al., 2012) proposed periodically migrating VMs based on game theory, making it much harder for adversaries to locate the target VMs in terms of survivability measurement. However, our previous work did not discuss how to evaluate the security of a cloud placement or how to generate a placement plan that improves cloud security. This paper proposes an innovative and effective placement strategy based on dependency relations among VMs.
3 SYSTEM OVERVIEW
In this section, we describe our basic assumptions and
the goals of VM placement.
3.1 Characteristics
An example of virtual machine placement is shown in Figure 2. The most important components are virtual machines and nodes. Each virtual machine runs different services, and some of the VMs are dependent on others. A node represents a physical machine, which runs a few to many VMs, within the limits of its hardware resources. In Figure 2, Node 1 has three VMs while Node 2 holds four VMs. In addition, the cloud provider has the necessary privileges to scan the VMs for vulnerabilities and obtain information about network connections among VMs.
SECRYPT2012-InternationalConferenceonSecurityandCryptography
322
Figure 2: Cloud Placement Example (VM1 to VM7 placed on Node 1 and Node 2).
In addition, we assume the following about an attacker.
1. The attacker can exploit the vulnerability of a hy-
pervisor or a VM.
2. The attacker follows the state transition graph in
Figure 1 to compromise VMs step by step.
3. The attacker always chooses the easiest target to
compromise in each step, in terms of the vulnera-
bilities in VMs and the attacker’s skills.
4. The attacker has no global view of the cloud at the
beginning of the attacks. However, the attacker
may acquire more knowledge after compromising
more nodes in the cloud.
Since the success of attacks highly depends on the placement strategy of the cloud, our purpose is to present a systematic solution that reduces the security risks of both Type I and Type II attacks without sacrificing performance.
We can defeat Type I attacks because our mechanism changes the VM placement after a specific time. Hence, if an adversary plans to compromise a specific VM by compromising the hypervisor and the whole node, the VM can survive if it is migrated to another node before the attacker compromises the node. In addition, we can also resist Type II attacks because we try to assign dependent VMs to the same node. In this situation, it is difficult for the adversary to compromise other nodes. Hence, we increase the survivability of VMs on other nodes. In this work, we try to both reduce the number of compromised VMs and increase the survivability of services.
To verify the improvement in service survivability, we first define the survivability of a service. A service is accomplished by a set of connected VMs, where connected VMs are VMs that transmit data to one another. Next, we define the compromised possibility of a VM as the possibility that the VM is compromised at a specific attack step. Based on these definitions, we give a theorem on how to evaluate the survivability of a service.
Figure 3: Architecture. Box labels: Security Evaluation, Exploitable Possibility, Markov Chain Analysis, Generate Placement Strategy, Performance Evaluation, Migration Cost, Dependency Exploration.
Theorem 1 (Survivability of a Service). Given a service $S_i$ (VM chain) consisting of related VMs $S_i = \{VM_a, VM_b, \ldots, VM_n\}$ and a node set $N = \{N_1, N_2, \ldots, N_m\}$, if the survivability at a specific attack step of the nodes that hold the VMs belonging to $S_i$ is $\{PN_1, PN_2, \ldots, PN_m\}$, then the survivability $PS_i$ of service $S_i$ is

$$PS_i = \prod_{j=1}^{m} PN_j. \qquad (1)$$
We have thus reduced the question of how to evaluate a service to that of how to evaluate a node. Calculating the node survivability in Equation 1 becomes the critical problem. To solve it, we define the survivability of a node.
Theorem 2 (Survivability of a Node). Given a node $N$ and a set of VMs $\{VM_a, VM_b, \ldots, VM_m\}$ located at node $N$, where the compromised probabilities of these VMs are $\{P_a, P_b, \ldots, P_m\}$, the survivability $PN_N$ of node $N$ is

$$PN_N = \prod_{j=1}^{m} (1 - P_j) \qquad (2)$$
Equation 2 holds because we assume that if an adversary takes over a VM, the physical node is highly likely to be compromised. In other words, the survivability of a physical node is the possibility that all VMs it hosts survive the attack. Similarly, the survivability of a service is the possibility that all VMs constituting the service survive the attack. In addition, because we assume that if a node is compromised, none of the VMs on it can work, the survivability of a VM equals the survivability of its physical node, which yields Equation 1.
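As a small numerical illustration of Equations 1 and 2, the following Python sketch computes the survivability of a node from the compromise probabilities of its VMs, and the survivability of a service from the nodes that host it. The function names and the probabilities are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
from math import prod


def node_survivability(compromise_probs):
    """Equation 2: probability that every VM hosted on the node survives."""
    return prod(1.0 - p for p in compromise_probs)


def service_survivability(node_survivabilities):
    """Equation 1: probability that every node hosting the service survives."""
    return prod(node_survivabilities)


# Hypothetical service spread over two nodes.
node_1 = node_survivability([0.1, 0.2])  # two VMs on the first node
node_2 = node_survivability([0.5])       # one VM on the second node
service = service_survivability([node_1, node_2])
print(round(node_1, 3), round(node_2, 3), round(service, 3))  # 0.72 0.5 0.36
```

The example shows why collocating a service with a risky VM hurts: the single VM with probability 0.5 halves the survivability of the whole service.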
4 IMPLEMENTATION
The structure of our implementation is shown in Fig-
ure 3. In order to improve the overall security level
ImprovingCloudSurvivabilitythroughDependencybasedVirtualMachinePlacement
323
while not sacrificing performance, our design includes three components: security evaluation, strategy generation, and performance evaluation. Periodically, the cloud provider changes the VM placement to defend against placement-based attacks. First, a dependency exploration mechanism identifies the service dependency relations among VMs. In our design, we identify the dependency relations through network connections. Since the operating system maintains the information about network ports and IP addresses, we collect the network information through the operating system. Next, we score each VM’s security level according to the National Vulnerability Database (NVD). Afterwards, we map the vulnerabilities to the possibility of compromise and leverage Discrete Time Markov Chain (DTMC) analysis to predict the possibility of successful attacks in each attack step. Finally, we design an algorithm to create the placement plan. Combined with migration cost and migration time analysis, this yields the final placement solution.
4.1 Security Evaluation
The Security Evaluation component consists of exploitable possibility assessment and Markov chain analysis. First, we explore dependency relations among all VMs and construct a graph based on the dependencies. Second, we scan each VM against the National Vulnerability Database to generate an estimated value for the VM’s security level. Third, we use a Markov chain to predict the possibility of each VM being attacked.
4.1.1 Dependency Exploration
The dependency relations among VMs are the basis of security evaluation. There is already much research on how to discover dependency relations between VMs. For example, LWT (Apte et al., 2010) identifies cross-domain dependencies by CPU utilization. The authors claim that dependent VMs show the same spikes in CPU utilization. How to identify the dependencies of VMs is therefore out of the scope of our work. In this paper, we simply use network topology information, such as the IP addresses and network port numbers reported by netstat, to identify dependency relations. After all dependency relations are obtained, we can construct VM dependency graphs like the one shown in Figure 2.
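The netstat-based dependency exploration described above could be sketched roughly as follows. This is a minimal illustration, not the implementation used in this work: it assumes each VM's connections are available as text lines in the column layout printed by `netstat -tn` on Linux (protocol, receive queue, send queue, local address, foreign address, state), and that the provider knows which IP address belongs to which VM. The function name, the VM names and the addresses are hypothetical.

```python
from collections import defaultdict


def build_dependency_graph(netstat_lines_by_vm, ip_to_vm):
    """Return {vm: set of VMs it has established connections with}."""
    graph = defaultdict(set)
    for vm, lines in netstat_lines_by_vm.items():
        for line in lines:
            fields = line.split()
            # Assumed columns: proto, recv-q, send-q, local, foreign, state.
            if len(fields) < 6 or fields[5] != "ESTABLISHED":
                continue
            remote_ip = fields[4].rsplit(":", 1)[0]
            peer = ip_to_vm.get(remote_ip)
            if peer is not None and peer != vm:
                graph[vm].add(peer)
    return dict(graph)


# Hypothetical input: a web server VM connected to a database VM.
sample = {
    "VM1": ["tcp 0 0 10.0.0.1:43512 10.0.0.2:3306 ESTABLISHED"],
    "VM2": [
        "tcp 0 0 10.0.0.2:3306 10.0.0.1:43512 ESTABLISHED",
        "tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN",
    ],
}
ips = {"10.0.0.1": "VM1", "10.0.0.2": "VM2"}
graph = build_dependency_graph(sample, ips)
```

Listening sockets and connections to addresses outside the cloud are ignored, so only VM-to-VM dependencies end up in the graph.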
Figure 4: An example based on Markov Chain Analysis. (a) Step One. (b) Step Two. Each panel shows VM1 to VM7 on NODE1 and NODE2, with transition probabilities on the edges.
4.1.2 Exploitable Vulnerability
In order to quantify the exploitable vulnerability, we use the Common Vulnerability Scoring System (CVSS) (CVSS, 2012). CVSS includes three metric groups: base, temporal, and environmental. Each group represents different characteristics of vulnerabilities.
Now we have the quantified vulnerability for each VM. We need to map the quantified vulnerability to the possibility of compromise for each connected VM. Here we use a linear mapping function. Given a VM $VM_a$, where the vulnerability scores of the VMs connected with $VM_a$ are $\{V_1, V_2, \ldots, V_n\}$, the possibility of compromise for $VM_a$, denoted by $P_a$, is given by Equation 3.

$$P_a = \frac{V_a}{\sum_{k=1}^{n} V_k} \qquad (3)$$
The linear mapping function in Equation 3 is very simple; however, more complex mapping functions can be used and the following discussion remains the same.
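A minimal Python sketch of the linear mapping in Equation 3 is given below. It assumes that the scores being normalised are those of the candidate VMs an attacker could move to next, so that the resulting possibilities sum to one. The function name and the scores are hypothetical; they are not CVSS scores from the data set.

```python
def transition_probabilities(scores):
    """Linearly map vulnerability scores to compromise possibilities (Equation 3)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {vm: score / total for vm, score in scores.items()}


# Hypothetical scores for two candidate targets.
probs = transition_probabilities({"VM3": 2.0, "VM6": 8.0})
print(probs)  # {'VM3': 0.2, 'VM6': 0.8}
```

A VM with a higher score receives a proportionally larger share of the attack possibility, which is all the linear mapping is meant to capture.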
4.1.3 Markov Chain Analysis
We run Markov Chain analysis over an Attack Depen-
dency Graph which is defined as follows.
Definition 1 (Attack Dependency Graph). Given the attack dependency graph $G\langle V, E\rangle$ of an attack and two virtual machines $v_1$ and $v_2$, where $V$ is the set of virtual machines and $E$ is the set of transitions among attacks, if there exists an edge $e \in E$ from $v_1$ to $v_2$, then $v_2$ is attack dependent on $v_1$, denoted by $v_1 \to v_2$. The possibility of each transition is determined by Equation 3.
All virtual machines in the cloud and the attack dependencies among them can be represented by one or a set of attack dependency graphs (ADGs). If we associate a probability with each edge, an ADG can be modelled by a Discrete Time Markov Chain (Sahner et al., 1997). Given $n$ nodes in the ADG, the initial probability distribution over the nodes (the probability that a particular node is compromised) is $\pi(0) = (1, \underbrace{0, 0, \ldots, 0}_{n-1})$. After the $k$-th step, the probability that the attacker can reach other nodes in the ADG can be calculated by Equation 4.

$$\pi(k) = \pi(0) P^k \qquad (4)$$

where $P$ is the state-transition probability matrix of the DTMC and $P^k = \underbrace{P \cdot P \cdots P}_{k}$. $P$ is given by $\{a_{ij}\}$, where $a_{ij}$ is the probability associated with edge $v_i \to v_j$. The initial $P$ is determined by the attacker’s first choice. If the attacker can compromise $v_j$ from $v_i$, we assign to $a_{ij}$ the compromise possibility scored by CVSS and mapped by Equation 3. To eliminate loops in the ADG, we remove the compromised node and its edges. Here, to simplify our work, we just convert the example (Figure 2) to ADG format and assign probability $1/m$ to each transition from $v_i$ to a successor if $v_i$ has $m$ successors. However, it is not hard to assign the initial $P$ using CVSS (CVSS, 2012).
In Figure 2, we assume that the first compromised VM is $VM_2$, so $\pi(0) = (0, 1, 0, 0, 0, 0, 0)$. We then obtain the ADGs for step one and step two shown in Figure 4. Under this assumption, the attack possibilities for the six attack steps are:

$$\begin{pmatrix} \pi(0) \\ \pi(1) \\ \pi(2) \\ \pi(3) \\ \pi(4) \\ \pi(5) \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0 & 0.8 & 0 \\ 0 & 0.26 & 0 & 0.56 & 0 & 0 & 0.18 \\ 0.56 & 0 & 0.26 & 0 & 0 & 0.18 & 0 \\ 0 & 0.026 & 0 & 0 & 0 & 0 & 0.414 \\ 0 & 0 & 0.026 & 0 & 0 & 0.414 & 0 \end{pmatrix} \qquad (5)$$

where $\pi(k)$, $0 \le k \le 5$, is the probability distribution in step $k$. In the above example, according to $\pi(4)$, in the 4th step the probability that $VM_2$ is compromised is 2.6% and the probability that $VM_7$ is compromised is 41.4%.
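The step-by-step calculation of Equation 4 can be illustrated with a few lines of plain Python. The sketch below repeatedly multiplies the distribution (a row vector) by a transition matrix. The three-VM matrix is a made-up example, not the ADG of Figure 4, and the sketch does not model the removal of compromised nodes described above.

```python
def step(distribution, transition):
    """One DTMC step: multiply a row vector by the transition matrix."""
    n = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def distribution_after(pi0, transition, steps):
    """Apply the transition matrix to pi0 the given number of times."""
    pi = list(pi0)
    for _ in range(steps):
        pi = step(pi, transition)
    return pi


# Hypothetical ADG with three VMs; the first VM is compromised initially.
P = [
    [0.0, 0.5, 0.5],  # from VM1 the attacker moves to VM2 or VM3
    [0.0, 0.0, 1.0],  # from VM2 the attacker moves to VM3
    [0.0, 0.0, 1.0],  # VM3 has no further targets
]
pi0 = [1.0, 0.0, 0.0]
pi1 = distribution_after(pi0, P, 1)
pi2 = distribution_after(pi0, P, 2)
```

Here `pi1` spreads the attack evenly over the second and third VM, and `pi2` places all of the probability on the third VM.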
4.2 Placement Generation
From the above discussion, we have obtained the possibility of being attacked for each VM. We therefore design an algorithm to generate placement plans. The core of the algorithm is to separate VMs with high risks from VMs with low risks, while trying to assign connected VMs to the same node.
In the first part of the algorithm, we separate high-risk VMs from the others. The VMs with high risks, called dangerous VMs, are identified by DTMC analysis: if, at a specific step, the probability of a VM being compromised is larger than zero, the VM has a high security risk in this step. A VM compromised in earlier steps is considered to be at a higher risk level than one compromised in later steps. To satisfy our goals, we sort the VM set in descending order of attack possibility. Then, starting from the first VM in the set, we assign one VM to each node. Next, we choose the node with the minimal attack possibility to hold the rest of the VMs. If the current node is full, we choose the next node, until the VM set is empty. To minimize the number of migrated VMs, our placement plan takes the previous plan into account. For example, if a VM resides on a physical node without dangerous VMs in the preceding placement, and the new placement plan still puts it on a node without dangerous VMs, we do not migrate it, because migrating such a VM would not increase its security but would add migration cost.
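One possible reading of the placement steps above is sketched below in Python. It is an interpretation for illustration only: it covers the sorting, seeding and filling steps, and leaves out the grouping of connected VMs and the comparison with the previous plan. The function names, VM names, risks and capacities are all hypothetical.

```python
from math import prod


def node_attack_possibility(vms, risk):
    """One minus the node survivability of Equation 2."""
    return 1.0 - prod(1.0 - risk[vm] for vm in vms)


def generate_placement(risk, capacity):
    """risk: {vm: compromise probability}; capacity: {node: max number of VMs}."""
    if sum(capacity.values()) < len(risk) or min(capacity.values()) < 1:
        raise ValueError("not enough capacity for the given VMs")
    vms = sorted(risk, key=risk.get, reverse=True)  # riskiest VM first
    nodes = list(capacity)
    placement = {node: [] for node in nodes}
    # Seed each node with one VM taken from the front of the sorted list.
    for node, vm in zip(nodes, vms):
        placement[node].append(vm)
    # Fill the least attackable node first; move on when a node is full.
    order = sorted(nodes, key=lambda n: node_attack_possibility(placement[n], risk))
    idx = 0
    for vm in vms[len(nodes):]:
        while len(placement[order[idx]]) >= capacity[order[idx]]:
            idx += 1
        placement[order[idx]].append(vm)
    return placement


risk = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.05, "e": 0.0}
capacity = {"n1": 1, "n2": 2, "n3": 3}
plan = generate_placement(risk, capacity)
```

In this example the two riskiest VMs each end up alone on a node, while the low-risk VMs share the remaining node, which is the separation of dangerous VMs that the algorithm aims for.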
4.3 Performance Evaluation
In this section, we discuss the overhead of our placement mechanism. We look at the following types of costs: the computing costs of placement, the migration time of a VM, and the total number of migrations needed to achieve a new placement, or migration path.
4.3.1 Computing Costs
The computing costs include the costs of DTMC and of the placement algorithm. The complexity of the algorithm is polynomial. The cost of DTMC, denoted by $C_{DTMC}$, is defined as follows.
Definition 2 (Cost of DTMC). Given a series of $K$ attack steps, where the cost of step $i$ is $PDTMC_i$, the performance cost of this series of attack steps is

$$C_{DTMC} = \sum_{i=1}^{K} PDTMC_i \qquad (6)$$

According to our experiments, the cost of the matrix multiplication for each attack step in DTMC has polynomial time complexity, $PDTMC_i = i^3$. Our DTMC experiment shows that the computation for a cloud with 2048 VMs over 7 steps takes $C_{DTMC} = \sum_{i=1}^{7} PDTMC_i = 38.36$ s. In general, for a cloud with $N$ VMs over $M$ steps, the total computation for DTMC is $\sum_{i=1}^{M} PDTMC_i^N$.
4.3.2 Migration Time and Migration Path
When migrating a VM, the VM is usually shut off first; hence, migration time is one of the most significant factors we should consider in order to improve the system performance. Figure 5 presents the migration delay of a web server on our platform.

Figure 5: Migration impact on response delay of web server (x-axis: elapsed time; y-axis: delay in ms). As illustrated in the graph, the web service downtime due to migration is 967ms.

Figure 6: Comparison of survivability (x-axis: service #; y-axis: survivability possibility; series: Random Placement, New Placement).
The migration path cost is incurred when a new placement plan is deployed. The cost may differ if the VMs are migrated in a different order, because an immigrating VM must wait until the target node has enough space. In addition, we should migrate suspended VMs first, because migrating suspended VMs does not cause performance loss. Therefore, we should try to choose a migration path with minimum cost. Calculating the optimal migration path is out of the scope of this paper due to space limits.
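To make the ordering constraint concrete, the following Python sketch builds one feasible migration order with a simple greedy rule: a move is performed only when its target node has a free slot, and suspended VMs are tried first. It is not an optimal migration path and is not part of the evaluated system; the function name, the moves and the slot counts are hypothetical.

```python
def order_migrations(moves, free_slots, suspended):
    """Greedily order moves so that each target has room when its VM arrives.

    moves: list of (vm, source, target); free_slots: {node: free capacity};
    suspended: set of suspended VMs. Returns (ordered moves, blocked moves).
    """
    free = dict(free_slots)
    # Suspended VMs come first (False sorts before True); the sort is stable.
    pending = sorted(moves, key=lambda move: move[0] not in suspended)
    ordered = []
    progress = True
    while pending and progress:
        progress = False
        for move in pending:
            vm, source, target = move
            if free.get(target, 0) > 0:
                free[target] -= 1
                free[source] = free.get(source, 0) + 1  # the source gains a slot
                ordered.append(move)
                pending.remove(move)
                progress = True
                break  # restart the scan so suspended VMs are retried first
    return ordered, pending


moves = [("vm1", "A", "B"), ("vm2", "B", "C"), ("vm3", "C", "A")]
ordered, blocked = order_migrations(moves, {"A": 0, "B": 0, "C": 1}, {"vm3"})
```

In this example only node C has a free slot at the start, so vm2 must move first even though vm3 is suspended; each completed move frees the slot that the next move needs. Moves that form a cycle with no free slot anywhere are returned as blocked.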
5 EXPERIMENTAL RESULTS
We apply our placement algorithm to a data set to generate placement plans. The data set includes 81 VMs on 10 nodes. The capacities of the 10 nodes are 20, 15, 10, 10, 10, 5, 5, 5, 5, and 5. Based on the data set, we generated a random placement plan and optimized the placement using our algorithm. We compared the new placement plan with the random one to investigate the improvement in security levels.
According to our experimental results shown in Figure 6, 91.3% of services obtained improved survivability. The maximum survivability enhancement is 74.28% and the average improvement in survivability is 27.15%. Our results also show a reduced number of compromised VMs.
ACKNOWLEDGEMENTS
This work was supported in part by NSF Grants CNS-
1100221 and CNS-0905153.
REFERENCES
Apte, R., Hu, L., Schwan, K., and Ghosh, A. (2010). Look
who’s talking: discovering dependencies between vir-
tual machines using cpu utilization. In Proceedings
of the 2nd USENIX conference on Hot topics in cloud
computing, HotCloud’10, pages 17–17, Berkeley, CA,
USA. USENIX Association.
CVE-2007-4993 (2007). CVE-2007-4993: Xen guest root can escape to domain 0 through pygrub. http://cve.mitre.org/cgibin/cvename.cgi?name=CVE-2007-4993, 2007.
CVE-2007-5497 (2007). CVE-2007-5497: Vulnerability in XenServer could result in privilege escalation and arbitrary code execution. http://support.citrix.com/article/CTX118766, 2007.
CVSS (2012). Common vulnerability scoring system.
http://www.first.org/cvss/cvss-guide.
Hlavacs, H., Treutner, T., Gelas, J., Lefevre, L., and Orgerie,
A. (2011). Energy consumption side-channel attack
at virtual machines in a cloud. In Dependable, Au-
tonomic and Secure Computing (DASC), 2011 IEEE
Ninth International Conference on, pages 605–612.
Lucas Simarro, J., Moreno-Vozmediano, R., Montero, R.,
and Llorente, I. (2011). Dynamic placement of virtual
machines for cost optimization in multi-cloud envi-
ronments. In High Performance Computing and Sim-
ulation (HPCS), 2011 International Conference on,
pages 1–7.
Ristenpart, T., Tromer, E., Shacham, H., and Savage, S.
(2009). Hey, you, get off of my cloud: exploring infor-
mation leakage in third-party compute clouds. In Pro-
ceedings of the 16th ACM conference on Computer
and communications security, CCS ’09, pages 199–
212, New York, NY, USA. ACM.
Sahner, R., Trivedi, K., and Puliafito, A. (1997). Performance and reliability analysis of computer systems (an example-based approach using the sharpe software). Reliability, IEEE Transactions on, 46(3):441.
Sindelar, M., Sitaraman, R. K., and Shenoy, P. (2011).
Sharing-aware algorithms for virtual machine colo-
cation. In Proceedings of the 23rd ACM symposium
on Parallelism in algorithms and architectures, SPAA
’11, pages 367–378, New York, NY, USA. ACM.
Yusoh, Z. and Tang, M. (2010). A penalty-based genetic al-
gorithm for the composite saas placement problem in
the cloud. In Evolutionary Computation (CEC), 2010
IEEE Congress on, pages 1–8.
Zhang, Y., Li, M. L., Bai, K., Yu, M., Zang, W., and He, X. (2012). Incentive compatible moving target defense against vm-colocation attacks in clouds. In IFIP International Information Security and Privacy Conference 2012, 4-6 June 2012.
SECRYPT2012-InternationalConferenceonSecurityandCryptography
326