EVALUATING SURVIVABILITY AND COSTS OF THREE VIRTUAL
MACHINE BASED SERVER ARCHITECTURES
Meng Yu¹, Alex Hai Wang², Wanyu Zang¹ and Peng Liu²
¹ Western Illinois University, IL, Macomb, U.S.A.
² Pennsylvania State University, PA, University Park, U.S.A.
Keywords:
Security modeling, Survivability, Security architecture, Software security, Data center.
Abstract:
Virtual machine based services are becoming predominant in data centers and cloud computing because
virtual machines can provide strong isolation and better monitoring for security purposes. While there
are many promising security techniques based on virtual machines, it is not clear how significant the
difference between various system architectures can be in terms of survivability.
In this paper, we analyze the survivability of three virtual machine based architectures: load balancing
architecture, isolated service architecture, and BFT architecture. For each architecture, we analyze both
the survivability based on availability and the survivability under sustained attacks. Furthermore, the
costs of the architectures are compared. The results show that even if the same set of commercial
off-the-shelf (COTS) software is used, the service architectures differ widely in their ability to survive
attacks. Our results can be used as guidelines in service architecture design when survivability to
attacks is important.
1 INTRODUCTION
Virtual machine technology provides strong isolation
and better monitoring capability at the virtual machine
monitor level. When an attack happens, it is possible
but hard for the attacker to break into the virtual
machine monitor to compromise other virtual machines
or to avoid being monitored. Therefore, virtual machine
technology is widely used in cloud computing and data
centers as a basic building block of various service
architectures.
Even with the same set of commercial off-the-shelf
(COTS) software and the same virtual machine technology,
service architectures can be very different. Accordingly,
the security characteristics of each architecture will be
different too. In this paper, we use three virtual machine
based architectures as examples to evaluate such
differences with regard to survivability and costs. While
we evaluate three specific architectures in the paper, our
evaluation techniques are general enough to be applied to
other architectures.
There have been many techniques for evaluating
attacks or defenses, such as attack graphs (Sawilla and
Ou, 2008), attack trees (Mauw and Oostdijk, 2005),
stochastic activity networks (Sanders et al., 2001),
stochastic Petri nets (Marsan, 1990), reliability block
diagrams (RBD) (Sahner et al., 1996b), queuing
networks, and Continuous Time Markov Chains
(CTMC) (Tijms, 1994; Sahner et al., 1996a). These
techniques have been used to evaluate some server
architectures, such as web service architectures
(Gokhale et al., 2006) and data flow software
architectures (Padilla et al., 2008), with regard to
reliability, availability, and performance. Models that
can be used to evaluate dependability and security
have been summarized in (Nicol et al., 2004).
However, the survivability of virtual machine based
architectures has not been investigated yet, especially
for architectures with COTS service components.
The major contributions of this paper are: 1)
evaluating the impact of various architecture designs
on both static and dynamic survivability; 2) studying
several survivability related metrics in each
architecture; and 3) comparing the costs of these
architectures to see how much we need to pay for a
specific survivability level. To the best of our
knowledge, our work is the first to analyze the
survivability and performance of virtual machine based
architectures.
The paper is organized as follows. We review related
work in Section 2. In Section 3, we describe the three
virtual machine based architectures to be evaluated. We
analyze the survivability statically in Section 4 and
analyze the dynamic behaviors of each architecture in
Section 5. In Section 6, we compare the costs of
different architectures. We conclude the paper in
Section 7.
Yu M., Hai Wang A., Zang W. and Liu P. (2010).
EVALUATING SURVIVABILITY AND COSTS OF THREE VIRTUAL MACHINE BASED SERVER ARCHITECTURES.
In Proceedings of the International Conference on Security and Cryptography, pages 478-485
DOI: 10.5220/0002994604780485
Copyright © SciTePress
2 RELATED WORK
Data centers using virtual machine (VM) consolidation
are taking over the old computer rooms run by
individual companies. The reasons include the cost of
space and energy. According to a report from the
research firm Gartner, millions of virtual machines have
been or are being deployed in data centers around the
world, and virtualization is becoming a dominant and
indispensable technology for IT departments.
Due to resource and service consolidation, data
centers are becoming the “backbone” of the
infrastructure for IT operations in companies,
governments, and the military. Accordingly, two top
requirements for modern data centers are business
continuity and information security. Although these two
requirements clearly show the importance of data center
protection, from the security viewpoint, consolidating
services and resources does not automatically
consolidate the corresponding security mechanisms.
Without security consolidation, the cost of protection
can be much higher than it should be, and more
importantly, blindly reusing the separate security
equipment and tools associated with the
services/resources being consolidated could even
“create” new security holes. Unfortunately, security
consolidation has been lagging behind service/resource
consolidation in the data center industry.
Replicated systems can provide better fault tolerance,
or intrusion tolerance when the attacks are unknown
attacks that can be modeled as Byzantine faults
(Castro, 2001; Schneider, 1990; Bernstein et al., 1987;
Seguin et al., 1979; Jajodia and Mutchler, 1990;
Alvisi et al., 2001; Malkhi and Reiter, 1998). While
performance was a concern with these approaches,
recent research has led to more practical performance.
For example, the peak throughput achieved by Zyzzyva
(Kotla et al., 2007) is within 35% of that of an
unreplicated server that simply replies to client
requests over an authenticated channel. With
replication, compromised nodes can be removed through
a voting mechanism. Therefore, to disable a replicated
system with n replicas, the attacker has to compromise
more than a threshold number of replicas, usually more
than $\lfloor (n-1)/3 \rfloor$ replicas. However, an
attacker can easily compromise all replicas through a
common vulnerability shared by all replicas, especially
when all replicas run in homogeneous environments,
e.g., the same operating systems or web servers.
Figure 1: Load Balanced Server Architecture (LBSA).
[Diagram: a load balancer on a virtual machine monitor (VMM) in front of n virtual machines (OS 1, OS 2, ..., OS i, ..., OS n), each running a web server, an application server, and a DBMS.]
The basic idea of diverse replication using virtual
machines has been described in (Chun et al., 2008).
The combination of diversification and replication is
capable of defeating unknown attacks at practical
cost. Properly configured diverse replication will
be immune to attacks based on a single vulnerability
and can remove compromised nodes as long as no more
than $\lfloor (n-1)/3 \rfloor$ nodes are compromised.
However, the method has not been implemented or
evaluated with respect to its effectiveness or
performance.
Current evaluation techniques for attacks or
defenses, such as the aforementioned attack graphs
(Sawilla and Ou, 2008), attack trees (Mauw and
Oostdijk, 2005), queuing networks, and Continuous Time
Markov Chains (CTMC) (Tijms, 1994; Sahner et al.,
1996a), have been used to evaluate some server
architectures. Models that can be used to evaluate
dependability and security have been summarized
in (Nicol et al., 2004). However, the survivability of
virtual machine based architectures has not been
investigated yet, especially for architectures with
COTS service components.
3 THREE VIRTUAL MACHINE
BASED ARCHITECTURES
In this section, we describe three virtual machine
based service architectures: Load Balanced Server
Architecture (LBSA), Isolated Component based
Server Architecture (ICSA), and Byzantine Fault
Tolerant Server Architecture (BFTSA), which is based
on Byzantine fault tolerance (Castro and Liskov, 1999).
The LBSA architecture is shown in Figure 1. This
is the most popular architecture for providing high
availability services. In this architecture, a set of
services is installed on each virtual machine of a
virtual machine cluster. The users’ requests are
directed to live virtual machines with load balancing.
In a secure configuration, the virtual machine
environments are diversified, which guarantees that a
single instance of attack cannot compromise all virtual
machines in the cluster. Moreover, even if one virtual
machine is compromised, the load balancer can detect
the inconsistency and then abandon the compromised
virtual machine or reboot it with a different
diversified configuration. Since virtual machines
provide good isolation, we assume that a single
instance of attack cannot compromise all virtual
machines of the cluster.
Figure 2: Isolated Component based Server Architecture (ICSA).
[Diagram: three virtual machines on a virtual machine monitor (VMM), each with its own OS, hosting the web server, the application server, and the DBMS respectively.]
Figure 3: Byzantine Fault Tolerant Server Architecture (BFTSA).
[Diagram: the BFT protocol on a virtual machine monitor (VMM) in front of n virtual machines (OS 1, OS 2, ..., OS i, ..., OS n), each running a web server, an application server, and a DBMS.]
The ICSA architecture is shown in Figure 2. In
this architecture, each service of a service
architecture is installed on its own virtual machine.
The number of virtual machines is determined by the
number of services. In the figure, there are three
services and thus three virtual machines. Once a
service is compromised, the damage is confined to its
virtual machine, and the service can be recovered
through a reboot. The virtual machine isolation
mechanism reduces the number of reboots needed under
attack.
The BFTSA architecture is shown in Figure 3. A
Byzantine Fault Tolerant (BFT) algorithm (Castro and
Liskov, 1999) is required for this architecture to
handle users’ requests. Once a user’s request is
received, the BFT algorithm goes through three phases
(pre-prepare, prepare, and commit) to remove replies
from nodes with arbitrary faults. The BFT algorithm can
tolerate $\lfloor (n-1)/3 \rfloor$ compromised servers.
Note that in all architectures, virtual machines
should be configured with different diversification,
which ensures that no more than one virtual machine
can be compromised by one instance of attack.
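As a quick illustration of the tolerance threshold used by BFTSA, the following Python sketch computes $\lfloor (n-1)/3 \rfloor$ for several cluster sizes. The function name `bft_tolerance` is ours and is introduced only for illustration.

```python
def bft_tolerance(n: int) -> int:
    """Number of compromised replicas a BFT group of n replicas can tolerate."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1) // 3  # integer division implements the floor


if __name__ == "__main__":
    for n in range(4, 14):
        print(f"n = {n:2d}: tolerates {bft_tolerance(n)} compromised virtual machine(s)")
```

Note that the threshold only grows at n = 4, 7, 10, 13, and so on; adding one or two replicas in between does not raise it.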
4 EVALUATING SURVIVABILITY
BASED ON STATIC ANALYSIS
4.1 Basic Assumptions
In this paper, we assume that the defense mechanisms
used in the virtual machines are based on
diversification (Chun et al., 2008). Thus, a single
instance of attack cannot compromise more than one
virtual machine in any architecture, no matter what
technique the attacker uses, such as buffer overflow,
stack overflow, or format string attacks. However, each
instance of attack has a positive probability of
compromising a virtual machine, and thus any component
inside that virtual machine. Furthermore, an attack can
consist of a finite number of steps on the target, so
the virtual machines in an architecture may be
compromised one by one until all are compromised, but
not in a single step.
We also assume that the defense mechanisms have
an automated recovery procedure included. Once a
virtual machine is detected as compromised, the re-
covery mechanism can recover the virtual machine
with a different diversification through reboot or
micro-reboot.
We define the survivability of a system under a
specific condition as follows.
Definition 1 (Survivability). We define the surviv-
ability of a system under a specific condition as the
probability that the system can meet the following two
requirements.
Availability. The system can provide replies to all re-
quests, and
Service Integrity. The replies meet the functional
specification of the system and all service com-
ponents of the system are functional.
For example, a web service architecture needs three
levels of servers: web server, application server, and
DBMS server. When a system is under attack, as
long as the system can provide replies (availability)
and the replies are generated by the three levels of
servers correctly (service integrity), we say that the
system can survive the attack. Otherwise, the whole
system is compromised.
4.2 Analytical Results
The LBSA architecture is good at providing
availability. A system under this architecture will
provide services unless all virtual machines have
crashed. However, service integrity is another story.
If any virtual machine is compromised by the attacker,
the service integrity is compromised (note that we are
discussing attacks instead of “fail-stop” failures),
because the load balancer cannot tell a virtual machine
demonstrating arbitrary faults from other normal
virtual machines. Therefore, a single compromised
virtual machine leads to a compromised cluster.
SECRYPT 2010 - International Conference on Security and Cryptography
Under the ICSA architecture, once a virtual machine
is compromised, the service installed on it is lost.
Thus, both the availability and the service integrity
are compromised. As a result, the whole system is
compromised.
For both the LBSA and ICSA architectures, assume
that a system has n virtual machines. Note that for
ICSA, n is usually the same as the number of services
unless multiple services are installed on the same
virtual machine. If the virtual machines are not
diversified, the probability of breaking into the
system through a single vulnerability, denoted by
$P_b$, is the same as the probability of breaking into
a single replica, denoted by $P_r$, because the
vulnerability is shared by all replicas. Thus, the
survivability of the system is
$P_s = 1 - P_b = 1 - P_r$. With disjoint
diversification in a space $S$, where no attack can
compromise more than one variation at a time and $S$
contains all possible variations of the
diversification, $P_s$ can be calculated by
Equation 1.

$$
P_s = 1 - \sum_{i=1}^{m} \binom{m}{i} (P_r)^i (1 - P_r)^{m-i}
    = 1 - \sum_{i=1}^{m} \binom{m}{i} \left(\frac{n}{|S|}\right)^i \left(1 - \frac{n}{|S|}\right)^{m-i}
\tag{1}
$$

where $n \le |S|$ (n nodes cannot have more than $|S|$
variations), and $m \ge 1$ (we need to compromise at
least 1 replica) is the total number of intrusion
attempts.
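To make Equation 1 concrete, the following Python sketch evaluates it directly. The function and parameter names are ours and are only illustrative; the code assumes, as the equation does, that each of the m attempts succeeds independently with probability $n/|S|$.

```python
from math import comb


def static_survivability_lbsa_icsa(n: int, space_size: int, m: int) -> float:
    """Equation 1: P_s = 1 - sum_{i=1}^{m} C(m, i) p^i (1 - p)^(m - i), p = n / |S|."""
    if not 1 <= n <= space_size:
        raise ValueError("need 1 <= n <= |S|")
    if m < 1:
        raise ValueError("need at least one intrusion attempt")
    p = n / space_size  # P_r: chance that one attempt hits a variation in use
    # Probability that at least one of the m attempts succeeds.
    p_compromised = sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(1, m + 1))
    return 1.0 - p_compromised


if __name__ == "__main__":
    # Parameters of Figure 4 (m = 5, |S| = 50) with n = 10 virtual machines.
    print(static_survivability_lbsa_icsa(n=10, space_size=50, m=5))
```

Because the sum is the binomial probability of at least one success, the result equals $(1 - n/|S|)^m$, which is a convenient sanity check.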
With k independent diversification approaches,
e.g., diversified API, memory randomization, etc.,
the diversification spaces will be
$S_1, S_2, \ldots, S_k$ respectively. Thus, for a
sequence of m intrusion attempts,

$$
P_s = 1 - \sum_{i=1}^{m} \binom{m}{i}
      \left(\frac{n}{\prod_{j=1}^{k} |S_j|}\right)^i
      \left(1 - \frac{n}{\prod_{j=1}^{k} |S_j|}\right)^{m-i}
\tag{2}
$$
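Equation 2 differs from Equation 1 only in the size of the diversification space, which becomes the product of the individual space sizes. A minimal Python sketch, with names of our own choosing and the same independence assumption, is:

```python
from math import comb, prod
from typing import Sequence


def static_survivability_k_spaces(n: int, space_sizes: Sequence[int], m: int) -> float:
    """Equation 2: like Equation 1, with |S| replaced by prod_j |S_j|."""
    total = prod(space_sizes)  # size of the combined diversification space
    if not 1 <= n <= total:
        raise ValueError("need 1 <= n <= product of the space sizes")
    if m < 1:
        raise ValueError("need at least one intrusion attempt")
    p = n / total
    p_compromised = sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(1, m + 1))
    return 1.0 - p_compromised


if __name__ == "__main__":
    # Two hypothetical techniques with 50 and 4 variations: combined space of 200.
    print(static_survivability_k_spaces(n=10, space_sizes=[50, 4], m=5))
```

The sketch treats every combination as valid for deployment, which is the ideal case; it overestimates survivability when some combinations cannot be used.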
Under the BFTSA architecture, we assume a
BFT (Castro, 2001; Kotla et al., 2007) server group
with n replicated virtual machines. Because the BFT
protocol can tolerate $\lfloor (n-1)/3 \rfloor$
compromised virtual machines, a system will be
compromised if more than $\lfloor (n-1)/3 \rfloor$
virtual machines in the cluster are compromised.
Therefore, the survivability can be calculated
by Equation 3.

$$
P_s = 1 - \sum_{i=\lfloor (n-1)/3 \rfloor + 1}^{m} \binom{m}{i} (P_r)^i (1 - P_r)^{m-i}
    = 1 - \sum_{i=\lfloor (n-1)/3 \rfloor + 1}^{m} \binom{m}{i}
      \left(\frac{n}{|S|}\right)^i \left(1 - \frac{n}{|S|}\right)^{m-i}
\tag{3}
$$

where $n \le |S|$ (n nodes cannot have more than $|S|$
variations), and $m > \lfloor (n-1)/3 \rfloor$ (we need
to compromise at least $\lfloor (n-1)/3 \rfloor + 1$
replicas) is the total number of intrusion attempts.
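Similarly, with k independent diversification
approaches, for a sequence of m intrusion attempts,

$$
P_s = 1 - \sum_{i=\lfloor (n-1)/3 \rfloor + 1}^{m} \binom{m}{i}
      \left(\frac{n}{\prod_{j=1}^{k} |S_j|}\right)^i
      \left(1 - \frac{n}{\prod_{j=1}^{k} |S_j|}\right)^{m-i}
\tag{4}
$$

where $n \le \prod_{j=1}^{k} |S_j|$, and
$m > \lfloor (n-1)/3 \rfloor$.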
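Equations 3 and 4 can be evaluated in the same way as Equations 1 and 2; only the lower limit of the sum changes. The Python sketch below uses names of our own choosing. Passing a single space size corresponds to Equation 3, and passing several corresponds to Equation 4.

```python
from math import comb, prod
from typing import Sequence


def bft_tolerance(n: int) -> int:
    """floor((n - 1) / 3): compromised replicas a BFT group of n can tolerate."""
    return (n - 1) // 3


def static_survivability_bftsa(n: int, space_sizes: Sequence[int], m: int) -> float:
    """Equations 3 and 4: the system falls only if more than floor((n-1)/3) attempts succeed."""
    total = prod(space_sizes)
    if not 1 <= n <= total:
        raise ValueError("need 1 <= n <= product of the space sizes")
    p = n / total
    first = bft_tolerance(n) + 1  # smallest number of successes that breaks the system
    # If m < first, the range is empty and the survivability is 1.
    p_compromised = sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(first, m + 1))
    return 1.0 - p_compromised


if __name__ == "__main__":
    print(static_survivability_bftsa(n=10, space_sizes=[50], m=5))
```

Returning 1 when m does not exceed the tolerance is our reading of the condition $m > \lfloor (n-1)/3 \rfloor$: with too few attempts the attacker cannot reach the threshold.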
In the above discussion, we assumed the ideal
situation where 1) the success of one intrusion attempt
is independent of other attempts; 2) any combination of
diversification techniques is valid; and 3) BFT
replication is used. Note that randomization techniques
make the probability of breaking into a system
independent between attacks in a sequence. However,
the following challenges may significantly change the
form of Equation 4. a) Due to resource limits,
full BFT replication may not be feasible in a heavily
loaded system, in which case the threshold,
$\lfloor (n-1)/3 \rfloor$, to break into the system
will be different. b) The probability of success will
become cumulative, conditional, or otherwise dependent
in a sequence of attacks if the diversification is not
re-randomized for each attack attempt. c) The
characteristics of the attacks, e.g., the steps
included in an attack, may make the probability of
success conditional between attempts. d) A random
combination of different diversification techniques
does not automatically lead to valid and independent
defenses, which will change $\prod_{j=1}^{k} |S_j|$ in
Equation 4. For example, a simple combination of two
different stack randomization techniques may lead to
incorrect stack frames. Thus, many combinations in
$\prod_{j=1}^{k} S_j$ will not be valid for deployment.
As a result, our defense space will actually be
smaller. Furthermore, when the space of a
diversification technique is very small, multiple
replicas may need to share the same variation, which
further weakens the defense.
When all diversifications are valid, the comparison
of survivability is shown in Figure 4. In the figure,
BFTSA shows better survivability, and its survivability
generally improves as the total number of nodes
increases.
Figure 4: Static analysis of survivability with m = 5 and |S| = 50.
[Plot: x-axis, the number of virtual machines in the system (4 to 13); y-axis, survivability (0 to 1); curves: LBSA and ICSA, BFTSA.]
Interestingly, the survivability of BFTSA is not
monotonic, which is due to the characteristics of the
BFT protocol (Castro and Liskov, 1999). The BFT
protocol can tolerate $\lfloor (n-1)/3 \rfloor$
compromised virtual machines in a cluster with n
virtual machines. In the example, when we have 7, 8,
or 9 virtual machines in the cluster, the cluster can
tolerate the same number, $\lfloor (n-1)/3 \rfloor = 2$,
of compromised virtual machines. However, an increased
number of virtual machines in the cluster also
increases the probability of a successful attack in one
attempt. Therefore, if increasing the number of virtual
machines does not increase $\lfloor (n-1)/3 \rfloor$,
the survivability decreases.
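The non-monotonic behavior can be checked numerically. The following Python sketch sweeps the cluster size for the parameters of Figure 4 (m = 5, |S| = 50), using Equations 1 and 3. The helper names are ours, and the printed values come from the equations rather than from the published plot.

```python
from math import comb


def survivability(n: int, space_size: int, m: int, tolerated: int) -> float:
    """1 - P(more than `tolerated` of m independent attempts succeed), p = n / |S|."""
    p = n / space_size
    p_compromised = sum(
        comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(tolerated + 1, m + 1)
    )
    return 1.0 - p_compromised


def sweep(m: int = 5, space_size: int = 50, sizes=range(4, 14)):
    """Return (n, LBSA/ICSA survivability, BFTSA survivability) for each cluster size."""
    rows = []
    for n in sizes:
        lbsa_icsa = survivability(n, space_size, m, tolerated=0)      # Equation 1
        bftsa = survivability(n, space_size, m, tolerated=(n - 1) // 3)  # Equation 3
        rows.append((n, lbsa_icsa, bftsa))
    return rows


if __name__ == "__main__":
    for n, lbsa_icsa, bftsa in sweep():
        print(f"n = {n:2d}  LBSA/ICSA = {lbsa_icsa:.4f}  BFTSA = {bftsa:.4f}")
```

In the output, the BFTSA value drops from n = 7 to n = 9 and rises again at n = 10, where the tolerance increases from 2 to 3.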
5 EVALUATING SURVIVABILITY
UNDER SUSTAINED ATTACKS
In this section, we analyze the behaviors of each ar-
chitecture under sustained attacks.
5.1 State Transition Diagrams
The state transition diagram of LBSA and ICSA is
shown in Figure 5. In the figure, the attacker
compromises the virtual machines one by one with rate λ.
The system can recover virtual machines or services
in a virtual machine (through reboot, micro-reboot,
or other approaches) with rate µ. A state $G_i$, where
$0 \le i \le n$, indicates that the system has i
virtual machines not compromised and $n - i$ virtual
machines compromised. In LBSA or ICSA, once a virtual
machine is compromised, the service integrity is gone.
Thus, the only normal state is $G_n$, where no virtual
machine is compromised.
The state transition diagram of BFTSA is shown
in Figure 6. In the figure, the states and state
transition rates are the same as those of LBSA and
ICSA. However, since BFTSA has a BFT protocol to
eliminate compromised nodes of the cluster, it can
tolerate $\lfloor (n-1)/3 \rfloor$ compromised virtual
machines. Therefore, the normal states are
$\{G_n, G_{n-1}, \ldots, G_{n-\lfloor (n-1)/3 \rfloor}\}$.
5.2 The Continuous Time Markov
Chain (CTMC) Model
We assume that the compromise and recovery events in
Figure 5 and Figure 6 follow Poisson processes with
rates λ and µ, respectively. Based on these assumptions
and the above discussion, the state transitions of our
model form a finite state Continuous-Time Markov Chain
(CTMC) (Tijms, 1994; Sahner et al., 1996a) that can be
characterized by a state transition matrix
$Q = (q_{i,j})$ and an initial state probability vector
$\pi(0)$, where $q_{i,j}$ is the transition rate from
state i to state j and
$q_{i,i} = -\sum_{j \neq i} q_{i,j}$.
The state transition matrix of Figure 5 and Figure 6
is as follows.

$$
Q =
\begin{array}{c|cccccc}
       & G_0     & G_1            & G_2    & \cdots & G_{n-1} & G_n      \\ \hline
G_0    & -\mu    & \mu            & 0      & \cdots & 0       & 0        \\
G_1    & \lambda & -\lambda - \mu & \mu    & \cdots & 0       & 0        \\
\vdots & \vdots  & \vdots         & \vdots & \ddots & \vdots  & \vdots   \\
G_n    & 0       & 0              & 0      & \cdots & \lambda & -\lambda
\end{array}
\tag{5}
$$

The initial state of the system is $G_n$. Thus,

$$
\pi(0) = (\underbrace{0, 0, \ldots, 0}_{n}, 1)
\tag{6}
$$
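The matrix in Equation 5 and the vector in Equation 6 are easy to build programmatically. The Python sketch below is one possible construction; the function names and the list-of-lists representation are our own choices, and states are indexed so that index i corresponds to $G_i$.

```python
def generator_matrix(n: int, attack_rate: float, recovery_rate: float):
    """Equation 5: (n + 1) x (n + 1) generator matrix over the states G_0 .. G_n."""
    size = n + 1
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if i > 0:
            q[i][i - 1] = attack_rate    # G_i -> G_{i-1}: one more VM compromised
        if i < n:
            q[i][i + 1] = recovery_rate  # G_i -> G_{i+1}: one VM recovered
        q[i][i] = -sum(q[i])             # diagonal makes each row sum to zero
    return q


def initial_distribution(n: int):
    """Equation 6: the system starts in G_n with probability 1."""
    return [0.0] * n + [1.0]


def normal_states(n: int, tolerated: int):
    """Indices of normal states: G_n only if tolerated = 0, down to G_{n - tolerated} otherwise."""
    return list(range(n - tolerated, n + 1))


if __name__ == "__main__":
    for row in generator_matrix(3, attack_rate=1.0, recovery_rate=4.0):
        print(row)
    print(initial_distribution(3), normal_states(10, (10 - 1) // 3))
```

LBSA and ICSA share this matrix with BFTSA; the architectures differ only in which states count as normal.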
With the CTMC model, both the steady state and
transient states can be calculated. The steady state of
a system is the state in which the features of the
system no longer change after the system has run for a
long period of time. A steady state may not exist for a
specific CTMC. Fortunately, most real systems do have
steady states. Once an n by n generator matrix Q is
given, the steady-state probability vector π is
determined by Equation 7.

$$
\pi Q = 0, \qquad \sum_{1 \le i \le n} \pi_i = 1.
\tag{7}
$$
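As an illustration of how Equation 7 can be solved, the Python sketch below replaces one redundant balance equation with the normalization constraint and applies Gauss-Jordan elimination. This is one standard approach and not necessarily the tool used for the figures; all function names are ours.

```python
def generator_matrix(n, attack_rate, recovery_rate):
    """Generator matrix of Equation 5 over states G_0 .. G_n."""
    size = n + 1
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if i > 0:
            q[i][i - 1] = attack_rate
        if i < n:
            q[i][i + 1] = recovery_rate
        q[i][i] = -sum(q[i])
    return q


def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 (Equation 7) by Gauss-Jordan elimination."""
    size = len(q)
    # Row i of the augmented system is the balance equation sum_j pi_j * q[j][i] = 0.
    a = [[q[j][i] for j in range(size)] + [0.0] for i in range(size)]
    # One balance equation is redundant; replace it by the normalization constraint.
    a[-1] = [1.0] * size + [1.0]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(size):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [a[i][size] / a[i][i] for i in range(size)]


def steady_state_survivability(n, attack_rate, recovery_rate, tolerated):
    """Steady-state probability of being in a normal state G_{n - tolerated} .. G_n."""
    pi = steady_state(generator_matrix(n, attack_rate, recovery_rate))
    return sum(pi[n - tolerated:])


if __name__ == "__main__":
    n = 10
    print("LBSA/ICSA:", steady_state_survivability(n, 1.0, 4.0, tolerated=0))
    print("BFTSA:    ", steady_state_survivability(n, 1.0, 4.0, tolerated=(n - 1) // 3))
```

For this chain the balance equations give $\pi_i \propto (\mu/\lambda)^i$, so with λ = 1, µ = 4, and n = 10 the LBSA and ICSA value is about 0.75, the level that Figure 9 settles at.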
The comparison of the steady states of different
architectures is shown in Figure 7. In the figure, the
x-axis shows the recovery rate µ, and the compromise
rate λ has a fixed value of 1. As µ increases from 1
to 10, all architectures demonstrate increasing
survivability. However, BFTSA shows much better
survivability and is very sensitive to increases in µ.
The survivability of BFTSA reaches values close to 1
much earlier than that of LBSA and ICSA. Furthermore,
the survivability of BFTSA is very close to 1 once µ is
greater than 2.
The impact of the attack rate λ is shown in Figure 8.
In the figure, we keep the recovery rate µ fixed at 5.
When we increase the attack rate λ from 1 to 20, the
survivability of all architectures decreases. The
BFTSA architecture responds better to the increasing
attack rates.
Figure 5: The state transition diagram of LBSA and ICSA.
[Diagram: a chain of states G_n, G_{n-1}, G_{n-2}, ..., G_1, G_0; adjacent states are linked by transitions with rate λ (compromise) and rate µ (recovery); states are marked as normal or compromised.]
Figure 6: The state transition diagram of BFTSA.
[Diagram: the same chain of states with transition rates λ and µ; states are marked as normal or compromised.]
Figure 7: Comparison of steady state survivability while λ = 1 and n = 10.
[Plot: x-axis, the recovery rate; y-axis, survivability; curves: LBSA and ICSA, BFTSA.]
Figure 8: Comparison of steady state survivability while µ = 5 and n = 10.
[Plot: x-axis, the attack rate (ticks at 5, 10, 15, 20); y-axis, survivability; curves: LBSA and ICSA, BFTSA.]
The CTMC model can also describe the transient
behaviors of a system. A transient state of a system is
the state of the system at a specific moment. The
evaluation of transient states shows how quickly a
system goes to its steady state, and how much time is
spent in each state. A system may satisfy us with its
steady state but disappoint us with its transient
behaviors, e.g., by taking too long to enter the steady
state. Transient behaviors also tell us what may happen
if a system suffers a short period of high attack rate.
Given a generator matrix Q and an initial state
probability vector π(0), the transient state
probability π(t) at time t is determined by Equation 8.

$$
\frac{d}{dt} \pi(t) = \pi(t) Q
\tag{8}
$$
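The solution of Equation 8 is $\pi(t) = \pi(0) e^{Qt}$. The Python sketch below evaluates it by uniformization, a standard numerical technique for CTMCs that we use here for illustration; the source does not state which solver produced the figures. Function names and the tolerance parameter are ours.

```python
from math import exp


def generator_matrix(n, attack_rate, recovery_rate):
    """Generator matrix of Equation 5 over states G_0 .. G_n."""
    size = n + 1
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if i > 0:
            q[i][i - 1] = attack_rate
        if i < n:
            q[i][i + 1] = recovery_rate
        q[i][i] = -sum(q[i])
    return q


def transient_distribution(q, pi0, t, tol=1e-10, max_terms=10_000):
    """pi(t) = pi(0) exp(Qt), the solution of Equation 8, computed by uniformization."""
    size = len(q)
    rate = max(-q[i][i] for i in range(size))  # uniformization rate
    if rate == 0.0 or t == 0.0:
        return list(pi0)
    # Transition matrix of the uniformized discrete-time chain: P = I + Q / rate.
    p = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(size)]
         for i in range(size)]
    weight = exp(-rate * t)  # Poisson(rate * t) probability of k = 0 jumps
    vec = list(pi0)          # pi(0) P^k, starting with k = 0
    result = [weight * v for v in vec]
    accumulated = weight
    k = 0
    while 1.0 - accumulated > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * p[i][j] for i in range(size)) for j in range(size)]
        weight *= rate * t / k
        accumulated += weight
        for j in range(size):
            result[j] += weight * vec[j]
    return result


if __name__ == "__main__":
    n = 10
    q = generator_matrix(n, attack_rate=1.0, recovery_rate=4.0)
    pi0 = [0.0] * n + [1.0]
    for t in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0):
        print(t, transient_distribution(q, pi0, t)[n])  # LBSA/ICSA survivability
```

The last entry of the returned vector is the probability of state $G_n$, which is the transient survivability of LBSA and ICSA; for BFTSA, the entries of all normal states are summed.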
In Figure 9, we show the transient survivability
of LBSA and ICSA. In the figure, when time increases
from 0 to 8, the survivability decreases quickly
to around 0.75. In other words, the system enters the
steady state very quickly.
Figure 9: The transient behaviors of LBSA and ICSA while λ = 1, µ = 4, and n = 10.
[Plot: x-axis, time (0 to 8); y-axis, survivability (0.75 to 1).]
The transient behavior of BFTSA is shown in
Figure 10. According to the figure, while the steady
state survivability is higher than that of LBSA and
ICSA, the system enters the steady state more slowly.
Figure 10: The transient behaviors of BFTSA while λ = 1, µ = 4, and n = 10. The y-axis is survivability.
[Plot: x-axis ticks from 0 to 4; y-axis, survivability (0.999997 to 1).]
Given the state transition matrix, when needed,
the cumulative time l(t) spent in each state up to
time t is given by Equation 9.

$$
\frac{d}{dt} l(t) = l(t) Q + \pi(0)
\tag{9}
$$

An example of the cumulative time spent in the
“normal” state of the LBSA architecture is shown in
Figure 11.
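Equation 9 is a linear ordinary differential equation with $l(0) = 0$, so it can be integrated with any standard scheme. The Python sketch below uses the classical fourth-order Runge-Kutta method with a fixed step; the method, the step count, and the function names are our choices for illustration.

```python
def generator_matrix(n, attack_rate, recovery_rate):
    """Generator matrix of Equation 5 over states G_0 .. G_n."""
    size = n + 1
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if i > 0:
            q[i][i - 1] = attack_rate
        if i < n:
            q[i][i + 1] = recovery_rate
        q[i][i] = -sum(q[i])
    return q


def cumulative_time(q, pi0, t, steps=2000):
    """Integrate dl/dt = l Q + pi(0) (Equation 9) from l(0) = 0 with classical RK4."""
    size = len(q)

    def derivative(l):
        return [sum(l[i] * q[i][j] for i in range(size)) + pi0[j] for j in range(size)]

    h = t / steps
    l = [0.0] * size
    for _ in range(steps):
        k1 = derivative(l)
        k2 = derivative([x + 0.5 * h * d for x, d in zip(l, k1)])
        k3 = derivative([x + 0.5 * h * d for x, d in zip(l, k2)])
        k4 = derivative([x + h * d for x, d in zip(l, k3)])
        l = [x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(l, k1, k2, k3, k4)]
    return l


if __name__ == "__main__":
    n = 10
    q = generator_matrix(n, attack_rate=1.0, recovery_rate=5.0)
    pi0 = [0.0] * n + [1.0]
    l = cumulative_time(q, pi0, t=4.0)
    print("time spent in G_n up to t = 4:", l[n])
    print("total time over all states:   ", sum(l))
```

Since every row of Q sums to zero, the entries of l(t) always add up to t, which gives a simple check on the integration.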
Figure 11: The accumulative time spent on the normal state of LBSA while λ = 1, µ = 5, and n = 10.
[Plot: x-axis, time (0 to 4); y-axis, accumulative time at the normal states (0 to 3.5).]
6 EVALUATING THE COSTS
In this section, we discuss the costs of each architec-
ture considering the processing costs, memory costs,
and communication costs.
Table 1: Evaluation metrics comparison of three architectures.
Metrics                     LBSA    ICSA    BFTSA
Processing costs            n       1       n
Memory costs                n       1       n
Communication costs         O(1)    O(1)    $O(n^{3/2})$
Intrusion tolerance         0       0       $\lfloor (n-1)/3 \rfloor$
Fail-safe fault tolerance   n−1     0       $\lfloor (n-1)/3 \rfloor$
Both LBSA and BFTSA need diversified replication.
When we replicate the services in n virtual machines,
the processing costs and memory needed by replication
are n times those of ICSA.
When we consider the communication costs, for
each request, ICSA does not duplicate and forward
the request. Similarly, while the load balancer in
LBSA forwards the request to a lightly loaded virtual
machine, it does not duplicate the request to other
virtual machines either. However, BFTSA requires
a more complex communication protocol. The BFT
protocol (Castro and Liskov, 1999) goes through three
phases, and in each phase, one or all nodes in the
cluster need to broadcast to the others. The
communication complexity of BFT is $O(n^{3/2})$
(Castro and Liskov, 1999).
We summarize the costs and the capability of in-
trusion tolerance or fault tolerance in Table 1. In the
table, intrusion tolerance indicates how many com-
promised nodes can be tolerated by a specific archi-
tecture. The fail-safe fault tolerance indicates how
many fail-safe nodes can be tolerated, where fail-safe
nodes do not demonstrate arbitrary behaviors as a
compromised node does under attacks.
7 CONCLUSIONS
In this paper, we compared the survivability and costs
of three virtual machine based architectures. Our
studies show that even with the same COTS software,
a different architecture can have a significant impact
on the survivability of the whole system. According
to our analytical results, a replicated architecture
with a BFT protocol and diversification is much better
than simple replication and isolation with regard to
survivability. However, the cost of the BFT protocol is
high. The analytical methods described in this paper,
such as the static analysis and the dynamic CTMC based
analysis, can be used to analyze the survivability of
other architectures. The results of this paper can also
be used as guidelines in architecture design when
survivability is crucial to the system.
ACKNOWLEDGEMENTS
We thank Dr. Yan Yang for discussions of some of the
contents of the paper. We also thank the anonymous
reviewers, whose comments helped us greatly improve
the quality of the paper. Meng Yu was supported by
NSF grant CNS-0905153. Peng Liu was supported by
NSF CNS-0905131, AFOSR FA9550-07-1-0527 (MURI), and
ARO MURI: Computer-aided Human Centric Cyber Situation
Awareness.
REFERENCES
Alvisi, L., Malkhi, D., Pierce, E., and Reiter, M. K. (2001).
Fault detection for byzantine quorum systems. IEEE
Transactions on Parallel and Distributed Systems,
12(9):996–1007.
Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1987).
Concurrency Control and Recovery in Database Sys-
tems. Addison-Wesley, Reading, MA.
Castro, M. (2001). Practical Byzantine Fault Tolerance.
PhD thesis, Department of Electrical Engineering and
Computer Science, Massachusetts Institute of Tech-
nology. Also as Technical Report MIT/LCS/TR-817.
Castro, M. and Liskov, B. (1999). Practical byzantine fault
tolerance. In The Third Symposium on Operating Sys-
tems Design and Implementation (OSDI 99), pages
173–186, New Orleans, USA.
Chun, B.-G., Maniatis, P., and Shenker, S. (2008). Diverse
replication for single-machine byzantine-fault toler-
ance. In ATC’08: USENIX 2008 Annual Technical
Conference on Annual Technical Conference, pages
287–292, Berkeley, CA, USA. USENIX Association.
Gokhale, S. S., Vandal, P. J., and Lu, J. (2006).
Performance and reliability analysis of web server software
architectures. In PRDC ’06: Proceedings of the 12th
Pacific Rim International Symposium on Dependable
Computing, pages 351–358, Washington, DC, USA.
IEEE Computer Society.
Jajodia, S. and Mutchler, D. (1990). Dynamic voting algo-
rithms for maintaining the consistency of a replicated
database. ACM Trans. Database Syst., 15(2):230–280.
Kotla, R., Alvisi, L., Dahlin, M., Clement, A., and Wong,
E. (2007). Zyzzyva: speculative byzantine fault toler-
ance. SIGOPS Oper. Syst. Rev., 41(6):45–58.
Malkhi, D. and Reiter, M. (1998). Byzantine quorum
systems. Distributed Computing, 11(4):203–213.
Marsan, M. A. (1990). Stochastic Petri nets: an elementary
introduction, pages 1–29. Springer-Verlag New York,
Inc., New York, NY, USA.
Mauw, S. and Oostdijk, M. (2005). Foundations of at-
tack trees. In International Conference on Information
Security and Cryptology ICISC 2005. LNCS 3935,
pages 186–198. Springer.
Nicol, D. M., Sanders, W. H., and Trivedi, K. S. (2004).
Model-based evaluation: From dependability to secu-
rity. IEEE Transactions on Dependable and Secure
Computing, 1(1):48–65.
Padilla, G., Gao, T., Yen, I.-L., Bastani, F., and de Oca,
C. M. (2008). An early reliability assessment model
for data-flow software architectures. Mexican Inter-
national Conference on Computer Science, 0:9–19.
Sahner, R. A., Trivedi, K. S., and Puliafito, A. (1996a). Per-
formance and Reliability Analysis of Computer Sys-
tems. Kluwer Academic Publishers, Norwell, Mas-
sachusetts, USA.
Sahner, R. A., Trivedi, K. S., and Puliafito, A. (1996b).
Performance and reliability analysis of computer sys-
tems: an example-based approach using the SHARPE
software package. Kluwer Academic Publishers, Nor-
well, MA, USA.
Sanders, W. H. and Meyer, J. F. (2001). Stochastic
activity networks: Formal definitions and concepts.
Sawilla, R. E. and Ou, X. (2008). Identifying critical at-
tack assets in dependency attack graphs. In ESORICS
’08: Proceedings of the 13th European Symposium on
Research in Computer Security, pages 18–34, Berlin,
Heidelberg. Springer-Verlag.
Schneider, F. B. (1990). Implementing fault tolerant ser-
vices using the state machine approach: A tutorial.
ACM Computing Surveys, 22(4).
Seguin, J., Sergeant, G., and Wilms, P. (1979). A major-
ity consensus algorithm for the consistency of dupli-
cated and distributed information. In IEEE Interna-
tional Conference on Distributed Computing Systems,
pages 617–624, New York.
Tijms, H. C. (1994). Stochastic Models. Wiley series in
probability and mathematical statistics. John Wiley &
Sons, New York, NY, USA.