studies have represented availability and performance
attributes using analytical models and demonstrated
very accurate results in a time-efficient manner. Kim
et al. (Kim et al., 2009) presented availability models for
virtualized and non-virtualized servers using hierarchical
analytical models and demonstrated encouraging results
with the use of virtualization. A similar model is used
by Jhawar et al. (Jhawar and Piuri, 2012) to model
component failures at different levels in data centers,
correlation between failures, and impact boundaries.
Dynamic creation of replicas to deal with system
failures has been used before. For example, VMware
High Availability (HA) (VMware, 2007) allows a virtual
machine on a failed host to be re-instantiated on
a new machine, and (Pu et al., 1988) uses regeneration
of new data objects to account for reduction in
redundancy in the Google File System. The work
most relevant to the proposal in this paper is by Jung
et al. (Jung et al., 2010) (Jung et al., 2008), which examines
how virtualization can be used to provide enhanced
solutions to the classic problem of ensuring
high availability while maintaining performance of
multi-tier web services. Software components are re-
stored whenever failures occur and component place-
ment is managed using information about application
control flow and performance predictions. Our work
differs from existing systems in how we handle system
failures: we create replicas and, orthogonally, migrate
them across the various deployment levels in the
Cloud. Moreover, other approaches that generate new
configurations at runtime do not take into account the
placement constraints as we do in this paper.
The performance impact of resource allocation
on web applications has been studied in (Urgaonkar
et al., 2005), but that study does not combine availability
requirements and regeneration of failed components.
Several works on dependability have highlighted the
necessity to trade off between availability and performance
(Shin et al., 1989) (Sahai et al., 2002).
7 CONCLUSIONS
In this paper, we have highlighted that adaptive re-
source management is critical for fault tolerance of
applications in Cloud computing. We extended the
concept of fault tolerance policy management, embedded
in FTM (which provides fault tolerance as a service),
with an online controller to dynamically change
the replication levels and deployment configurations
in the event of system failures (e.g., server crashes and
security exploits resulting in the denial of service).
First, we formulated availability and performance of
applications using Markov chains and layered queu-
ing networks, and showed that the two attributes may
be competing with each other in a given configura-
tion. Then, using the models, we presented the online
controller that realizes a heuristics-based algorithm
to restore the application’s requirements at runtime.
Finally, we reported our simulation results and showed
that the online controller can significantly improve the
availability and lower the degradation of system response
times compared to traditional static schemes.
Our future work will extend the models to a larger
scale and perform case studies on specific software
architectures in Cloud computing environments.
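As a purely illustrative aside, the kind of Markov-chain availability reasoning summarized above can be sketched for the simplest possible case. The sketch below is not the model used in this paper: it assumes a two-state (up/down) continuous-time Markov chain per replica with exponential failure and repair times, and it assumes that replicas fail independently. The function names and the rates in the usage note are hypothetical.

```python
# Illustrative sketch only; NOT the availability model of this paper.
# Assumptions: each replica is a two-state (up/down) continuous-time
# Markov chain, and replicas fail independently of each other.


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state probability of the 'up' state: mu / (lambda + mu)."""
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("failure_rate must be >= 0 and repair_rate must be > 0")
    return repair_rate / (failure_rate + repair_rate)


def replicated_availability(single: float, replicas: int) -> float:
    """Availability when the service is up if at least one replica is up."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # The service is down only if all independent replicas are down.
    return 1.0 - (1.0 - single) ** replicas
```

For instance, with a hypothetical failure rate of 1 and repair rate of 9 (in the same time unit), a single replica is available 90% of the time, and two independent replicas raise this to 99%. The sketch deliberately omits performance: each additional replica also consumes resources, which is where the competition between the two attributes arises.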
ACKNOWLEDGEMENTS
This work was supported in part by the Italian Ministry
of Research within PRIN project “GenData
2020” (2010RTFWBH), and by Google, under the
Google Research Award program.
REFERENCES
Buyya, R., Garg, S. K., and Calheiros, R. N. (2011). SLA-oriented
resource provisioning for cloud computing:
Challenges, architecture, and solutions. In Proc. of the
2011 International Conference on Cloud and Service
Computing, pages 1–10, Washington, DC, USA.
Cully, B., Lefebvre, G., Meyer, D., Feeley, M., Hutchinson,
N., and Warfield, A. (2008). Remus: high availabil-
ity via asynchronous virtual machine replication. In
Proc. of the 5th USENIX Symposium on Networked
Systems Design and Implementation, pages 161–174,
San Francisco, California.
De Capitani di Vimercati, S., Foresti, S., and Samarati, P.
(2012). Managing and accessing data in the cloud:
Privacy risks and approaches. In Proc. of the 7th Inter-
national Conference on Risks and Security of Internet
and Systems, Cork, Ireland.
Franks, G., Al-Omari, T., Woodside, M., Das, O., and De-
risavi, S. (2009). Enhanced modeling and solution
of layered queueing networks. IEEE Transactions on
Software Engineering, 35(2):148–161.
Gilbert, S. and Lynch, N. (2002). Brewer’s conjecture
and the feasibility of Consistent, Available, Partition-
tolerant web services. SIGACT News, 33(2):51–59.
Gill, P., Jain, N., and Nagappan, N. (2011). Understanding
network failures in data centers: measurement, analy-
sis, and implications. ACM Computer Communication
Review, 41(4):350–361.
Hermenier, F., Lawall, J., Menaud, J.-M., and Muller, G.
(2011). Dynamic Consolidation of Highly Available
Web Applications. Technical Report RR-7545, IN-
RIA.
Jensen, P. A. (2011). Operations Research Models
and Methods – Markov Analysis Tools. Available at
www.me.utexas.edu/jensen/ormm/excel/markov.html.