security and privacy, and monitoring, as well as
machine-readability. In addition, they describe the use
of SLAs in relation to Cloud brokers. Machine-
readability will be important in supporting large
numbers of enquiries in a fast-moving market, and
two frameworks address machine-readable SLA
specification and monitoring: Web Service
Agreement (WS-Agreement) developed by Open
Grid Forum (OGF) and described by Andrieux et al.
(2007), and Web Service Level Agreement (WSLA),
introduced by IBM in 2003. Through WS-
Agreement, an entity can construct an SLA as a
machine-readable and formal contract. WS-Agreement
builds on an agreement-driven service-oriented
architecture (SOA), in which service requirements can
be met dynamically. In addition, the Service
Negotiation and Acquisition Protocol (SNAP,
Czajkowski et al., 2002) defines a message exchange
protocol between end-user and provider for
negotiating an SLA. It supports the following:
resource acquisition, task submission and
task/resource binding.
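To illustrate what machine-readability allows, the following is a minimal sketch of an SLA held as structured data that can be serialised and checked automatically. It does not follow the WS-Agreement or WSLA schemas, which are XML-based; the class names, field names and metric names (`Sla`, `ServiceLevelObjective`, `availability_percent`, `throughput_mbps`) are assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ServiceLevelObjective:
    metric: str       # hypothetical metric name, e.g. "availability_percent"
    threshold: float  # minimum acceptable value for the metric
    period: str       # evaluation period, e.g. "monthly"


@dataclass(frozen=True)
class Sla:
    provider: str
    consumer: str
    objectives: tuple  # tuple of ServiceLevelObjective


def to_json(sla: Sla) -> str:
    """Serialise the SLA so that another system can parse it."""
    return json.dumps(asdict(sla), sort_keys=True)


def violations(sla: Sla, measured: dict) -> list:
    """Return the metrics whose measured value is below the threshold.

    A metric with no measurement is treated as violated (an assumption).
    """
    return [
        slo.metric
        for slo in sla.objectives
        if measured.get(slo.metric, float("-inf")) < slo.threshold
    ]


sla = Sla(
    provider="provider-x",
    consumer="broker-y",
    objectives=(
        ServiceLevelObjective("availability_percent", 99.9, "monthly"),
        ServiceLevelObjective("throughput_mbps", 100.0, "monthly"),
    ),
)
print(to_json(sla))
print(violations(sla, {"availability_percent": 99.95, "throughput_mbps": 80.0}))
```

Because the terms are data rather than prose, the same record can be compared across many offers or monitored continuously without human interpretation.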
Related work in the AssessGrid project (Kerstin
et al., 2007) uses WS-Agreement in the negotiation
of contracts between entities. This relies on the
creation of a probability of failure (PoF), which
influences price and penalty (liability). The end-user
compares SLA offers and chooses providers from a
ranked list: the end-user has to evaluate the
combination and balance between price, penalty and
PoF. Such an approach lends itself readily to Cloud
Brokerage, in which multiple providers are
contracted by the Broker, and it is up to the Broker
to evaluate the PoF and factor it into the SLA. It is
possible, then, that such a Broker may make a range
of different offers which appear to have the same
composition by providers but will vary because of
actual performance and the PoF. For PoF, we may
consider partial and complete failure, where partial
failure may result from underperformance of one or
more resources, or from complete failure of some
resources, within a portfolio of such resources.
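The balance between price, penalty and PoF can be made concrete with a small sketch. The scoring rule used here, price minus the expected penalty payment, is an assumption for illustration and is not the ranking method of AssessGrid; it also ignores the end-user's own loss when a service fails. The offers and their values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaOffer:
    provider: str
    price: float    # price charged for the service
    penalty: float  # compensation paid to the end-user on failure
    pof: float      # probability of failure, in [0, 1]


def expected_net_cost(offer: SlaOffer) -> float:
    """Price minus expected compensation (assumed scoring rule)."""
    return offer.price - offer.pof * offer.penalty


def rank_offers(offers):
    """Order offers from lowest to highest expected net cost."""
    return sorted(offers, key=expected_net_cost)


# Invented example offers, not data from any provider.
offers = [
    SlaOffer("A", price=100.0, penalty=50.0, pof=0.10),
    SlaOffer("B", price=90.0, penalty=20.0, pof=0.25),
    SlaOffer("C", price=110.0, penalty=200.0, pof=0.05),
]

for offer in rank_offers(offers):
    print(offer.provider, f"{expected_net_cost(offer):.2f}")
```

Under this rule offer B ranks first despite having the highest PoF, which shows why a risk-averse end-user might weight PoF more heavily than a plain expected-cost calculation does.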
As well as PoF and liability, a machine-readable
SLA should also address, at least, service
availability, performance and autonomics:
Service Availability. This denotes responsiveness to
user requests. In most cases, it is represented as the
ratio of expected service uptime to total time
during a specific period. It usually appears as a
number of nines: five 9s refers to 99.999%
availability, meaning that the system or service is
expected to be unresponsive for less than 6 minutes
a year. An AppNeta white paper on the State of
Cloud Based Services suggested that, across the 40
largest Cloud providers, average Cloud service
availability in 2010 was 99.948%, equivalent to 273
minutes of
downtime per year. Google (99.9% monthly) and
Azure (99.9%, 99.95% monthly) reportedly failed to
meet their overall SLAs, while AWS EC2 (99.95%
yearly) met its SLA but S3 (99.9% monthly) fell
below. However, we do not consider availability to
be the same as performance, which could vary
substantially whilst availability is maintained: put
another way, a service may be contactable but
impossibly slow.
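The downtime figures quoted above follow directly from the availability percentage. The sketch below reproduces the arithmetic, assuming a 365-day year and a 30-day month; the function and constant names are our own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600; leap years ignored
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200; assumes a 30-day month


def downtime_minutes(availability_percent: float,
                     period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Permitted downtime in a period for a given availability."""
    return (1.0 - availability_percent / 100.0) * period_minutes


def availability_percent(uptime_minutes: float,
                         downtime_minutes_: float) -> float:
    """Availability as the share of total time that the service is up."""
    total = uptime_minutes + downtime_minutes_
    return 100.0 * uptime_minutes / total


print(f"{downtime_minutes(99.999):.2f}")  # five 9s: 5.26 minutes per year
print(f"{downtime_minutes(99.948):.2f}")  # 273.31 minutes per year
print(f"{downtime_minutes(99.9, MINUTES_PER_MONTH):.2f}")  # 43.20 per month
```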
Performance. According to a survey from IDC in
2009 (Gens, 2009), the performance of a service is
the third major concern following security and
availability. Websites such as CloudSleuth and
CloudHarmony offer some information about the
performance of various aspects of Cloud provisions.
However, there appear to be just one or two data
samples per benchmark per provider, so in-depth
performance information is not available, and
further elements of performance, such as
provisioning, booting and upgrading, are not
offered. Performance consideration is vital since it
is possible to pay for higher expected
performance yet receive lower, and not to know
this unless performance is being accurately
monitored.
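One way to detect such a shortfall is to take repeated benchmark samples and compare them with the expected figure. The sketch below is illustrative only: it assumes a score for which higher is better, uses the coefficient of variation as one possible variability measure, and uses invented sample values rather than measurements from any provider.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def shortfall(samples, expected):
    """Fraction by which mean measured performance falls below the
    expected level; 0.0 if the expectation is met or exceeded."""
    mean = statistics.mean(samples)
    return max(0.0, (expected - mean) / expected)


# Invented benchmark scores for one instance type (higher is better).
scores = [100, 90, 110, 80, 120]

print(f"variability: {coefficient_of_variation(scores):.3f}")
print(f"shortfall against an expected 125: {shortfall(scores, 125.0):.2f}")
```

With only one or two samples per benchmark, as on the public sites mentioned above, neither quantity can be estimated reliably; the standard deviation needs at least two samples and is only meaningful with considerably more.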
Autonomics. A Broker’s system may need to adapt
to changes in the setup of the underlying provider
resources in order to continue to satisfy the SLA.
Maintenance and recovery are just two aspects of
such autonomics, allowing partial or complete failure
to be recovered with a smaller liability than would
otherwise be possible. Large numbers of machine-
readable SLAs will necessitate an autonomic
approach in order to optimize utilization – and
therefore profitability.
Since PoF should be grounded in Performance,
and should offer a better basis for presenting Service
Availability, in the remainder of this paper we focus
primarily on performance. Performance variability
of virtualized hardware will have an impact on
Cloud applications, and we posit that the stated cost
of the resource, typically focussed on by others in
relation to Cloud Economics, is but a distraction –
the performance for that cost is of greater
importance and has greater variability: lower
performance at the same cost is undesirable but
cannot as yet be assured against. To measure
performance variability, we investigate a small set of
benchmarks that allow us to compare performance
both within Cloud instances of a few Cloud
Infrastructure providers, and across them. The
results should offer room to reconsider price
CLOSER 2012 - 2nd International Conference on Cloud Computing and Services Science