In this paper, we consider the problem of
determining the expected performance of a set of
instances before they are provisioned. The rest of
this paper is structured as follows. Section 2
provides background information on Cloud
structure, and in Section 3 we discuss the CPU model
distribution in a number of AZs, as relevant to the
discussion in this paper. In Section 4 we describe
our AZ model, and Sections 5-7 discuss
simulations of VM offerings. In Section 8 we
compare our simulated data with real data from
Amazon’s “us-west-1c” zone. In Section 9 we
discuss the assumptions made in our simulations and
how they relate to other scheduling algorithms one
could use. In Section 10 we review related work, and
Section 11 concludes and discusses future work.
2 CLOUD INFRASTRUCTURE
EC2, Google Compute Engine (GCE), HP Public
Cloud and Rackspace OpenCloud (AWS Global
Infrastructure, no date; Google Cloud Platform, no
date; HP Public Cloud, no date; Rackspace
Global Infrastructure, no date) structure their Clouds
into Regions and Zones, and all define these terms in
essentially the same manner: Regions are
geographically dispersed areas consisting of Zones, and each
Zone is presented as an isolated location. Zones have
separate power and network connections and are tied
together with high speed interconnects.
Consequently, failure of one Zone should not, in
theory, disrupt operations in another. EC2 refers to
Zones as Availability Zones (AZs), and we use these
terms interchangeably.
There exist some operational differences
between Regions/Zones in different Clouds. For
example, EC2 will not automatically replicate any
resources placed in one Region into another, whilst
GCE states that it makes ‘no guarantee that project
data at rest is kept only in that Region’. However,
for the purposes of our present discussion these
operational differences are not important.
Clouds offer VMs in a range of sizes known as
instance types, with similar types grouped into
instance classes. Instance types within the same
class typically have the same ratio of CPU cores
to RAM.
Providers give their instance types a performance
rating, which expresses the compute power an
instance’s vCPUs should deliver: the per-vCPU
rating is multiplied by the number of vCPUs offered to
give an overall rating for the type. On Amazon’s
Elastic Compute Cloud (EC2), this is called the EC2
Compute Unit (ECU), whilst Google Compute
Engine has the Google Compute Engine Unit
(GCEU). Such ratings should guarantee levels of
performance that are, if not identical, then certainly
very similar across instances of the same type.
However, we have previously shown (Author1
and Author2, 2013) that m1.small instances on EC2
backed by a Xeon E5430 run a bzip2
compression benchmark in 445s on average, whilst
instances backed by a Xeon E5507 take 621s on
average to run the same benchmark. This is a
39% increase in time taken for instances backed by
the Xeon E5507 compared to those backed by the Xeon E5430. These
instances are ‘rated’ the same and the user is charged
the same price for each, but a performance-based
pricing scheme would clearly be expected to distinguish
between them.
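The size of this gap can be checked directly from the two reported means. The following is a minimal sketch; the function and variable names are ours and purely illustrative. With the rounded averages quoted above, the computed increase comes out at a little over 39.5%, in line with the figure of 39% given in the text.

```python
def percent_increase(baseline_s: float, observed_s: float) -> float:
    """Relative increase of observed_s over baseline_s, in percent."""
    return (observed_s - baseline_s) / baseline_s * 100.0


# Mean bzip2 run times (seconds) quoted in the text for m1.small instances.
mean_runtime_s = {"Xeon E5430": 445.0, "Xeon E5507": 621.0}

slowdown = percent_increase(
    mean_runtime_s["Xeon E5430"], mean_runtime_s["Xeon E5507"]
)
print(f"E5507-backed instances take {slowdown:.1f}% longer than E5430-backed ones")
```

Since both instances carry the same rating and price, the same ratio can also be read as the extra cost per unit of work on the slower host.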
By a heterogeneous Cloud we mean one where
instances of the same type may run on a variety of
underlying hardware (which we refer to as hosts). Of
the providers considered, EC2 is the only one which
acknowledges a heterogeneous infrastructure,
stating: ‘EC2 is built on commodity hardware, over
time there may be several different types of physical
hardware underlying EC2 instances’ (Amazon EC2
FAQs, no date). In such an environment
performance variation is to be expected. As Amazon
adds new Regions to EC2 and new Availability
Zones (AZs) to existing Regions, performs
hardware refreshes, and as CPU models are retired
by manufacturers, heterogeneity is seemingly
inevitable. One would assume that, over time, any
successful Cloud provider will end up with a
heterogeneous environment.
We have, however, found a stable association
between hardware, as identified by CPU model, and
instance classes. That is, an instance of a given class
runs only on a particular set of hardware. For
example, First Generation Standard (FGS) instances
run on hardware with Intel Xeon E5430, E5-2650,
E5645 and E5507 CPUs. Over the six-month period
in which we ran various performance
experiments on EC2 (April to September 2013),
these sets and associations did not change.
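As an illustration of how such an association might be checked from inside an instance, the sketch below extracts a CPU model identifier from text in the format of Linux's /proc/cpuinfo and tests it against the FGS set listed above. This is an assumption on our part: the text does not state how the CPU model was read, and the function names, the regular expression and the sample string are hypothetical.

```python
import re

# CPU models reported in the text as backing First Generation Standard (FGS) instances.
FGS_MODELS = {"E5430", "E5-2650", "E5645", "E5507"}

# Assumed pattern for Xeon identifiers of the form E5430 or E5-2650.
_MODEL_RE = re.compile(r"\bE(?:\d-)?\d{4}\b")


def cpu_model(cpuinfo_text):
    """Return the first Xeon model identifier found on a 'model name' line, or None."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("model name"):
            match = _MODEL_RE.search(line)
            if match:
                return match.group(0)
    return None


def backs_fgs(cpuinfo_text):
    """True if the host's CPU model is in the FGS set quoted in the text."""
    return cpu_model(cpuinfo_text) in FGS_MODELS


# Hypothetical sample in /proc/cpuinfo format, used here instead of reading the file.
sample = "processor\t: 0\nmodel name\t: Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz\n"
print(cpu_model(sample), backs_fgs(sample))
```

On a Linux guest, the contents of /proc/cpuinfo could be passed in place of the sample string. A model outside the set, such as the AMD Opteron 270 reported by Ou et al., yields no match and is therefore not classed as FGS hardware.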
We would of course expect change over time, for
the reasons discussed above. Indeed, Ou et al.
(2012) find Intel Xeon E5430, E5645,
E5507 and AMD Opteron 270 backing FGS
instances. They note that the AMD model is present
less often in 2012 than in the results they
obtained in 2011. We also find that the sets of hardware
associated with different instance classes are distinct
from one another. We would assume that hardware
associated with different classes has either different