methodology based on the Taguchi method to
determine how closely the performance parameters
(base measures) involved in the performance
analysis process are related. The Taguchi method
combines industrial and statistical experience, and
offers a means for improving the quality of
manufactured products. It is based on the “robust
design” concept, popularized by Taguchi, according
to which a well-designed product should cause no problems when used under specified conditions (Taguchi, Chowdhury et al., 2005). Although the experiment presented in this paper was not conducted on a CC production system, the main
contribution of this work is to propose the Taguchi
method as a way to determine relationships between
performance parameters of CCA.
This paper is structured as follows. Section 2 presents background concepts related to the performance measurement of CCA and introduces the MapReduce programming model, which is used to develop CCA. In addition, Section 2 presents the PMFCC, which describes the key performance concepts and subconcepts identified from
international standards. Section 3 presents the
method for examining the relationships among the
performance concepts identified in the PMFCC. In this section, an experimental methodology based on the Taguchi method of experimental design is used as a means of improving the quality of product performance. Section 4 presents the results
of the experiment and analyzes the relationship
between the performance factors of CCA. Finally,
Section 5 presents a synthesis of the results of this
research and suggests future work.
2 BACKGROUND
2.1 Performance Analysis in Cloud
Computing Applications
Researchers have studied the performance of CCA
from various viewpoints. For example, Jackson
(Jackson et al., 2010) analyzes high performance
computing applications on the Amazon Web
Services cloud, with the objective of examining the
performance of existing CC infrastructures and
creating a mechanism to quantitatively evaluate
them. His work focuses on the performance of Amazon EC2 as a representative example of current mainstream commercial CC services, and on its potential applicability to scientific computing environments.
He quantitatively examines the performance of a set
of benchmarks designed to represent a typical High
Performance Computing (HPC) workload running
on the Amazon EC2 platform. Timing results from
different application benchmarks are used to
compute a Sustained System Performance (SSP)
metric, a derived measure of the performance delivered by a computing system on a given workload. According to the National
Energy Research Scientific Computing Center
(NERSC) (Kramer et al., 2005), SSP is useful for
evaluating system performance across any time frame, and can be applied to any set of systems and any workload or benchmark suite. In addition, SSP measures time to solution
across different application areas, and can be used to
evaluate absolute performance and performance
relative to cost (in dollars, energy, or other value
propositions). In his work, Jackson shows a strong correlation between the percentage of time an application spends communicating and its overall performance on EC2: the more an application communicates, the worse its performance becomes. Jackson
concludes that the communication pattern of an
application can have a significant impact on
performance.
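As a point of reference for how such a derived measure is obtained, the NERSC formulation of SSP is commonly expressed as the geometric mean of per-processor performance rates over the benchmark suite, scaled by the number of computational processors in the system. The sketch below follows the NERSC description rather than Jackson's paper, so the exact notation is an assumption:

\[
\mathrm{SSP} = N \left( \prod_{i=1}^{M} p_i \right)^{1/M},
\qquad p_i = \frac{f_i}{c_i \, t_i},
\]

where M is the number of benchmarks in the suite, N is the number of computational processors in the system, f_i is the reference operation count of benchmark i, c_i is the concurrency at which it is run, and t_i is its measured runtime.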
Other researchers focus on applications in
virtualized Cloud environments. For instance, Mei
(Mei et al., 2010) studies the measurement and
analysis of the performance of network I/O
applications (network-intensive applications) in
these environments. The aim of his research is to
understand the performance impact of co-locating
applications in a virtualized Cloud, in terms of
throughput performance and resource sharing
effectiveness. Mei addresses issues related to
managing idle instances, which are processes
running in an operating system (OS) that are
executing idle loops. Results show that when two
identical I/O applications are running together,
schedulers can approximately guarantee that each
has its fair share of CPU slicing, network bandwidth
consumption, and resulting throughput. The results also show
that the duration of performance degradation
experienced is related to machine capacity, workload
level in the running domain, and the number of new
virtual machine (VM) instances to start up.
Although these publications present interesting
methods for performance measurement of CCA, the
approaches used were from an infrastructure
perspective and did not consider CCA performance
factors from a software engineering perspective.
This work bases the performance evaluation of CCA
on frameworks developed for data-intensive