Towards Combining Reactive and Proactive Cloud Elasticity on Running
HPC Applications
Vinicius Facco Rodrigues¹, Rodrigo da Rosa Righi¹, Cristiano André da Costa¹, Dhananjay Singh²,
Victor Mendez Munoz³ and Victor Chang⁴
¹Applied Computing Graduate Program, Universidade do Vale do Rio dos Sinos (UNISINOS), Brazil
²Hankuk University of Foreign Studies (HUFS), Republic of Korea
³Autonomous University of Barcelona, Barcelona, Spain
⁴Xi'an Jiaotong Liverpool University, Suzhou, China
Keywords:
Cloud Utility, High-performance Computing, Live Thresholding, Resource Management, Self-organizing.
Abstract:
The elasticity feature of cloud computing has proved pertinent for parallel applications, since users do not need to decide the best number of processes/resources beforehand. To accomplish this, the most common approaches use threshold-based reactive elasticity or time-consuming proactive elasticity. However, both present at least one of the following problems: the need for previous user experience, poor handling of load peaks, the burden of filling in parameters, or a design bound to a specific infrastructure and workload setting. In this regard, we developed a hybrid elasticity service for parallel applications named SelfElastic. As a parameterless model, SelfElastic presents a closed control loop elasticity architecture that adapts the values of lower and upper thresholds at runtime. Besides presenting SelfElastic, our purpose is to provide a comparison with our previous work on reactive elasticity, called AutoElastic. The results demonstrate SelfElastic's lightweight footprint, besides highlighting its performance competitiveness in terms of application time and cost metrics.
1 INTRODUCTION
Commonly, HPC applications are executed either on cluster or grid architectures. Maintaining these environments in terms of infrastructure, scheduling, and energy consumption may turn them into an expensive solution (Niu et al., 2013). From the HPC viewpoint, a shared characteristic of such environments is the fixed number of resources available to run an application. Due to this limitation, deciding the right number of processes to execute an HPC application can be a difficult procedure. Conversely, cloud computing has been gaining attention in this context thanks to its resource reorganization facility named elasticity (Herbst et al., 2015). Even so, deciding the right amount of cloud computing resources for a parallel application is a nontrivial task and may lead to either under-provisioning or over-provisioning (Nikravesh et al., 2015; Dustdar et al., 2015). Today, most elasticity control strategies can be classified as either reactive or proactive (Farokhi et al., 2015; Nikravesh et al., 2015; Moore et al., 2013). In the first case, users typically define an upper bound t_u and a lower bound t_l in an ad-hoc manner on a target performance metric to trigger, respectively, the activation and deactivation of a certain number of resources (Netto et al., 2014). A proactive approach, on the other hand, employs prediction techniques to anticipate the behavior of the system (its load) and thereby decide the reconfiguration actions. These requirements are not trivial, and sometimes a deep knowledge about the behavior of the system over time is needed (Dustdar et al., 2015; Jamshidi et al., 2014). In this context, we have proposed in previous work a model named AutoElastic (Righi et al., 2015a; Righi et al., 2015b; Righi et al., 2016), which addresses reactive elasticity to reorganize resources for loop-based synchronous parallel applications. Although achieving remarkable performance gains, AutoElastic still suffers from the main problems of reactive elasticity approaches: the definition of thresholds and reactivity. In this context, this article presents a new elasticity model called SelfElastic, which offers automatic threshold configuration. SelfElastic presents the following contributions to the state of the art when considering the duet of HPC applications and cloud elasticity: (i) the modeling of a closed
control-theoretic (Ghanbari et al., 2011) infrastructure to support the hybrid elasticity behavior on parallel cloud-based applications; and (ii) an algorithm named Live Thresholding (LT), based on TCP (Transmission Control Protocol) congestion control, to handle application load projection and the adaptivity of lower and upper thresholds.
2 RELATED WORK
Today, we can cite basically two main types of application workloads that could profit from elasticity in the cloud (Ghanbari et al., 2011): (i) transactional (Moore et al., 2013; Nikolov et al., 2014); and (ii) batch (e.g., text mining, video transcoding, graphical rendering and parallel applications) (Niu et al., 2013; Righi et al., 2015a). Applications in the first case are built to serve online HTTP clients, being commonly deployed on commercial systems like Amazon AWS, RightScale and Microsoft Azure using reactive elasticity (Nikravesh et al., 2015). Users must fill in the rules and the limits of a metric to be monitored, as well as the conditions and actions for reconfiguration. Besides graphical and command-line tools, these commercial systems also provide a particular API for resource provisioning and monitoring. Reactive elasticity is explored in two scenarios: (i) using the standard technique with static thresholds (Dustdar et al., 2015; Righi et al., 2015a; Righi et al., 2015b; Righi et al., 2016; Galante and Bona, 2015); (ii) using other techniques to adapt the threshold values at runtime (Netto et al., 2014). In both scenarios, there are at least a lower (t_l) and an upper (t_u) threshold that guide horizontal or vertical elasticity. Authors agree that the performance of the threshold-based technique is highly dependent on the selected parameters, even in the second scenario (Farokhi et al., 2015). In addition to performance, energy consumption and cost metrics are also important from both the user and cloud administrator perspectives (Righi et al., 2016). Other problems are related to the reactiveness to trigger elasticity actions and to oscillations in VM allocations.
3 SelfElastic MODEL
We developed SelfElastic with the following design decisions in mind: (i) parameterless, not requiring the user/programmer to write elasticity rules, conditions or thresholds; (ii) easy-to-use elasticity service, provided in a plug-and-play fashion; (iii) no need for prior information about the application components/phases, nor for previous executions to generate metadata; (iv) lightweight, so as not to be prohibitive for time-sensitive HPC applications; (v) easy integration with the parallel application, so the processes can be reorganized easily and quickly in the presence of a drop or addition of resources.
3.1 Closed Feedback-Loop Architecture
Aiming at providing a proactive feature, we designed SelfElastic as a closed feedback-loop architecture (Ghanbari et al., 2011) involving two main components: the SelfElastic Manager and the cloud, which is our target system. As illustrated in Figure 1, we have a loop in which the monitoring metrics serve to optimize and predict internal parameters, triggering (or not) elasticity actions according to the application's historical behavior.
[Figure omitted: the SelfElastic Manager (Sensor: Monitoring; Controller: Live Thresholding and System Load; Actuator: Elasticity Actions) connected to a cloud whose front-end manages m physical nodes hosting VMs that run the application processes.]
Figure 1: SelfElastic architecture, with two main components: the SelfElastic Manager and a cloud-based parallel application (our target system). From the cloud perspective, c denotes the number of cores inside a node, m is the number of nodes and n refers to the number of VMs running application processes, obtained as n = c × m.
Our cloud model considers a front-end that acts as a cloud manager to instantiate, deallocate and monitor VMs in the Virtual Layer, and a set of homogeneous nodes in the Physical Layer. In addition, the front-end also accounts for answering requests (including the three previous procedures) made through the cloud API by the SelfElastic Manager. Regarding the Application Layer, there is a collection of processes which are instantiated through application-specific VM templates. Each VM is assigned a template and automatically starts an application process. Depending on the application, different VM templates could start processes with distinct functions.
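As a minimal sketch, the Manager's cycle can be pictured as the loop below. The cloud_api, controller and actuator objects are illustrative assumptions standing in for the modules of Figure 1; the paper only states that the Manager talks to the cloud front-end through its API.

```python
import time

def manager_loop(cloud_api, controller, actuator, interval_s=30):
    """One monitoring observation per iteration: Sensor -> Controller -> Actuator."""
    while True:
        cpu_loads = cloud_api.cpu_loads()        # Sensor: CPU load of each VM
        action = controller.decide(cpu_loads)    # Controller: system load + Live Thresholding
        if action is not None:
            actuator.apply(action)               # Actuator: add or consolidate VMs
        time.sleep(interval_s)                   # wait for the next monitoring cycle
```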
3.2 Defining the Notion of System Load
The sensor module of the SelfElastic Manager monitors the CPU load of each VM periodically, passing data to the controller afterward. In turn, the controller applies
algorithms to define load and threshold values. If resource reorganization is necessary, the actuator performs elasticity actions using the cloud API. When the tasks of all modules are complete, the SelfElastic Manager ends a monitoring observation and waits for the next monitoring cycle. One role of the controller is to generate the system load (l) in order to minimize the effect of disturbances or noise on the behavior of the target system. Thus, we work with time series and the SES (Simple Exponential Smoothing (Herbst et al., 2013)) technique over the CPU load metric of each VM. Equation 1 presents l(o) as the system load at the o-th monitoring observation considering n active VMs. This equation is an arithmetic average of the load on each VM, which is computed through l'(v, o). Here, v is a VM index, o is the current monitoring observation and n the number of VMs running application processes (see Equation 2). l' consists of an SES average, where the weight of the current observation o has a stronger influence than o−1 in the final calculus (starting from 1/2, we use 1/4, 1/8 and so on for the weights). The recurrence ends in the cpu(v, o) computation, which returns the CPU load of VM v at observation o.
$$l(o) = \frac{\sum_{v=0}^{n-1} l'(v, o)}{n} \qquad (1)$$

$$l'(v, o) = \begin{cases} \dfrac{cpu(v, o)}{2} & \text{if } o = 0 \\[6pt] \dfrac{l'(v, o-1)}{2} + \dfrac{cpu(v, o)}{2} & \text{if } o \neq 0 \end{cases} \qquad (2)$$
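As a minimal sketch of Equations 1 and 2, assuming CPU readings in the 0–100 range (function and variable names are ours, not from the paper):

```python
def smooth_vm_load(prev, cpu_now):
    """SES with weight 1/2 for the current observation (Equation 2);
    prev is None at the first observation (o = 0)."""
    if prev is None:
        return cpu_now / 2.0
    return prev / 2.0 + cpu_now / 2.0

def system_load(smoothed):
    """Arithmetic mean of the smoothed per-VM loads (Equation 1)."""
    return sum(smoothed) / len(smoothed)
```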
3.3 Live Thresholding Technique
The Manager is responsible for retrieving a vector of CPU loads from all VMs running slave processes. More precisely, the sensor module of the Manager periodically queries the cloud front-end to capture such data. The mentioned vector is used to compute the system load detailed in Subsection 3.2. Instead of using static thresholds, SelfElastic proposes the dynamic adaptation of the lower (t_l) and upper (t_u) thresholds, which are initialized with the values 0 and 100, respectively. We named this novel technique Live Thresholding (LT), which defines two procedures: adapt_thresholds() and reset_thresholds(). The former is computed at each monitoring observation, while the latter is called only when an elasticity action takes place.
adapt_thresholds() has three parameters: t_l, t_u (both input/output) and load (input only). Firstly, we compute the system load variation considering both the current and previous monitoring observations (referred to by the indexes o and o−1, respectively). This value is assigned to Δl (Equation 3), where function l() was defined earlier in Subsection 3.2. Δl decides which threshold will be updated: (i) if Δl is negative, we are experiencing a decreasing load behavior, so t_l is recalculated to handle this situation quickly; (ii) if Δl is positive, the application workload is growing, so t_u is updated to address this situation; (iii) if Δl is equal to 0, threshold adaptations do not occur. Equations 4 and 5 present how the new threshold values are computed. Considering that t_u decreases when updated, it has a lower bound equal to 0. On the other side, an upper bound of 100 is used when computing the new value of t_l.
$$\Delta l = l(o) - l(o-1) \qquad (3)$$

$$t_l = \min(t_l + |\Delta l|,\, 100) \qquad (4)$$

$$t_u = \max(t_u - \Delta l,\, 0) \qquad (5)$$
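A minimal sketch of adapt_thresholds() under these equations (the paper publishes no code, so names are ours):

```python
def adapt_thresholds(t_l, t_u, load_now, load_prev):
    """Runtime threshold adaptation (Equations 3-5); loads come from Equation 1."""
    delta = load_now - load_prev          # Equation 3
    if delta < 0:                         # decreasing load: raise t_l toward the load
        t_l = min(t_l + abs(delta), 100)  # Equation 4
    elif delta > 0:                       # increasing load: lower t_u toward the load
        t_u = max(t_u - delta, 0)         # Equation 5
    return t_l, t_u                       # delta == 0: no adaptation
```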
An initial thought for designing the reset_thresholds() procedure, which has the same set of parameters as adapt_thresholds(), is to reassign the default values at each elasticity action. In our understanding, this threshold resetting strategy may not be the best for elasticity reactivity, since it throws away all historical data stored in the SelfElastic Manager. Aiming at proposing new forms to reset thresholds, we analyzed the TCP congestion control algorithm (Bing et al., 2009). In the TCP protocol, after exceeding a threshold, the window value is incremented linearly by the maximum segment size at each burst. At each timeout, this threshold is set to half of the current congestion window, and the congestion window is reset to one maximum segment. Thus, we have investigated six approaches A_z (z ∈ {a, b, c, d, e, f}) to address threshold adaptivity after an elasticity action:
- When violating t_l, we can apply A_a, A_b or A_c in accordance with Equation 6 to compute the new value of t_l, while t_u is redefined to 100;
- When violating t_u, we can apply A_d, A_e or A_f in accordance with Equation 7 to compute the new value of t_u, while t_l is restarted as 0.
$$t_l = \begin{cases} 0 & \text{for } A_a \\[2pt] \dfrac{l(o)}{2} & \text{for } A_b \\[4pt] l(o-1) - \dfrac{l(o-1) - l(o)}{2} & \text{for } A_c \end{cases} \qquad (6)$$

$$t_u = \begin{cases} 100 & \text{for } A_d \\[2pt] l(o) + \dfrac{100 - l(o)}{2} & \text{for } A_e \\[4pt] l(o-1) + \dfrac{l(o-1) - l(o)}{2} & \text{for } A_f \end{cases} \qquad (7)$$
SelfElastic always uses a fixed combination of one approach when violating t_l and another for t_u. This results in a notation named LT_xy, where x (A_a, A_b or
A_c) and y (A_d, A_e or A_f) refer to a particular possibility for the lower and upper thresholds, respectively. A_a and A_d simply reset the thresholds to the same values they were initialized with when monitoring started. A_b and A_e use the system load after an elasticity action to redesign the thresholds, while A_c and A_f compute them considering the system load before and after delivering/consolidating resources. SelfElastic is a parameterless model, so letting the user choose among elasticity approaches would not fit our earlier design decision. Hence, we conducted experiments with all possibilities of LT_xy over the eight load patterns considered in the evaluation methodology.
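To make the reset strategies concrete, here is a minimal sketch of reset_thresholds() for the LT_ce combination (A_c on a lower-threshold violation, A_e on an upper one), which Subsection 5.2 later selects as the final approach; the function signature is our assumption:

```python
def reset_thresholds(load_now, load_prev, violated):
    """Reset after an elasticity action, per Equations 6-7 (LT_ce combination)."""
    if violated == "lower":                             # A_c for t_l
        t_l = load_prev - (load_prev - load_now) / 2.0
        t_u = 100.0                                     # t_u back to its default
    else:                                               # A_e for t_u
        t_u = load_now + (100.0 - load_now) / 2.0
        t_l = 0.0                                       # t_l back to its default
    return t_l, t_u
```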
4 EVALUATION METHODOLOGY
We developed a master-slave HPC iterative application that computes the numerical integration of a function f(x) over a closed interval [a, b]. The application presents a master process that works in an external loop, reading from a file a line that defines the workload for the current iteration, as well as the number of iterations. Figure 2 presents the eight load patterns. To evaluate all LT strategies, we first executed all combinations of LT_xy with the application running the load patterns Constant, Ascending, Descending and Wave. From this evaluation we selected the best choice of LT to be considered in the next experiments. All eight loads were executed in three different scenarios: (s1) without cloud elasticity; (s2) enabling self-organizing elasticity management through the LT technique; (s3) a traditional elasticity approach using static thresholds. While SelfElastic is employed to accomplish the second scenario, our previous work named AutoElastic (Righi et al., 2015a) is adopted to address the third one. Contrary to AutoElastic, here we use four combinations of thresholds: t_l at 30% and 50%, and t_u at 70% and 90%. Additionally, our evaluation analyzes the load patterns and the scenarios against three metrics: time, resource and cost.
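As a sketch of the slave-side kernel, the snippet below integrates one slice of [a, b] for a given number of subintervals; the trapezoidal rule is our assumption, since the paper does not name the quadrature method:

```python
def integrate_slice(f, lo, hi, subintervals):
    """Numerical integration of f over [lo, hi] (trapezoidal rule, assumed)."""
    h = (hi - lo) / subintervals
    total = (f(lo) + f(hi)) / 2.0
    for i in range(1, subintervals):
        total += f(lo + i * h)
    return total * h
```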
5 EVALUATION
In this section, we first present in Subsection 5.1 an analysis of the cloud behavior with elasticity guided by all LT variants. Then, in Subsection 5.2, we focus on evaluating the best strategy for LT. In Subsection 5.3 we analyze the application performance.
[Figure omitted: eight panels plotting the workload (equation subintervals) over time in seconds.]
Figure 2: Eight workload patterns considered in the tests: (a) Constant; (b) Ascending; (c) Descending; (d) Wave; (e) Positive Exponential; (f) Negative Exponential; (g) Partial Random; (h) Total Random.
5.1 Analyzing Behavior of the LT Technique
Figure 3 shows all application executions with the Constant load. In this load, violations of the lower threshold (t_l) did not occur. The tiny variations in the load bring similar adaptations in both thresholds. However, t_u is always violated, resulting in the addition of resources, since the load always ranges over the same values and is nearer the upper threshold (t_u). Panels (b), (e) and (h) share the use of the A_e approach. In these executions, when an elasticity action was performed, t_u was recalculated to a new value close to the load, which resulted in new violations faster than with the other strategies.
Figure 4 presents LT when executing the Ascending load pattern. As with the Constant load, here t_l was not violated, since the load has a growing trend. Consequently, panels (b), (c), (e), (f), (h) and (i), where the strategies recalculated t_u to values near the load, resulted in higher resource consumption and lower execution times. Particularly, executions with the A_f approach reached up to 12 VMs, resulting in faster executions. However, even though in panel (h) the maximum was 10 VMs, this execution achieved the best result in terms of time. This happened because resources were added faster at the beginning of the execution compared with the other approaches. With more resources available earlier, the application ended without needing two more extra resources.
Figure 5 presents the behavior of the cloud with the application running the Descending load pattern. This load has the opposite trend of the Ascending load. Thus, differently from the Constant and Ascending loads, here t_l impacts elasticity actions. In addition, as the load started at a high level and decreased slowly, it violated t_u in all executions.
[Figure omitted: nine time-series panels, (a) LT_ad through (i) LT_cf, each plotting cloud load, the lower and upper thresholds, and the number of virtual machines over time.]
Figure 3: Historical behavior of cloud parameters and resources when running the application with the Constant load.
[Figure omitted: nine time-series panels, (a) LT_ad through (i) LT_cf, each plotting cloud load, the lower and upper thresholds, and the number of virtual machines over time.]
Figure 4: Historical behavior of cloud parameters and resources when running the application with the Ascending load.
The LT technique is sensitive to small variations in the load; thus t_u decreased until hitting the load, which resulted in the addition of resources. The first half of the execution impacted performance more than the final part: resources were added to the cloud in this phase, where the load was at high levels. When the load started to decrease, in all executions t_l was violated when the application was near its end. This did not have a great impact on performance, because the time the application executed with the new, smaller set of resources was too small.
Finally, Figure 6 presents the executions of all LT combinations running the Wave load pattern. In this scenario, both t_l and t_u were violated, resulting in elasticity actions. Panels (a), (b) and (c) present similar behaviors and share the use of strategy A_a. Varying the strategy to recalculate t_u caused variations only in the moment when new resources were added. In the scenarios presented in panels (d), (e) and (f), the amount of available resources was different in each one. The main difference occurred in (e), since extra resources were added when the load was decreasing near 1000 seconds. This happened because a new elasticity action had already started before, and the resources became available only at this point. With these extra resources the application executed faster in the last portion of time, resulting in better performance. Likewise, panels (g), (h) and (i) present LT applying A_c while differing in the strategy to recalculate t_u. Panel (h) presents a behavior quite different from the other two.
[Figure omitted: nine time-series panels, (a) LT_ad through (i) LT_cf, each plotting cloud load, the lower and upper thresholds, and the number of virtual machines over time.]
Figure 5: Historical behavior of cloud parameters and resources when running the application with the Descending load.
[Figure omitted: nine time-series panels, (a) LT_ad through (i) LT_cf, each plotting cloud load, the lower and upper thresholds, and the number of virtual machines over time.]
Figure 6: Historical behavior of cloud parameters and resources when running the application with the Wave load.
While in (g) and (i) two elasticity actions were performed to remove resources during the first load drop, in (h) this did not occur. The reason is that, when the load started to drop around 1000 seconds, an elasticity action to increase resources was already running. As SelfElastic does not trigger simultaneous elasticity actions, new actions were allowed only once these new resources became available. However, this happened after the load drop, when the application load was already increasing again. As the application kept the resources from former actions, the second half of the execution was faster in this scenario than in all the others.
5.2 Defining Final Approach for LT
Figure 7 presents the results for the time (a) and cost (b) metrics when executing the load patterns Constant, Ascending, Descending and Wave with all possibilities of LT. From the time perspective, Figure 7 (a) shows that LT_ce achieved better results than the other approaches. This strategy obtained the best mean time (1967 seconds) across all four loads. Although pertinent for performance purposes, we cannot neglect resource consumption and, consequently, the cost metric. When analyzing cost, the gains of LT_ce are not so evident in Figure 7 (b). However, this strategy also obtained the best mean cost across all loads.
[Figure omitted: bar charts of (a) execution time in seconds and (b) cost for each combination LT_ad through LT_cf under the Constant, Ascending, Descending and Wave loads.]
Figure 7: Results of the metrics (a) time and (b) cost when running the load patterns.
Aiming at generating a single approach to guide the functioning of the Live Thresholding technique, we used the cost metric within the Weighted Sum Model (Triantaphyllou, 2000) technique. Therefore, for each load pattern we ranked the cost results of the nine combinations of LT_xy in ascending order. To each one we attributed a weight, starting from 1.0 for the first place, 0.9 for the second, 0.8 for the third, and so on. Thus, considering that we executed all nine LT combinations with four load patterns, each LT_xy received four weights. The sum of these weights represents the final result, where the highest value defines the final approach for LT. Table 1 shows this evaluation, revealing LT_ce as the best strategy for LT.
Table 1: Using the cost metric to define the final solution for Live Thresholding: LT_ce was selected as the best approach when combining different types of workloads.

LT_xy   Ascending  Constant  Descending  Wave   Total
LT_ad   0.3        0.7       0.8         0.8    2.6
LT_ae   0.9        0.8       0.6         0.7    3.0
LT_af   0.5        0.9       0.5         1.0    2.9
LT_bd   0.2        0.4       0.3         0.4    1.3
LT_be   0.6        0.3       0.2         0.5    1.6
LT_bf   0.7        0.6       0.4         0.3    2.0
LT_cd   0.4        0.2       1.0         0.6    2.2
LT_ce   1.0        1.0       0.9         0.2    3.1
LT_cf   0.8        0.5       0.7         0.9    2.9
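A minimal sketch of this Weighted Sum ranking (the data layout and names are ours):

```python
def rank_lt(costs_per_load):
    """costs_per_load maps each load pattern to {lt_name: cost}.
    Per pattern, the cheapest LT gets weight 1.0, the next 0.9, and so on;
    the LT with the highest weight sum is the final choice."""
    totals = {}
    for costs in costs_per_load.values():
        for place, lt in enumerate(sorted(costs, key=costs.get)):
            totals[lt] = totals.get(lt, 0.0) + 1.0 - 0.1 * place
    return max(totals, key=totals.get)
```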
5.3 Performance Analysis
Table 2 shows the results we obtained running the application with all load patterns and parameters. For scenario s2, the results regard the LT_ce strategy, which is the one we adopted as final in Subsection 5.2. For simplicity, here we call LT_ce just LT. One of the differences between LT and approaches with static thresholds regards how each strategy behaves at the exact moment after a resource reorganization. In the best case for static thresholds, lower values for t_u and higher values for t_l increase reactivity. In these cases, when the load drops or increases after an operation, it can stay over or under the same threshold that triggered the last operation. For this reason, a new operation can occur sooner, anticipating actions. Conversely, in the worst case, higher values for t_u and lower values for t_l decrease reactivity, since the load can stagnate between the thresholds, not allowing further operations. In addition, with loads trending up or down, after an operation it can take more time for the load to reach a threshold again. Differently from static thresholds, LT proposes an algorithm that recalculates both t_u and t_l after an elasticity operation, bringing them close to the post-operation load. Thus, elasticity actions do not occur in the observation in which the thresholds are recalculated; it takes a few more observations to continue adapting the thresholds and then violate them again. In most results, this distinct behavior made LT achieve time and cost values slightly higher than the ones achieved by the best set of static thresholds. On the other hand, it also made LT achieve results much better than the ones obtained by the worst set of thresholds.
Table 2: Results of all scenarios and metrics (time in seconds).

Scenario                 Pattern           t_u  t_l  Time  Resource  Cost
s1 Without Elasticity    Ascending         -    -    4319  8618      37221142
                         Descending        -    -    4410  8798      38799180
                         Constant          -    -    4283  8542      36585386
                         Wave              -    -    4363  8700      37958100
                         Pos. Exponential  -    -    4601  9180      42237180
                         Neg. Exponential  -    -    4528  9042      40942176
                         All Random        -    -    4018  8040      32304720
                         Partial Random    -    -    3994  8010      31991940
s2 Live Thresholding     Ascending         -    -    1769  12064     21341216
                         Descending        -    -    2000  13088     26176000
                         Constant          -    -    1932  12828     24783696
                         Wave              -    -    2165  13408     29028320
                         Pos. Exponential  -    -    1918  10138     19444684
                         Neg. Exponential  -    -    2089  11134     23258926
                         All Random        -    -    1829  11920     21801680
                         Partial Random    -    -    2036  10770     21927720
s3 Static Thresholds     Ascending         70   30   1818  11936     21699648
                         Ascending         70   50   1825  11874     21670050
                         Ascending         90   30   3091  9450      29209950
                         Ascending         90   50   3000  9540      28620000
                         Descending        70   30   1891  14056     26579896
                         Descending        70   50   1880  12746     23962480
                         Descending        90   30   2667  10110     26963370
                         Descending        90   50   2638  9840      25957920
                         Constant          70   30   1888  12382     23377216
                         Constant          70   50   1913  12546     24000498
                         Constant          90   30   2625  9886      25950750
                         Constant          90   50   2653  9954      26407962
                         Wave              70   30   2286  12496     28565856
                         Wave              70   50   2296  11784     27056064
                         Wave              90   30   2911  9750      28382250
                         Wave              90   50   2904  9600      27878400
                         Pos. Exponential  70   30   1880  9600      18048000
                         Pos. Exponential  70   50   1888  10440     19710720
                         Pos. Exponential  90   30   2212  9790      21655480
                         Pos. Exponential  90   50   2226  9816      21850416
                         Neg. Exponential  70   30   2018  12090     24397620
                         Neg. Exponential  70   50   2042  11810     24116020
                         Neg. Exponential  90   30   2093  11250     23546250
                         Neg. Exponential  90   50   2072  10664     22095808
                         All Random        70   30   1782  11700     20849400
                         All Random        70   50   1799  11730     21102270
                         All Random        90   30   2534  9120      23110080
                         All Random        90   50   2484  9270      23026680
                         Partial Random    70   30   1757  11490     20187930
                         Partial Random    70   50   1754  11430     20048220
                         Partial Random    90   30   2861  8850      25319850
                         Partial Random    90   50   2727  8910      24297570
6 CONCLUSION
This article presented the SelfElastic model as an advance over the current state of research, offering the aforementioned features in terms of both application and parameter writing. SelfElastic offers hybrid elasticity through the Live Thresholding technique, self-organizing threshold values and resource allocation to provide a competitive solution at performance and cost levels. Although developed for parallel applications, SelfElastic can be easily extended to address elasticity adaptivity on Web-based services, including e-commerce and electronic funds transfer. The results are encouraging in favor of using Live Thresholding, since LT presents performance and costs very close to, or even better than, those of static thresholds.
REFERENCES
Bing, H., Ying-lan, F., and Ye-bai, L. (2009). Research and improvement of congestion control algorithms based on TCP protocol. In Software Engineering, 2009. WCSE '09. WRI World Congress on, volume 1, pages 440–443.

Dustdar, S., Gambi, A., Krenn, W., and Nickovic, D. (2015). A pattern-based formalization of cloud-based elastic systems. In Proceedings of the Seventh International Workshop on Principles of Engineering Service-Oriented and Cloud Systems, PESOS '15, pages 31–37, Piscataway, NJ, USA. IEEE Press.

Farokhi, S., Jamshidi, P., Brandic, I., and Elmroth, E. (2015). Self-adaptation challenges for cloud-based applications: A control theoretic perspective. In 10th International Workshop on Feedback Computing (Feedback Computing 2015). ACM.

Galante, G. and Bona, L. C. E. D. (2015). A programming-level approach for elasticizing parallel scientific applications. Journal of Systems and Software, 110:239–252.

Ghanbari, H., Simmons, B., Litoiu, M., and Iszlai, G. (2011). Exploring alternative approaches to implement an elasticity policy. In Cloud Computing (CLOUD), 2011 IEEE International Conference on, pages 716–723.

Herbst, N. R., Huber, N., Kounev, S., and Amrehn, E. (2013). Self-adaptive workload classification and forecasting for proactive resource provisioning. In Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering, ICPE '13, pages 187–198, New York, NY, USA. ACM.

Herbst, N. R., Kounev, S., Weber, A., and Groenda, H. (2015). BUNGEE: An elasticity benchmark for self-adaptive IaaS cloud environments. In Proceedings of the 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS '15, pages 46–56, Piscataway, NJ, USA. IEEE Press.

Jamshidi, P., Ahmad, A., and Pahl, C. (2014). Autonomic resource provisioning for cloud-based software. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS 2014, pages 95–104, New York, NY, USA. ACM.

Lorido-Botran, T., Miguel-Alonso, J., and Lozano, J. (2014). A review of auto-scaling techniques for elastic applications in cloud environments. Journal of Grid Computing, 12(4):559–592.

Moore, L. R., Bean, K., and Ellahi, T. (2013). Transforming reactive auto-scaling into proactive auto-scaling. In Proceedings of the 3rd International Workshop on Cloud Data and Platforms, CloudDP '13, pages 7–12, New York, NY, USA. ACM.

Netto, M. A. S., Cardonha, C., Cunha, R. L. F., and Assuncao, M. D. (2014). Evaluating auto-scaling strategies for cloud computing environments. In IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, MASCOTS 2014, Paris, France, September 9-11, 2014, pages 187–196. IEEE.

Nikolov, V., Kächele, S., Hauck, F. J., and Rautenbach, D. (2014). CloudFarm: An elastic cloud platform with flexible and adaptive resource management. In Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, UCC '14, pages 547–553, Washington, DC, USA. IEEE Computer Society.

Nikravesh, A. Y., Ajila, S. A., and Lung, C.-H. (2015). Towards an autonomic auto-scaling prediction system for cloud resource provisioning. In Proceedings of the 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS '15, pages 35–45, Piscataway, NJ, USA. IEEE Press.

Niu, S., Zhai, J., Ma, X., Tang, X., and Chen, W. (2013). Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '13, pages 56:1–56:12, New York, NY, USA. ACM.

Righi, R. R., Costa, C. A., Rodrigues, V. F., and Rostirolla, G. (2016). Joint-analysis of performance and energy consumption when enabling cloud elasticity for synchronous HPC applications. Concurrency and Computation: Practice and Experience, 28(5):1548–1571.

Righi, R. R., Rodrigues, V. F., Costa, C. A., Galante, G., Bona, L., and Ferreto, T. (2015a). AutoElastic: Automatic resource elasticity for high performance applications in the cloud. Cloud Computing, IEEE Transactions on, PP(99):1–1.

Righi, R. R., Rodrigues, V. F., Costa, C. A., Kreutz, D., and Heiss, H.-U. (2015b). Towards cloud-based asynchronous elasticity for iterative HPC applications. Journal of Physics: Conference Series, 649(1):012006.

Triantaphyllou, E. (2000). Multi-Criteria Decision Making Methodologies: A Comparative Study, volume 44 of Applied Optimization. Springer, Dordrecht.