Towards Modeling Adaptation Services for

Large-Scale Distributed Systems with Abstract State Machines

Sorana Tania Nemes

and Andreea Buga

Christian Doppler Laboratory for Client-Centric Cloud Computing,

Johannes Kepler University of Linz, Software Park 35, 4232 Hagenberg, Austria

{t.nemes, andreea.buga}@cdcc.faw.jku.at

Keywords:

Formal Modeling, Abstract State Machines, Large-Scale Distributed Systems, Adaptation, Smart City.

Abstract:

The evolution of Large-Scale Distributed Systems has favored the development of solutions for
smart cities. Such systems face a high level of uncertainty, as they consist of a large number of
sensors, processing centers, and services deployed across a wide geographical area. Bringing
together different resources increases complexity and communication effort, introduces a large
set of possible failures, and raises the challenge of continuously growing computational and
storage expectations. In such a setting, the role of the adaptation components is vital for
ensuring availability, reliability, and robustness. This paper introduces a formal approach for
modeling and verifying the properties and behavior of the adaptation framework, addressing the
case of a system failure. We formalize the behavior and the collaboration mechanisms between
agents of the system with the aid of Abstract State Machines and employ the ASMETA toolset for
simulating and analyzing properties of the model.

1 INTRODUCTION

Large-scale distributed systems (LDS) have appeared

as a solution to the continuously expanding comput-

ing and storage demands. Services offered through

such architectures bring an increased value to the end

client, but there are still many open questions posed

by issues like heterogeneity, network failures, and

random behavior of components. Recovering from

failures and ensuring a high availability of the sys-

tem requires reliable monitoring and adaptation tech-

niques.

Among the biggest beneficiaries of LDS are applications for smart cities, which connect a huge
number of sensors spread over a wide area and allow them to communicate. Such systems cover
various aspects like traffic surveillance, infrastructure and environment, and aim to ease and
improve the quality of life of inhabitants. These solutions share the properties, failures and
availability issues of any LDS. Therefore, the adaptation component plays a key role in enacting
adaptation plans that bring the system back to a normal execution mode.

The goal and contribution of this paper is to in-

tegrate the formal modeling capabilities of the Ab-

stract State Machines (ASMs) for deﬁning and val-

idating an adaptation solution for LDS. Our project

promotes a service-oriented approach to heteroge-

neous, distributed computing that enables on-the-ﬂy

run-time adaptation of the running system based on

the replacement of sets of employed services by alter-

native solutions. For this we develop an advanced ar-

chitecture and an execution model by envisioning and

adapting a wide spectrum of adaptation means such as

re-allocation, service replacement, change of process

plan, etc.

The remainder of the paper is structured as fol-

lows. Section 2 provides an overview of the system

and its architecture, followed by a description of the

structure of the adaptation framework in Section 3.

Essential concepts related to the Abstract State Ma-

chine formal methods as well as the formal speciﬁca-

tion of the adaptation framework are detailed in Sec-

tion 4. Related work is discussed in Section 5, after

which conclusions are drawn in Section 6.

2 SYSTEM OVERVIEW

The evolution of distributed systems, the Internet of Things (IoT) and network capabilities
played an important role in the adoption of ubiquitous solutions for smart cities. Widely
distributed sensors for traffic, pollution, energy efficiency and environment continuously
collect data that are integrated in various applications. The aim is to develop cities
sustainably and improve the quality of life of the inhabitants.

[Diagram labels: Client (Laptop, Phone, Computer); Providers; Abstract Machine (Monitoring Layer,
Adaptation Layer); arrows: Observe, Send request, Reply]
Figure 1: Architecture of the LDS system.

One of the main areas of interest is the provisioning of traffic services, where centrally
controlled traffic sensors regulate the flow of traffic through the city in response to demand.
Such a smart traffic management application also empowers people to take informed decisions and
helps prevent severe traffic congestion due to overcrowded areas. In a smart city network,
traffic sensors provide real-time data such as the percentage of road occupancy and the number of
traffic participants. Such sensors are distributed across an LDS, and the data they provide can
be integrated with activity patterns extracted from smart gadgets to build a knowledge base for a
traffic application.

The organization of the solution reﬂects the struc-

ture of LDS, where nodes refer to sensors and services

are offered by various providers. Such systems are

characterized by heterogeneity, unavailability, insta-

bility, network and node failures. Problems occurring

at component level are propagated to the whole solu-

tion, making it hard to identify the source. We empha-

size the role of the adaptation framework for ensuring

availability of the system and propose a formal model

for the solution. Figure 1 illustrates the architecture

of the LDS system for a smart trafﬁc application.

2.1 System Architecture

The envisioned solution proposes distributed middle-

ware components containing different units responsi-

ble for speciﬁc tasks: service integration, process op-

timization, communication handler. The core makes

use of ASMs for expressing the speciﬁcation of the

other components and foresees a three-layered ab-

stract machine model addressing normal processing,

monitoring and adaptation. The organization, as il-

lustrated in Figure 1, is rooted in three parts: the

client side where different users request services from

providers, the side of the providers where sensors

are deployed and an abstract machine containing the

monitoring and adaptation layers for the resources of

the providers. The interaction of the clients with the

service providers is based on a solution deﬁned by

(Bósa et al., 2015), where the client-cloud interaction

middleware processes the requests and ensures the de-

livery of services to the end user.

The processes of the monitors and adapters are

highly interconnected and interdependent, enabling

the system to perform reconﬁguration plans when-

ever any of the trafﬁc sensors faces a problem. The

monitors are responsible for collecting data, aggre-

gating it into meaningful information and communi-

cating observations about abnormal executions to the

adaptation framework. The latter deals with recov-

ering from anomalous situations, logging them, and

ﬁnding the best remedy to restore the LDS to normal

running mode. Diagnosis is strongly correlated with

the high-level interpretation of collected data.

The adoption of LDS demands a deep understanding of the underlying infrastructure, its running
mechanisms and uncertainties, as services may become unavailable or change, or network problems
may negatively impact the reliability and performance of the distributed system (Grozev and
Buyya, 2014). Delivery of reliable services requires a continuous evaluation of the system state
and adaptation in case of abnormal execution. Therefore, two aspects are primarily considered:
resilience and fault tolerance. With respect to resilience, the project targets system
architectures that guarantee that an LDS keeps running and producing the desired results even if
some services become unavailable, change or break down. With respect to fault tolerance, the
project targets assessment methods that permit the detection of failure situations, together with
adaptive repair mechanisms. For LDS, adaptability is therefore a valuable and almost inevitable
process.

3 ORGANIZATION OF THE

ADAPTATION FRAMEWORK

As mentioned in Section 2, the architecture and execution model are enhanced to capture dynamic
adaptation of an LDS to changing environmental circumstances. The Adaptation Engine aims to
continually react to the input measurements and notifications from the monitoring component and
to maintain its resilience in order to gracefully handle and adapt to new contexts, ranging from
network traffic fluctuations to unavailability of different system components. Its main tasks
are to react to and evaluate the data collected and assessed by the monitoring components
regarding the faults detected within the system, to repair the encountered problem under
presumably optimal performance, and to adjust the solution to higher levels of quality
compliance.

Seventh International Symposium on Business Modeling and Software Design

[Diagram labels: system adaptation; configuration; Case; launch; initial action: Ω; Action
Workflow; Action Ω; Controller Ω; execute; state(Ω) = active; handle signals; actionStarted;
actionFailed; actionCompleted]
Figure 2: Overview specification of the Adaptation Manager.

Standardized repair actions for on-the-ﬂy changes

in reaction to identiﬁed critical situations are deﬁned

and will be employed on demand. This will result in

a catalog of possible adaptive collaboration patterns,

each supported by a set of subsequent adaptation tools

and components. Such repair patterns can be the re-

placement of a component service by an equivalent

one or the change of location for a service, up to the

replacement of larger parts of the LDS, i.e. a set of

services involved, by a completely different, alterna-

tive solution.

These steps in the adaptation process highlight

the two major components that make up the Adap-

tation Engine as an inner component of the abstract

machine included in the middleware: solution explo-

ration, identiﬁcation and maintenance carried out by

the Case Manager, and solution management and en-

actment processed by the Action Manager. Each con-

stituent component runs with well delimited respon-

sibilities and areas of inference and control. The cur-

rent paper focuses on the second part of adaptation,

the Action Manager.

In the envisioned framework, any adaptation so-

lution is conﬁgured and stored in the repository as a

workﬂow schema detailing the actions and underlying

transition dependencies needed to restore the system

to a normal execution mode.

An action is an autonomous entity (e.g. a software module) which has the power to act or cause a
single update to the system. Its autonomy implies that its processes are controlled neither by
other actions nor by the environment. The power of such autonomous and self-aware actions lies in
their ability to deal with unpredictable, dynamically changing, heterogeneous environments while
relying fully on existing solutions for LDS adaptability. When one service is replaced with
another, for example, one such action would consist of finding a suitable matching service to
replace the problematic one (by accessing the capabilities of an existing tool) or of dynamically
reconfiguring the service calls to point to the newly used service.

An action's instantiation and execution are handled by a linked ActionController, loaded based
on the contract defined for that particular action. The ordering of actions and their
dependencies on other actions are handled by means of notification/signaling: every action state
change causes the parent ActionController to broadcast the associated notification. Figure 2
depicts the overall structure of the adaptation process once the problem is mapped to previously
encountered problems and the attached solution is carried out based on its configuration.

Therefore, the adaptation system consists of a finite set of autonomous, interacting Action
Controllers that intercept and assess all notifications raised by the execution or failure of
actions. For each notification, a controller either enacts and executes its corresponding action
or ignores the notification because it is not of interest in the given solution configuration.
Having Action Controllers monitor and handle the interaction between the actions of a solution
emphasizes new properties of the actions, which are defined in terms of needed input, concrete
implementation and resulting output. More importantly, the underlying actions can be easily
reused or substituted: any number of actions can be added or removed without the need to update
the current actions.
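The handle-or-ignore assessment described above can be sketched in Python as follows. This is only an illustrative sketch, not the framework's implementation: the class layout, the `interest` field and the action names (`rebind_calls`, `log_failure`, `notify_operator`, `find_replacement`) are hypothetical, while the event names follow Figure 2.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActionController:
    """Controller bound to one action; reacts to a single (event, source) pair."""
    action: str
    interest: Tuple[str, str]          # e.g. ("actionCompleted", "find_replacement")
    state: str = "WAITING_NOTIFICATION"

    def assess(self, event: str, source: str) -> bool:
        """Handle the notification if it is of interest, otherwise ignore it."""
        if (event, source) == self.interest:
            self.state = "RUNNING_ACTION"
            return True
        return False                   # not relevant in this solution configuration


def broadcast(controllers: List[ActionController], event: str, source: str) -> List[str]:
    """Deliver one notification to every controller; return the actions it enabled."""
    return [c.action for c in controllers if c.assess(event, source)]


controllers = [
    ActionController("rebind_calls", ("actionCompleted", "find_replacement")),
    ActionController("log_failure", ("actionFailed", "find_replacement")),
]
enabled = broadcast(controllers, "actionCompleted", "find_replacement")

# A new action is added without modifying the existing controllers.
controllers.append(ActionController("notify_operator", ("actionCompleted", "rebind_calls")))
enabled_next = broadcast(controllers, "actionCompleted", "rebind_calls")
```

Because each controller only knows the notification it waits for, adding `notify_operator` does not require changes to the other two controllers, which is the reuse property the paragraph describes.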

The model's underlying observer/controller architecture is one realization of the feedback loop
principle (Brun et al., 2009): the executing adaptation is observed by its registered
controllers, which in turn, based on the reported observations and broadcast notifications, steer
the system towards the remediation of the reported problem or failure. An environmental change
resulting from the execution of an adaptation action triggers a reaction within the system that
causes, in turn, a configuration-based chain of subsequent changes. These loops guide the system
behavior and dynamics so that the adaptation succeeds in reaching the intended goals.

In order to better understand the intrinsic problems that the framework can face, we focused our
attention on building ground models in terms of ASMs. Based on them, we can validate the
specifications and verify whether they fulfill desired properties such as safety and liveness.


[Control state diagram. States and guards: Passive, Waiting notification; Notification received;
Acknowledge notification received; Assess notification; Handle notification (Yes/No); Ignore
notification; Action started; Broadcast ActionStarted notification; Waiting for ActionStarted
acknowledgement; ActionStarted acknowledged by all; Clear StartAction echo; Trigger action;
Action running; Action active; Action successfully completed; Broadcast ActionCompleted
notification; Waiting for ActionCompleted acknowledgement; ActionCompleted acknowledged by all;
Clear CompleteAction echo; Action failed; Broadcast ActionFailed notification; Waiting for
ActionFailed acknowledgement; ActionFailed acknowledged by all; Clear FailedAction echo;
Controller acknowledge failed; Assess component data and status; Terminate adaptation execution;
Adaptation not initiated]
Figure 3: Control ASM for the Action Controller agent.

4 FORMAL SPECIFICATION OF

THE SYSTEM

4.1 Background on ASM Theory

According to (Kossak and Mashkoor, 2016), ASMs

stood out as a high-quality software engineering

method for behavioral and architectural system design

and analysis. By further considering the assistance of

the model through the expressiveness of the speciﬁca-

tion, the software development process, its coherence

and the scalability in industrial applications (Kossak

and Mashkoor, 2016) we adopted the ASM method.

The specification of an ASM consists of a finite set of transition rules of the type: if
Condition then Updates (Börger and Stark, 2003), where Updates is a finite set of assignments of
the form $f(t_1, \ldots, t_n) := t$. As ASMs allow synchronous parallel execution, two machines
might try to update the same location with two different values, triggering an inconsistency. In
this case the execution throws an error.
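To make the notion of an inconsistent update concrete, the following Python sketch fires all rules against the same state, collects their updates, and raises an error when two updates assign different values to one location. It is a simplified illustration under our own assumptions, not the ASMETA simulator; the names `step`, `guarded` and `InconsistentUpdate` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Location = Tuple[str, Tuple[Hashable, ...]]   # function name plus argument values
State = Dict[Location, object]
Update = Tuple[Location, object]
Rule = Callable[[State], List[Update]]


class InconsistentUpdate(Exception):
    """Raised when one step assigns two different values to the same location."""


def step(state: State, rules: List[Rule]) -> State:
    """Fire all rules against the same state and apply their updates at once."""
    pending: Dict[Location, object] = {}
    for rule in rules:
        for location, value in rule(state):
            if location in pending and pending[location] != value:
                raise InconsistentUpdate(location)
            pending[location] = value
    new_state = dict(state)
    new_state.update(pending)
    return new_state


def guarded(condition: Callable[[State], bool],
            updates: Callable[[State], List[Update]]) -> Rule:
    """Build a rule of the form: if Condition then Updates."""
    return lambda state: updates(state) if condition(state) else []


x, y = ("x", ()), ("y", ())
swap = [
    guarded(lambda s: True, lambda s: [(x, s[y])]),
    guarded(lambda s: True, lambda s: [(y, s[x])]),
]
swapped = step({x: 1, y: 2}, swap)   # both rules read the old state
```

Since both rules read the state as it was before the step, the two assignments swap the values of `x` and `y` without a temporary variable, which is the usual illustration of synchronous parallelism.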

Rules consist of different control structures that

reﬂect parallelism (par), sequentiality (seq), causal-

ity (if...then) and inclusion to different domains

(in). With the forall expression, a machine can en-

force concurrent execution of a rule R for every el-

ement x that satisﬁes a condition ϕ: forall x with

ϕ do R. Non-determinism is expressed through the

choose rule: choose x with ϕ do R.
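The difference between forall and choose can be illustrated with a small Python sketch. The helper functions `forall` and `choose` and the sensor names are hypothetical; an update is simplified here to a (location name, value) pair, and the random generator stands in for the non-deterministic choice.

```python
import random
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
Update = Tuple[str, object]            # (location name, new value)


def forall(domain: Iterable[T], condition: Callable[[T], bool],
           rule: Callable[[T], List[Update]]) -> List[Update]:
    """forall x with condition do rule: collect the updates of every match."""
    updates: List[Update] = []
    for x in domain:
        if condition(x):
            updates.extend(rule(x))
    return updates


def choose(domain: Iterable[T], condition: Callable[[T], bool],
           rule: Callable[[T], List[Update]], rng: random.Random) -> List[Update]:
    """choose x with condition do rule: updates of one arbitrary match, if any."""
    candidates = [x for x in domain if condition(x)]
    if not candidates:
        return []
    return rule(rng.choice(candidates))


status = {"s1": "ok", "s2": "failed", "s3": "failed"}


def failed(sensor: str) -> bool:
    return status[sensor] == "failed"


def repair(sensor: str) -> List[Update]:
    return [(f"repair({sensor})", True)]


repair_all = forall(status, failed, repair)
repair_one = choose(status, failed, repair, random.Random(0))
```

`repair_all` contains one update per failed sensor, whereas `repair_one` contains the update for a single failed sensor selected arbitrarily.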

Definition 1. A control state ASM is an ASM built on the following rules: in any control state
i, at most one guard $cond_k$ is true; it triggers the corresponding rule $rule_k$ and moves the
machine from state i to state $s_k$. In case no guard is fulfilled, the machine does not perform
any action.
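Definition 1 can be read as a transition table that maps each control state to (guard, rule, next state) triples. The Python sketch below is one possible reading under that assumption; `control_step` and the table contents are hypothetical, and the state names are borrowed from Figure 3.

```python
from typing import Callable, Dict, List, Tuple

# For each control state: a list of (guard, rule, next state) triples.
Transition = Tuple[Callable[[dict], bool], Callable[[dict], None], str]


def control_step(ctl_state: str, env: dict,
                 table: Dict[str, List[Transition]]) -> str:
    """Perform one step of a control state ASM and return the new control state."""
    enabled = [t for t in table.get(ctl_state, []) if t[0](env)]
    if len(enabled) > 1:
        raise ValueError(f"more than one guard holds in state {ctl_state!r}")
    if not enabled:
        return ctl_state               # no guard fulfilled: the machine does nothing
    _, rule, next_state = enabled[0]
    rule(env)
    return next_state


def take_notification(env: dict) -> None:
    """Move the oldest pending notification into the 'current' slot."""
    env["current"] = env["inbox"].pop(0)


table: Dict[str, List[Transition]] = {
    "WAITING_NOTIFICATION": [
        (lambda env: bool(env["inbox"]), take_notification, "NOTIFICATION_RECEIVED"),
    ],
    "NOTIFICATION_RECEIVED": [
        (lambda env: True, lambda env: None, "ASSESS_NOTIFICATION"),
    ],
}
```

With an empty inbox, `control_step("WAITING_NOTIFICATION", env, table)` leaves the state unchanged; once a notification is queued, the same call moves the machine to `NOTIFICATION_RECEIVED`.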

In the design phase of software development, the ASM technique permits transforming the
requirements from natural language to ground models, and further to control state diagrams,
which are easier to formalize. The ASMETA toolset (http://asmeta.sourceforge.net/) for
simulating, validating and model-checking ASM models permits elaborating the models with the aid
of the AsmetaL language, which is able to capture specific ASM control structures and functions.
The models are further simulated and validated for inconsistencies. The tool also permits
automatic review of the model for properties like conciseness or faultiness, or for design
issues. In the verification stage, properties like reachability, safety and liveness are defined
and checked.

4.2 ASM Specification

Based on the overall specification of the adaptation framework described in Section 3, we define
the specific states and transitions of the adaptation processes, with emphasis on the management
and enactment of adaptation actions. The model contains ActionController ASM agents, each of
which carries out its own execution. The ground model illustrated in Figure 3 details the
behavior of an ActionController in relation to the received and broadcast notifications. The
ActionController passes through several states, governed by various rules and guards.

At initialization, the ActionController is in the

Passive, Waiting notiﬁcation state. This initial state

is reached again either when the associated action’s

execution and acknowledgment by all the other con-

trollers are fulﬁlled or when the received notiﬁca-

tion is not bound to inﬂuence the ActionController in

question. This is a clear indication of the continuous

character of the adaptation process which takes place

in the background of service execution.

Once a notification arrives, the ActionController acknowledges the received notification
regardless of the actual sender, after which it moves to the Assess notification state. The rule
responsible for acknowledging a notification is captured in Listing 1.

rule r_AcknowledgeNotificationReceived($c in Controller, $broadcaster in Controller) =
    if (controller_state($c) = NOTIFICATION_RECEIVED) then
        seq
            controller_state($c) := ASSESS_NOTIFICATION
            par
                acknowledged_controllers($broadcaster) := acknowledged_controllers($broadcaster) + 1
                r_HandleNotification[$c]
            endpar
        endseq
    endif

Listing 1: Acknowledge notification ASM rule.

Handling the received notification first requires broadcasting the notification that the action
execution is about to start, as captured in Listing 2.

rule r_BroadcastNotification($c in Controller, $n in Notification) =
    forall $neighbor in Controller do
        if (not(id($c) = id($neighbor))) then
            seq
                acknowledged_controllers($c) := 1
                par
                    controller_state($c) := WAITING_FOR_ACKNOWLEDGEMENT
                    r_AcknowledgeNotificationReceived[$neighbor, $c]
                endpar
            endseq
        endif

Listing 2: Broadcast notification ASM rule.
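The interplay of Listings 1 and 2 can be approximated in Python as shown below. The sketch is a simplification, not a translation of the AsmetaL model: the guard on the receiver's state is omitted, the `responsive` flag is a hypothetical addition used to simulate a controller that never acknowledges, and the counter starts at 1 as in Listing 2, so that the broadcaster counts itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controller:
    ident: int
    state: str = "WAITING_NOTIFICATION"
    acknowledged: int = 0       # acknowledgements collected for the last broadcast
    responsive: bool = True     # hypothetical flag: False simulates a silent controller


def acknowledge(receiver: Controller, broadcaster: Controller) -> None:
    """Counterpart of Listing 1: the receiver acknowledges and starts assessing."""
    if not receiver.responsive:
        return
    receiver.state = "ASSESS_NOTIFICATION"
    broadcaster.acknowledged += 1


def broadcast(sender: Controller, controllers: List[Controller]) -> bool:
    """Counterpart of Listing 2: notify all neighbors, report if all acknowledged."""
    sender.acknowledged = 1                     # the sender counts itself
    sender.state = "WAITING_FOR_ACKNOWLEDGEMENT"
    for neighbor in controllers:
        if neighbor.ident != sender.ident:
            acknowledge(neighbor, sender)
    return sender.acknowledged == len(controllers)


controllers = [Controller(i) for i in range(3)]
all_acknowledged = broadcast(controllers[0], controllers)
```

Comparing the counter with the number of controllers is the same test that Listing 3 performs with `numberOfControllers`.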

The controller must execute the underlying action only once the notification has been
acknowledged by all neighboring ActionControllers that were instantiated as part of the same
adaptation session. Depending on the outcome of the executed action, a notification is broadcast
signaling the success or failure of this particular system update. As the execution order of the
ActionControllers is not tracked, if at least one ActionController does not acknowledge one of
the sent notifications, the adaptation is abruptly terminated and the component data and status
are assessed and logged accordingly. The rule responsible for triggering the associated
adaptation action is partially captured in Listing 3.

rule r_TriggerAction($c in Controller) =
    seq
        while (controller_state($c) = RUNNING_ACTION)
            wait
        if (action_completed($c)) then
            par
                r_BroadcastNotification[$c, ACTION_COMPLETED]
                r_AwaitAcknowledgement[$c]
                if (acknowledged_controllers($c) = numberOfControllers) then
                    par
                        r_ClearNotificationEcho[$c]
                        controller_state($c) := WAITING_NOTIFICATION
                    endpar
                else
                    par
                        controller_state($c) := CONTROLLER_ACKNOW_FAILED
                        AssessDataAndStatus
                    endpar
                endif
            endpar
        else
            par
                r_BroadcastNotification[$c, ACTION_FAILED]
                ...

Listing 3: Trigger action ASM rule.
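The decision logic of Listing 3 can be sketched in Python as follows. Since the failure branch is elided in the listing, the sketch assumes that it mirrors the success branch, as Figure 3 suggests; this is our assumption, not something the listing states. The function `trigger_action` and its callback parameters are hypothetical.

```python
from typing import Callable, List, Tuple


def trigger_action(run_action: Callable[[], bool],
                   broadcast: Callable[[str], int],
                   number_of_controllers: int) -> Tuple[str, List[str]]:
    """Run the action, broadcast its outcome, and return (next state, log).

    run_action returns True on success; broadcast returns the number of
    controllers that acknowledged the notification (sender included).
    """
    log: List[str] = []
    try:
        completed = run_action()
    except Exception:
        completed = False              # an exception is treated as a failed action
    notification = "ACTION_COMPLETED" if completed else "ACTION_FAILED"
    log.append(notification)

    acknowledged = broadcast(notification)
    if acknowledged == number_of_controllers:
        log.append("CLEAR_NOTIFICATION_ECHO")
        return "WAITING_NOTIFICATION", log

    # At least one controller stayed silent: terminate and assess.
    log.append("ASSESS_DATA_AND_STATUS")
    return "CONTROLLER_ACKNOW_FAILED", log


state, log = trigger_action(lambda: True, lambda notification: 3, 3)
```

In this sketch the controller returns to the waiting state only when every controller has acknowledged the outcome; otherwise it ends in `CONTROLLER_ACKNOW_FAILED` and records that the component data and status must be assessed.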

4.3 Validation of the Model

The validation for the current state of our work deals only with the separate processes of each
agent. It focuses on checking the workflow and the transitions between states. The AsmetaV tool
permits validation of specific scenarios defined with the aid of the Avalla language presented
by (Carioni et al., 2008). Scenarios resemble the unit tests performed during the testing phase
of software development. They capture execution flows for specific values given to functions of
the system.
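The idea of a scenario as a sequence of set, step and check commands can be mimicked in Python. The sketch below is not Avalla syntax and does not use AsmetaV; the runner `run_scenario`, the toy `controller_step` machine and the command tuples are hypothetical and serve only to show how a scenario resembles a unit test.

```python
from typing import Callable, Dict, List, Tuple

Command = Tuple[str, ...]


def run_scenario(machine_step: Callable[[Dict[str, str]], None],
                 state: Dict[str, str],
                 scenario: List[Command]) -> List[str]:
    """Execute set/step/check commands and return the list of failed checks."""
    failures: List[str] = []
    for command in scenario:
        kind = command[0]
        if kind == "set":
            _, name, value = command
            state[name] = value
        elif kind == "step":
            machine_step(state)
        elif kind == "check":
            _, name, expected = command
            actual = state.get(name)
            if actual != expected:
                failures.append(f"{name} = {actual!r}, expected {expected!r}")
        else:
            raise ValueError(f"unknown command {kind!r}")
    return failures


def controller_step(state: Dict[str, str]) -> None:
    """Toy machine covering the first two transitions of Figure 3."""
    if state["controller_state"] == "WAITING_NOTIFICATION" and state.get("notification"):
        state["controller_state"] = "NOTIFICATION_RECEIVED"
    elif state["controller_state"] == "NOTIFICATION_RECEIVED":
        state["controller_state"] = "ASSESS_NOTIFICATION"


scenario: List[Command] = [
    ("set", "notification", "ACTION_STARTED"),
    ("step",),
    ("check", "controller_state", "NOTIFICATION_RECEIVED"),
    ("step",),
    ("check", "controller_state", "ASSESS_NOTIFICATION"),
]
failures = run_scenario(controller_step, {"controller_state": "WAITING_NOTIFICATION"}, scenario)
```

An empty `failures` list means that the machine followed the expected execution flow for the given input values.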

One of the problems we identiﬁed during the val-

idation phase was that the Avalla language does not

support working with inﬁnite domains. Therefore,

we needed to consider that each ActionController re-

members only one Notiﬁcation instance. Other incon-

sistency errors detected at simulation time led to de-

sign changes or restrictions.

More than one system failure can be reported in a short time frame. Failure recovery is therefore
performed sequentially: although the case/reconfiguration plan is locked while its associated
solution is executed, a parallel execution of simultaneous adaptations may try to update system
parts or components with different values at the same time. We leave as future work the
elaboration of transaction-specific operations, which would permit triggering multiple
adaptations simultaneously within the system. This could be supported by annotating each case
with extensive knowledge of its area of inference in the system, which would later be considered
in the retrieval phase of the process.
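The reason for sequential recovery can be illustrated with a short Python sketch. The plan format, the component names and the helper functions are hypothetical; the sketch only shows that two plans touching the same component with different values would clash if applied in one parallel step, while a queue applies them one after another.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Plan = Tuple[str, Dict[str, str]]      # (case identifier, component -> new value)


def conflicting_components(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Components that two plans would set to different values in one step."""
    return sorted(k for k in a if k in b and a[k] != b[k])


def recover_sequentially(system: Dict[str, str], plans: List[Plan]) -> List[str]:
    """Apply the reported plans one at a time, in order of arrival."""
    pending: Deque[Plan] = deque(plans)
    executed: List[str] = []
    while pending:
        case_id, updates = pending.popleft()
        system.update(updates)         # exactly one plan touches the system at a time
        executed.append(case_id)
    return executed


system = {"route_service": "provider_a", "sensor_gateway": "node_1"}
plans = [
    ("case_17", {"route_service": "provider_b"}),
    ("case_23", {"route_service": "provider_c", "sensor_gateway": "node_2"}),
]
clash = conflicting_components(plans[0][1], plans[1][1])
order = recover_sequentially(system, plans)
```

The annotation of each case with its area of inference, mentioned above as future work, would correspond to checking `conflicting_components` before deciding whether two plans may run simultaneously.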


5 RELATED WORK

Research in software adaptation ranges from the development of generic architectural frameworks
to specific middleware using component frameworks and reflective technologies for specialized
domains. Mechanisms proposed for dynamic adaptation (DA) include: DA by generic interceptors
(Sadjani, 2004), which do not modify a component's behavior but intercept messages between
components; DA with aspect-orientation (Yang, 2002); parametric adaptation (Pellegrini, 2003),
or dynamic reconfiguration by means of adjusting or fine-tuning predefined parameters in
software entities; dynamic linking of components (Escoffier and Hall, 2007); and model-driven
development (Zhang and Cheng, 2006).

However, while existing techniques offer a wide range of options to achieve different degrees of
DA, questions related to the identification and soundness of a given adaptation model are still
open. Formal methods grant clearer definitions and precision for the adaptation framework. Our
project focuses on how to extend and build on this previous research while specifying and
validating LDS-specific requirements like on-the-fly reaction to change, loss or addition of
resources. We consulted the area of formal methods and chose the ASM technique proposed and
exemplified in various industrial examples by (Börger and Stark, 2003).

Modeling LDS has been addressed in several cloud- and grid-related projects. The ASM technique
contributed to the description of job management and service execution in (Bianchi et al.,
2013). Specifications of grids in terms of ASMs have also been proposed by (Németh and Sunderam,
2002), where the authors focused on expressing differences between grid and traditional
distributed systems.

6 CONCLUSIONS

The current paper proposes an approach for achieving

a reliable adaptation solution for LDS. By employing

the ASM formal method we analyze the properties of

the model and identify reasoning ﬂaws. The knowl-

edge scheme presented in the paper supports adapta-

tion related processes and is reﬂected in the model.

We analyzed the model with the aid of the AsmetaV

tool and validated the reliability of some of our mod-

els when executing an adaptation solution.

In the future steps of our work we aim to enhance

the models and express their properties in terms of

CTL logic, which is supported by the Asmeta toolset.

By these means, faults and drawbacks of the proposal

can be identiﬁed and corrected.

REFERENCES

Bianchi, A., Manelli, L., and Pizzutilo, S. (2013). An ASM-

based Model for Grid Job Management. Informatica

(Slovenia), 37(3):295–306.

Börger, E. and Stark, R. F. (2003). Abstract State Machines:
A Method for High-Level System Design and Analysis.
Springer-Verlag New York, Inc., Secaucus, NJ, USA.

Bósa, K., Holom, R., and Vleju, M. B. (2015). A formal
model of client-cloud interaction. In Correct Software
in Web Applications and Web Services, pages 83–144.

Brun, Y., Marzo Serugendo, G., Gacek, C., Giese, H.,
Kienle, H., Litoiu, M., Müller, H., Pezzè, M., and
Shaw, M. (2009). Software engineering for self-adaptive
systems. chapter Engineering Self-Adaptive
Systems Through Feedback Loops, pages 48–70.
Springer-Verlag, Berlin, Heidelberg.

Carioni, A., Gargantini, A., Riccobene, E., and Scandurra,

P. (2008). A scenario-based validation language for

asms. In Proceedings of the 1st International Confer-

ence on Abstract State Machines, B and Z, ABZ ’08,

pages 71–84, Berlin, Heidelberg. Springer-Verlag.

Escofﬁer, C. and Hall, R. S. (2007). Dynamically Adaptable

Applications with iPOJO Service Components, pages

113–128. Springer Berlin Heidelberg, Berlin, Heidel-

berg.

Grozev, N. and Buyya, R. (2014). Inter-cloud architec-

tures and application brokering: taxonomy and survey.

Softw., Pract. Exper., 44(3):369–390.

Kossak, F. and Mashkoor, A. (2016). How to Select the
Suitable Formal Method for an Industrial Application:
A Survey, pages 213–228. Springer International
Publishing, Cham.

Németh, Z. and Sunderam, V. (2002). A Formal
Framework for Defining Grid Systems. 2014
14th IEEE/ACM International Symposium on Cluster,
Cloud and Grid Computing, 0:202.

Pellegrini., M.-C., R. M. (2003). Component management

in a dynamic architecture. The Journal of Supercom-

puting, 24(2):151–159.

Sadjani, S., M. P. (2004). An adaptive corba template to

support unanticipated adaption. In International Con-

ference on Distributed Computing Systems, pages 74–

83.

Yang, Z., C. B. S. R. S. J. S. S. M. P. (2002). An aspect-

oriented approach to dynamic adaptation. WOSS,

pages 85–92.

Zhang, J. and Cheng, B. H. C. (2006). Model-based devel-

opment of dynamically adaptive software. In Proceed-

ings of the 28th International Conference on Software

Engineering, ICSE ’06, pages 371–380, New York,

NY, USA. ACM.
