ON THE STUDY OF DYNAMIC AND ADAPTIVE DEPENDABLE DISTRIBUTED SYSTEMS

J. E. Armendáriz-Iñigo, J. R. Juárez-Rodríguez, J. R. González de Mendívil

Universidad Pública de Navarra, 31006 Pamplona, Spain

F. D. Muñoz-Escoí, R. de Juan-Marín

Instituto Tecnológico de Informática, Universidad Politécnica de Valencia, 46022 Valencia, Spain

Keywords:

Distributed systems, Availability, Dependability, Dynamic systems, Data consistency.

Abstract:

Due to the usage of MANETs and some kinds of collaborative (P2P) applications, current distributed systems are becoming increasingly dynamic; i.e., it is difficult to manage membership information and to forecast the accessibility of each system node. Moreover, dependable applications for static distributed systems also need to provide good adaptability levels (to different request arrival rates, usage patterns, classes of requests, etc.) and good scalability; the cloud computing paradigm is a case in point. Developing dependable applications in dynamic and adaptive systems is not trivial, since both dynamism and adaptability may compromise algorithm liveness or complicate the design of such algorithms, especially those best suited for static systems. Strategies for building adaptable and scalable dependable services (based on “cloud systems”) will be surveyed and improved. Moreover, efficient support for dependable applications in dynamic systems will be provided, combining three different approaches: relaxed consistency models, interconnection protocols (for supporting both consistency and multicasting) and reconciliation strategies. Last but not least, the usage of and support for integrity constraints in replicated systems will be analyzed and improved for dynamic systems.

1 INTRODUCTION

In the last decade, there have been several advances in the distributed systems field that have allowed the creation of new decentralized applications. A first example is provided by P2P applications. Napster, the first system of this kind, still used a centralized directory, but all data transfers were made between peers that collaborated in order to share files. Later systems, such as Gnutella, dispensed with the centralized service, although this made it harder to find the files to be shared, since search messages had to be flooded through the network. This compromised system scalability, since it excessively increased the number of messages needed to obtain the searched file and did not guarantee that such a file could be located, even when multiple copies of it existed in the system. A further evolution of these P2P systems were their structured variants, e.g. Tapestry (Zhao et al., 2004), which provide something similar to a distributed hash table (DHT). In this case, the system organization or “structure” allows fast localization of any shared content, guaranteeing system scalability. However, structured variants also raise serious problems when their nodes join and leave the system very often. Note that these P2P applications are a valid example of a dynamic distributed system, since none of their nodes needs to know all the other potential collaborating nodes. Moreover, P2P system nodes do not remain in the system for long time intervals, causing system membership to be highly dynamic.
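The scalability problem of flooded search can be illustrated with a small simulation. The Python sketch below is not Gnutella's actual protocol: the `flood_search` function, the adjacency-list overlay and the hop limit `ttl` are assumptions made for illustration (Gnutella-style queries do carry such a hop limit). It counts every forwarded copy of the query and shows that a file lying beyond the hop limit is not located, even though a copy exists in the system.

```python
from collections import deque


def flood_search(graph, start, holders, ttl):
    """Flood a search query from `start` with a hop limit (simplified sketch).

    graph:   dict mapping each node to the list of its neighbours
    holders: set of nodes storing a copy of the requested file
    ttl:     hop limit carried by the query (an assumed parameter)
    Returns (holders reached, number of query messages sent).
    """
    seen = {start}
    frontier = deque([(start, ttl)])
    found = set()
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if node in holders:
            found.add(node)
        if hops_left == 0:
            continue  # the query dies here, even if a copy lies one hop away
        for neighbour in graph[node]:
            messages += 1  # every forwarded copy costs one message, duplicates included
            if neighbour not in seen:  # receivers drop queries already seen (tracked globally here)
                seen.add(neighbour)
                frontier.append((neighbour, hops_left - 1))
    return found, messages


# A chain of five peers where only "E" stores the file.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(flood_search(chain, "A", {"E"}, ttl=2))  # (set(), 3): file not located
print(flood_search(chain, "A", {"E"}, ttl=4))  # ({'E'}, 7)
```

Raising `ttl` finds the file at the price of more messages; in general, the message count grows with the number of links the query traverses, which is what limits the scalability of unstructured overlays.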

As shown in the previous paragraph, system decentralization may lead to dynamic distributed systems. Another example of a dynamic system is one consisting of a set of mobile computers interconnected through a wireless network. As in the previous case, such systems have become popular nowadays: wireless networks can be found in many environments (universities, enterprises, and airports, to name a few, or even at home), since many users prefer a computer network of this kind. Sensor networks (mainly oriented towards monitoring applications in different environments: weather forecasting, habitat-oriented zoological surveys, domotics, biomedical applications, etc.) and VANETs (MANETs whose nodes are plugged into vehicles, providing a complementary set of services to the vehicles' drivers or users) can also be included in this class of systems. All these cases are examples of dynamic distributed systems.

E. Armendáriz-Iñigo J., R. Juárez-Rodríguez J., R. Gonzalez de Mendivil J., D. Muñoz-Escoí F. and de Juan-Marín R. (2009).

ON THE STUDY OF DYNAMIC AND ADAPTIVE DEPENDABLE DISTRIBUTED SYSTEMS.

In Proceedings of the 4th International Conference on Software and Data Technologies, pages 183-186

DOI: 10.5220/0002279801830186

© SciTePress

Applications developed on them need to consider a system model that is not equivalent to the traditional one. Besides assuming asynchrony, in these cases the application designer cannot know the actual system membership (each node only interacts with a small part of the system population), and he/she should also assume that the probability of node failure or disconnection is high. Thus, the design of efficient distributed algorithms in these environments is a challenging task that has received attention in recent years, and several theoretical results have been published in this area. For instance, (Mostéfaoui et al., 2005) already identified these problems and proposed some parameters and basic primitives that allow algorithm migration from static to dynamic systems. Their proposal uses an alpha parameter, defined as the number of nodes that could be considered a stable core in such a dynamic system (such a core does not need to be perpetual; its nodes should be alive and active during a time interval long enough to ensure algorithm progress), and several communication primitives: a termination-guaranteed query-response, in which the querying process resumes as soon as alpha replies have been collected, and a reliable and persistent multicast, i.e., one that guarantees message delivery to the nodes that have joined the system while the message was being multicast. Using these tools, that paper proves that a leader-election algorithm can be migrated to a dynamic system, even though this problem has traditionally been considered unsolvable when system nodes' identifiers are unknown. A second proposal was presented in (Baldoni et al., 2007), where different characteristics of several dynamic systems are surveyed and a first classification is provided, depending on the basis provided for designing distributed algorithms. This shows that this is an interesting research line, since it will allow algorithm migration from static systems to dynamic ones (or prove that such migration is not possible in some cases).

Thus, our aim consists in choosing several problems already solved in static systems (consensus, leader election, mutual exclusion, global state collection, etc.) and analyzing in which dynamic settings such problems could still be solved. Note that it is widely accepted that a distributed system should be able to overcome node and communication failures in order to be useful. For this reason, applications developed on these systems commonly assume some kind of dependable support (or even that the underlying system is dependable), but it is not trivial to provide such dependable support in a dynamic system. Another objective is to improve the dependability of applications designed, implemented and deployed in this kind of system.
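The query-response primitive of (Mostéfaoui et al., 2005) described above can be sketched in Python as follows. This is our illustration, not the primitive as specified in that paper: threads stand in for remote nodes, and the names `query_response`, `slow`, `nodes` and `alpha` are hypothetical. The caller blocks only until `alpha` replies arrive, so it terminates as long as at least `alpha` nodes of the stable core stay alive long enough to answer, regardless of how many other nodes have left.

```python
import queue
import threading
import time


def query_response(nodes, alpha, timeout=None):
    """Send a query to every known node and return once `alpha` replies arrive.

    nodes:   dict mapping a node id to a zero-argument callable that computes
             that node's reply, or to None if the node has left or crashed
    alpha:   assumed size of the stable core (nodes that stay up long enough)
    timeout: optional per-reply wait in seconds; queue.Empty is raised if it
             expires, signalling that the alpha assumption was violated
    """
    replies = queue.Queue()

    def ask(node_id, handler):
        replies.put((node_id, handler()))

    for node_id, handler in nodes.items():
        if handler is not None:  # departed nodes never answer
            threading.Thread(target=ask, args=(node_id, handler), daemon=True).start()

    collected = {}
    while len(collected) < alpha:
        node_id, value = replies.get(timeout=timeout)
        collected[node_id] = value
    return collected  # late replies from other nodes are simply ignored


def slow(value, delay):
    """Hypothetical node behaviour: answer `value` after `delay` seconds."""
    def handler():
        time.sleep(delay)
        return value
    return handler


peers = {"n1": slow(1, 0.0), "n2": None, "n3": slow(3, 0.0), "n4": slow(4, 2.0)}
print(query_response(peers, alpha=2, timeout=5))  # replies from n1 and n3
```

The per-reply `timeout` is an addition of ours: without it, the call blocks forever if fewer than `alpha` nodes ever answer, which is exactly the situation the stable-core assumption rules out.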

On the other hand, systems that consist of a stable set of nodes able to adapt themselves to different kinds of load or to different kinds of environments can also be classified as dynamic. Thus, in the field of data replication management, a meta-protocol has been proposed (Ruiz-Fuertes et al., 2007) that may simultaneously support multiple replication protocols, allowing each application to use a different replication protocol to manage its transactions. This improves performance and reduces the abort rates of those transactions that could use a relaxed isolation level (note that each protocol may support a different isolation level, if needed). Note also that different database replication protocols provide different performance under different load levels, as shown by (Wiesmann and Schiper, 2005). Thus, complementing this meta-protocol with a system load monitor could allow its replication middleware to choose the best protocol for each transaction according to the current system load when that transaction is started, improving system adaptability.
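In its simplest form, a load monitor driving the meta-protocol could map the current load and the isolation level requested by a transaction to one of the available protocols. The sketch below assumes a hypothetical protocol catalogue; the names, isolation labels and thresholds are placeholders that would have to be derived from measurements such as those reported by (Wiesmann and Schiper, 2005), and `choose_protocol` is not part of the meta-protocol described in (Ruiz-Fuertes et al., 2007).

```python
# Hypothetical catalogue of the replication protocols plugged into the
# meta-protocol: (name, isolation level, highest load in transactions per
# second for which the protocol is assumed to perform best). The names and
# thresholds are placeholders, not measured values.
CATALOGUE = [
    ("protocol_a", "serializable", 200.0),
    ("protocol_b", "serializable", float("inf")),
    ("protocol_c", "snapshot", 300.0),
    ("protocol_d", "snapshot", float("inf")),
]


def choose_protocol(current_load, isolation):
    """Return the first catalogued protocol that offers the requested
    isolation level and is suited to the load reported by the monitor.

    Intended to be called once, when a transaction starts.
    """
    for name, level, max_load in CATALOGUE:
        if level == isolation and current_load <= max_load:
            return name
    raise ValueError(f"no protocol provides isolation level {isolation!r}")


print(choose_protocol(120.0, "serializable"))  # protocol_a
print(choose_protocol(450.0, "snapshot"))      # protocol_d
```

Keeping the catalogue as data rather than code means the thresholds can be recalibrated as new measurements arrive, without changing the selection logic.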

This adaptability to variable loads, and the cost reduction implied by hardware downscaling, have given rise to a new distributed computing paradigm called “cloud computing”. The aim of these systems is to enable possibly huge quantities of networked computing resources to be “contracted” in order to execute large-scale applications of enterprise customers. To their clients, the contracted resources become “remote”, being accessed via the Internet (the virtual “cloud” location that gave its name to this new paradigm). Thus, the IT departments of cloud computing clients only need to worry about developing their applications (usually, web services) and deploying them on the hired virtual computers. They do not need to purchase and operate a proprietary computing center, so large investments and administration costs are avoided. On the other hand, the company providing such clouds must maintain enough virtual computing capacity to guarantee the contracted services and to ensure easy adaptation to different request loads (note that the final user of such services could be a third company or the users of the services provided by the client company).

There are multiple problems that need to be solved in this kind of cloud system: (1) scalability, (2) availability, (3) security, and (4) virtualisation (hired computers are only virtual nodes whose load will be supported by actual computers; this allows an easy isolation of each virtual node, ensuring a secure deployment, but it also somewhat complicates the overall system management regarding the other requirements: scalability and availability). Finally, client companies will be charged a fixed payment (depending on the bandwidth and minimal computing power that have been contracted), plus a variable amount that depends on the actual resource usage. There are several proposals for this computing model: Amazon Elastic Compute Cloud (EC2) (Amazon, 2009), Google App Engine (Google, 2009), Windows Azure (Microsoft, 2009), etc. This confirms that service scalability, adaptability, security and availability are important issues that need additional research. The solutions described in the white papers referenced above are not complex, but it seems appropriate to propose advanced solutions to these problems, and this is our main objective in this proposal. We cannot develop an entire cloud system, since this demands a lot of time and effort (and the solutions developed by Amazon, Google, Microsoft, IBM and other companies will have been greatly improved in the meantime). So, our objective will be centered on proposing complementary solutions that could be easily included in such commercial products or that could be directly used by the client companies discussed above.

ICSOFT 2009 - 4th International Conference on Software and Data Technologies

2 MAIN RESEARCH GOALS

As already mentioned in the introduction, there are several types of dynamic distributed systems, and all of them have received special attention in recent years. However, with a few exceptions (Mostéfaoui et al., 2005; Baldoni et al., 2007), there have not been any theoretical surveys about how to adapt the classical solutions for static systems to these new environments, or about how to formally characterize the latter. Thus, there are many topics of interest in this research line, from both theoretical and practical points of view. Every application running in a dynamic system shares some amount of information among its system nodes. This implies that the consistency protocols already developed for static systems could be improved and adapted for these dynamic environments, where node connectivity cannot always be guaranteed. To this end, an initial basis could be provided by recently published interconnection protocols for communication systems (Alvarez et al., 2008) (needed for supporting some consistency model when previously isolated node groups rejoin, a frequent event in a dynamic system) or by reconciliation protocols (Asplund et al., 2007). This need for scalability and adaptation to highly variable workloads has been identified by the most important companies in the field of web service technologies and operating systems, leading them to propose the cloud computing systems already discussed in the introduction. This shows that there are no good solutions (at least, none demanding a low or moderate computing effort) for these problems and that additional research is needed in this area. Indeed, some of their persistence mechanisms are still very basic (Amazon, 2009) (providing a shared-disk image where multiple copies of the information are stored), or they provide an interface very close to web services (Google, 2009; Microsoft, 2009) but without relational support (thus losing general functionality). On the other hand, in order to improve scalability, the general principle (Helland and Campbell, 2009) consists in relaxing consistency while increasing the degree of asynchrony, using very optimistic consistency and concurrency control mechanisms. Such a principle could also be used in a dynamic system, improving application performance at the cost of relaxing the consistency provided to the users. These are going to be the main goals of our study:

Theoretical Study and Classification of Different Types of Dynamic Systems and Mechanisms Needed for Migrating Algorithms from Static to Dynamic Systems. Although there have been some results (Mostéfaoui et al., 2005; Baldoni et al., 2007) in this area, such classifications could still be refined, and the support suggested for each identified class could also be extended by evaluating which kinds of algorithms designed for static systems could be migrated to a dynamic one.

Study, Design and Prototyping of Process and Persistent Data Management Strategies Providing Adaptability and Scalability. Regarding adaptability and scalability, the meta-protocol of (Ruiz-Fuertes et al., 2007) allows the coexistence of multiple replication protocols in the same middleware. On top of this support, it will be necessary to design and implement a load and performance analysis module that chooses, at every moment, the most suitable protocol for the current system characteristics. However, we must extend this support to guarantee greater scalability. To this end, it will be necessary to evaluate which replication strategies are most appropriate, what type of transactional support could be provided and what degree of consistency could be maintained. The solutions proposed for “cloud systems” use relaxed strategies in these three respects, and they do not always consider replication.

Adaptation and Improvement of Existing Consistency Protocols for Static Systems in Order to Use them in Dynamic Systems. Considering some existing results (Alvarez et al., 2008), it seems appropriate to select FIFO or causal consistency models in dynamic systems, thus allowing the usage of simple interconnection protocols for those subgroups that remained isolated and are currently rejoining in order to compose a bigger system. Other solutions (Asplund et al., 2007) are based on reconciliation strategies, choosing which updates could be accepted and which others should be rejected or adapted. Interconnection and reconciliation strategies have evolved independently, and their combination has not yet been studied. More recently, the eventually consistent model has been proposed for large-scale distributed systems (Vogels, 2009). Our proposal tries to combine the best characteristics of these approaches in those settings where doing so would make sense. Moreover, the theoretical results obtained in the context of our first objective will also affect this study of consistency protocols.
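FIFO and causal consistency are attractive in this setting because each node can decide locally whether an incoming update may be applied. The sketch below shows the classic vector-clock causal delivery test used by causal broadcast protocols; it is our illustration, not a protocol from (Alvarez et al., 2008), and the function names `can_deliver` and `deliver_ready` are hypothetical. It assumes a fixed set of processes numbered 0 to n-1, which is precisely the assumption an interconnection protocol has to relax when isolated subgroups rejoin.

```python
def can_deliver(local, msg_clock, sender):
    """Causal delivery test for a broadcast message.

    local:     vector clock of the receiver; local[k] counts the messages
               from process k already delivered here
    msg_clock: vector clock attached to the message by its sender
    sender:    index of the sending process
    """
    for k, (mine, theirs) in enumerate(zip(local, msg_clock)):
        if k == sender:
            if theirs != mine + 1:  # must be the sender's next message (FIFO order)
                return False
        elif theirs > mine:  # depends on a message not yet delivered here
            return False
    return True


def deliver_ready(local, pending):
    """Deliver every pending (sender, clock, payload) message whose causal past
    has already been delivered, repeating until no more progress is possible.
    Mutates `local` and `pending`; returns payloads in delivery order."""
    delivered = []
    progress = True
    while progress:
        progress = False
        for msg in list(pending):
            sender, clock, payload = msg
            if can_deliver(local, clock, sender):
                local[sender] += 1
                pending.remove(msg)
                delivered.append(payload)
                progress = True
    return delivered


# p1 sent "reply" after delivering p0's "question"; p2 receives them out of order.
inbox = [(1, [1, 1, 0], "reply"), (0, [1, 0, 0], "question")]
print(deliver_ready([0, 0, 0], inbox))  # ['question', 'reply']
```

The sender component of the test alone yields FIFO order; adding the check on the other components upgrades it to causal order.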

Analysis, Design and Prototyping of a Dependable Middleware System for Dynamic Environments. The static solutions taken as a basis for dependable middleware development would be placed in modules that could be replaced by others appropriately tailored for dynamic settings. The resulting system should be able to provide good support for both static and dynamic systems. Multiple mechanisms for ensuring security should also be supported by this middleware.

Extension and Dynamization of Integrity Support and Usage in Replicated Systems. Integrity constraint management in replicated databases has been studied in (Lin et al., 2009). A possible extension consists in including dynamic constraints (i.e., triggers) in our management, as well as constraints of arbitrary generality. Up to now, only built-in constraints declared in the database schema have been maintained by the system. Another extension could be based on migrating our mechanisms to other kinds of replication protocols. For instance, (Asplund et al., 2007) analyzed the consistency-availability trade-off in partitionable systems, with constraint-based consistency management. A last extension could consist in the evaluation and measurement of the resulting (in)consistency degree. Partial database replication is another field where our proposed mechanisms could be used, since partial replication is needed in dynamic systems with limited computing power and storage capacity. There have not been any important results in this field up to now.
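Constraint-based reconciliation in the spirit of (Asplund et al., 2007) can be sketched as follows; this is our simplified illustration, not their algorithm. The `Update` record, the timestamp-ordered replay policy and the `non_negative` constraint are assumptions chosen to keep the example small. It shows how an integrity constraint decides which updates made in different partitions are accepted and which are rejected when the partitions rejoin.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]


@dataclass(frozen=True)
class Update:
    timestamp: float  # assumed globally comparable (e.g. loosely synchronized clocks)
    key: str
    delta: int


def reconcile(base: State, logs: List[List[Update]],
              constraint: Callable[[State], bool]) -> Tuple[State, List[Update], List[Update]]:
    """Merge the update logs of partitions that are rejoining.

    Updates from all partitions are replayed on the last common state in
    timestamp order. An update is accepted only if the resulting state still
    satisfies the integrity constraint; otherwise it is rejected (a real
    system might instead compensate or adapt it).
    """
    state = dict(base)
    accepted: List[Update] = []
    rejected: List[Update] = []
    for upd in sorted((u for log in logs for u in log), key=lambda u: u.timestamp):
        candidate = dict(state)
        candidate[upd.key] = candidate.get(upd.key, 0) + upd.delta
        if constraint(candidate):
            state = candidate
            accepted.append(upd)
        else:
            rejected.append(upd)
    return state, accepted, rejected


def non_negative(state: State) -> bool:
    """Hypothetical integrity constraint: no stock level may drop below zero."""
    return all(v >= 0 for v in state.values())


# Two partitions sold from the same stock of 10 items while disconnected.
left = [Update(1.0, "stock", -6)]
right = [Update(2.0, "stock", -7), Update(3.0, "stock", +2)]
final, ok, refused = reconcile({"stock": 10}, [left, right], non_negative)
print(final, refused)  # {'stock': 6} [Update(timestamp=2.0, key='stock', delta=-7)]
```

Replaying in timestamp order is only one possible policy; a system could instead prioritize one partition or ask the application to adapt the rejected updates, as mentioned above for reconciliation strategies.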

ACKNOWLEDGEMENTS

This work has been supported by the Spanish MEC

under research grant TIN2006-14738-C02.

REFERENCES

Alvarez, A., Arévalo, S., Cholvi, V., Fernández, A., and Jiménez, E. (2008). On the interconnection of message passing systems. Inf. Process. Lett., 105(6):249–254.

Amazon (2009). Amazon elastic compute cloud (amazon ec2). Accessible in URL: http://aws.amazon.com/ec2/.

Asplund, M., Nadjm-Tehrani, S., Beyer, S., and Galdámez, P. (2007). Measuring availability in optimistic partition-tolerant systems with data constraints. In DSN, pages 656–665.

Baldoni, R., Bertier, M., Raynal, M., and Piergiovanni, S. T. (2007). Looking for a definition of dynamic distributed systems. In PaCT, volume 4671 of LNCS, pages 1–14. Springer.

Google (2009). What is google app engine? Accessible in URL: http://code.google.com/appengine/docs/whatisgoogleappengine.html.

Helland, P. and Campbell, D. (2009). Building on quicksand. In CIDR.

Lin, Y., Kemme, B., Patiño-Martínez, M., Jiménez-Peris, R., and Armendáriz-Iñigo, J. E. (2009). Snapshot isolation and integrity constraints in replicated databases. In ACM TODS. To appear.

Microsoft (2009). Azure services platform. Accessible in URL: http://www.microsoft.com/azure/default.mspx.

Mostéfaoui, A., Raynal, M., Travers, C., Patterson, S., Agrawal, D., and Abbadi, A. E. (2005). From static distributed systems to dynamic systems. In SRDS, pages 109–118.

Ruiz-Fuertes, M. I., de Juan-Marín, R., Pla-Civera, J., Castro-Company, F., and Muñoz-Escoí, F. D. (2007). A metaprotocol outline for database replication adaptability. In OTM Workshops (2), volume 4806 of LNCS, pages 1052–1061. Springer.

Vogels, W. (2009). Eventually consistent. Commun. ACM, 52(1):40–44.

Wiesmann, M. and Schiper, A. (2005). Comparison of database replication techniques based on total order broadcast. IEEE Trans. Knowl. Data Eng., 17(4):551–566.

Zhao, B. Y., Huang, L., Stribling, J., Rhea, S. C., Joseph, A. D., and Kubiatowicz, J. (2004). Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications, 22(1):41–53.
