Boosting Performance and Scalability in Cloud-deployed Databases
J. E. Armendáriz-Iñigo¹, J. Legarrea¹, J. R. González de Mendívil¹, A. Azqueta-Alzúaz¹,
M. Louis-Rodríguez¹, I. Arrieta-Salinas¹ and F. D. Muñoz-Escoí²
¹Dpto. de Ing. Matemática e Informática, Univ. Pública de Navarra, Campus de Arrosadía, 31006 Pamplona, Spain
²Instituto Tecnológico de Informática, Univ. Politècnica de València, 46022 Valencia, Spain
Keywords:
Uncritical Data, Databases, Transactions, Replication, Scalability.
Abstract:
Eventual consistency improves the scalability of large datasets in cloud systems. We propose a novel technique
for managing different levels of replica consistency in a replicated relational DBMS. To this end, data is
partitioned and managed by a partial replication protocol that is able to define a hierarchy of nodes with a
lazy update propagation. Nodes in different layers of the hierarchy may maintain different versions of their
assigned partitions. Transactions are tagged with an allowance parameter k that specifies the maximum degree
of data outdatedness tolerated by them. As a result, different degrees of transaction criticality can be set and
non-critical transactions may be completed without blocking or compromising the critical ones.
1 INTRODUCTION
Distributed architectures have been proposed as a solution to diverse issues of web systems, such as developing components that serve an increasing number of users, replicating dynamic web content, and achieving fault tolerance and performance through replica proximity. So far, however, there has been little discussion about how the critical nature of web data influences replication techniques. Our objective is to determine whether it is possible to work in a replicated environment and manage web data criticality to increase system performance, scalability and availability.
We consider systems with two types of data: critical and non-critical. Consider, for example, a website offering hotel information: the data about room availability and rates can be regarded as non-critical, while the booking process itself is critical. This paper emphasizes treating such non-critical data in a special way to boost system throughput.
Data is partially replicated; i.e., no replica holds
the entire state of the database. Data consistency is
managed with a hybrid replication protocol following
the ideas presented in (Arrieta-Salinas et al., 2012).
This system assigns a certain number of replicas to
each partition and these replicas are placed following
an onion structure. The core layer will have the most
recent data version while outer layers will have stale
data versions, though still consistent. Thus, we implement adjustable consistency that provides strong consistency at the core and eventual consistency in the outer layers.
This model is extended to handle critical and non-critical data. The client specifies which data is non-critical. The system partitions the data in a smart way (Curino et al., 2010), so each partition contains some non-critical data. Critical data is accessed in the core, while non-critical data is modified and accessed in the outer layers. The non-critical data also forms its own set of multiversioned layers, in which the original core of the partition (holding critical data) behaves as an outer layer for the non-critical data. Thus, we generate different data cores inside each partition.
We define replication rules to decide whether non-critical data must process updates immediately or may defer them, while still providing the consistency demanded by the application. Clients identify their transactions as non-critical. Such a transaction commits, but its changes will not be visible at other nodes until a given time has elapsed or they are requested on demand. This increases system throughput by deferring the propagation of non-critical transactions.
The rest of the paper is organized as follows: Sec-
tions 2 and 3 deal with the motivation and model of
our proposal. Section 4 describes how to include the
features of critical and non-critical data in a system.
2 MOTIVATION
Observing the growth of web services, we can notice that some of the data managed by those systems can be classified as critical information, such as money management transactions or time-sensitive auction transactions, while other data in the same system can be classified as uncritical information, such as the static content featured by a bank or auction site. Our proposal tries to maximize the scalability and availability of the whole system by exploiting this distinction.
Web applications often choose to store data in databases providing Snapshot Isolation (SI) (Berenson et al., 1995). This is due to the non-blocking nature of read operations executed under that isolation level, since each transaction reads from a snapshot of the update transactions committed up to its beginning. If we extend the notion of SI to a replicated environment, we obtain the Generalized Snapshot Isolation (GSI) level (Elnikety et al., 2005): the snapshot obtained by a transaction can be any snapshot from the history of committed transactions up to its start; thus, SI is a particular case of GSI. This benefits replicated databases, as clients can access their closest replicas and reduce their latency.
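To make the difference between SI and GSI concrete, the following minimal Python sketch (our own illustration; the names and structures below are assumptions, not part of any cited protocol) shows how a snapshot could be chosen for a starting transaction: SI returns the latest snapshot committed before the start, whereas GSI may return any consistent, possibly older, one.

import bisect
import random

def si_snapshot(history, start_ts):
    """SI: read from the latest snapshot committed before the transaction start."""
    i = bisect.bisect_right(history, start_ts)
    return history[i - 1] if i else None

def gsi_snapshot(history, start_ts):
    """GSI: any committed snapshot up to the transaction start is acceptable,
    e.g. whichever (possibly stale) version the contacted replica holds."""
    candidates = [v for v in history if v <= start_ts]
    return random.choice(candidates) if candidates else None

# Example: commit timestamps 10, 20, 30, 40; a transaction starting at time 35
history = [10, 20, 30, 40]
print(si_snapshot(history, 35))   # -> 30 (latest committed before the start)
print(gsi_snapshot(history, 35))  # -> 10, 20 or 30 (any consistent older snapshot)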
Replicated databases run a replication protocol to
manage transactions. It is well known that replica-
tion protocols perform differently depending on the
workload characteristics. For instance, a read-intensive partition may provide higher throughput with a primary-backup scheme (Wiesmann and Schiper, 2005). Conversely, a partition whose items are frequently updated might benefit from an update-everywhere replication solution based on total order broadcast, such as certification-based replication (Wiesmann and Schiper, 2005). However, update-everywhere protocols suffer from a serious scalability limitation, as the cost of propagating updates in total order grows with the number of involved replicas.
We take the system presented in (Arrieta-Salinas et al., 2012) as a basis to provide higher scalability and availability. Hence, data is partitioned (Curino et al., 2010) and each partition is placed in a set of replicas, say M, where K of them run a given replication protocol (either update-everywhere or primary-backup) and the rest (M − K) are placed in a replication tree whose depth and composition depend on the application. Several, or all, of the K replicas act as primaries for other backup replicas (those of the first level in the tree), which asynchronously receive updates from their respective primaries. At the same time, backup replicas can act as pseudo-primaries for other replicas placed at lower layers of the hierarchy, thus propagating changes along the tree in an epidemic way. If we increase the replication degree of a given partition, we can forward transactions to different replicas storing it; thus, transactions will be more likely to obtain old, though consistent, snapshots (GSI), alleviating the traditional scalability problem in the core.
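As an illustration of this layout (the class and field names below are our own assumptions, not taken from (Arrieta-Salinas et al., 2012)), a partition could be described by its K core replicas running the chosen protocol plus the tree of lazily updated backups:

# Hypothetical sketch of the onion-like replica layout of one partition:
# K core replicas run the chosen replication protocol, while the remaining
# M - K replicas form a tree that receives updates lazily from its parent.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ReplicaNode:
    address: str
    parent: Optional["ReplicaNode"] = None           # primary or pseudo-primary
    children: List["ReplicaNode"] = field(default_factory=list)

@dataclass
class Partition:
    name: str
    protocol: str                                    # "update-everywhere" or "primary-backup"
    core: List[ReplicaNode]                          # the K core replicas
    tree_roots: List[ReplicaNode]                    # first-level backups (roots of the M - K tree)

def add_backup(parent: ReplicaNode, address: str) -> ReplicaNode:
    """Attach a new backup below `parent`; it will receive updates lazily (epidemically)."""
    node = ReplicaNode(address=address, parent=parent)
    parent.children.append(node)
    return node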
The novelty of our approach is to take advantage of this replication hierarchy by placing non-critical data along the hierarchy tree and handling it in a similar way to regular data. We run a primary-copy protocol based on the principles of the COLUP algorithm (Irún-Briz et al., 2003). The resulting protocol increases performance and reduces the abort rate of non-critical update transactions by re-partitioning the data according to its critical nature, on top of its original partitioning (based on graphs or any other approach). Hence, starting from a traditional partitioning schema where critical and non-critical data are placed in the same partition, we establish that a certain replica handles the updates of non-critical data while critical data is updated at another replica. Under this assumption, critical transactions can be executed faster and are not interleaved with transactions accessing uncritical data. Those uncritical transactions may access older data. Meanwhile, other critical transactions can be scheduled, increasing the age of the snapshots accessed by uncritical transactions. However, every uncritical transaction is characterized by a threshold on the age of the data it needs to access. As a result, accessing old data is tolerated by these transactions in the regular case, and such a situation does not necessarily lead to their abortion.
Compared with traditional GSI, our model tries to anticipate when a transaction is going to conflict (i.e., present a write-write conflict) with other transactions and, when the conflicting transaction is non-critical, an alternative validation mechanism is applied. In that case, a transaction A is aborted in the validation phase only when at least one of its conflicting transactions is critical (or uncritical but with an allowance threshold lower than that of A).
Uncritical transactions are characterized by an al-
lowance parameter k. The value of k indicates the
number of missed updates tolerated by the uncritical
transaction. This value is greater than or equal to 1 for uncritical transactions. Implicitly, critical transactions
are those that access at least one critical item and have
a zero value for k. When conflicts arise between trans-
actions that access critical data (i.e., critical transac-
tions) and transactions that only access uncritical data
(i.e., uncritical transactions), no critical transaction
CLOSER2013-3rdInternationalConferenceonCloudComputingandServicesScience
526
may be aborted by an uncritical transaction. Addi-
tionally, conflicts between uncritical transactions are
allowed in some applications (i.e., in those applica-
tions admitting different values for the k parameter).
For instance, an application may decide that it needs
three different kinds of transactions: (a) critical ones
with k = 0, (b) intermediate uncritical ones with k = 4,
and (c) relaxed uncritical ones with k = 10. In that
scenario, a transaction with k = 4 is able to toler-
ate (and overwrite) conflicts generated by transactions
with k = 10 without being aborted in its validation
step. Note that with this validation strategy, a critical transaction C will never be aborted by any concurrent conflicting transaction with k > 0 that committed while C was being executed.
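A minimal sketch of this validation rule, written in Python under our own naming assumptions (the paper gives no pseudocode), could look as follows:

# Hypothetical sketch of the validation step described above.
# A transaction carries an allowance k: k == 0 means critical,
# k >= 1 means uncritical with that many tolerated missed updates.

from dataclasses import dataclass
from typing import List, Set

@dataclass
class Txn:
    tid: int
    k: int                     # 0 = critical, >= 1 = uncritical
    writeset: Set[str]

def must_abort(txn: Txn, committed_concurrent: List[Txn]) -> bool:
    """Abort `txn` only if a conflicting concurrent transaction is critical
    or uncritical with a stricter (lower) allowance than txn's own."""
    for other in committed_concurrent:
        if txn.writeset & other.writeset:      # write-write conflict
            if other.k == 0:                   # conflicting transaction is critical
                return True
            if other.k < txn.k:                # stricter uncritical transaction wins
                return True
    return False                               # conflicts with k >= txn.k are tolerated

# Example: a k = 4 transaction tolerates a conflict with a committed k = 10 one
print(must_abort(Txn(1, 4, {"x"}), [Txn(2, 10, {"x"})]))   # -> False
print(must_abort(Txn(3, 4, {"x"}), [Txn(4, 0, {"x"})]))    # -> True (critical conflict)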
Nodes maintaining database replicas should mon-
itor the update frequency over their non-primary data
partitions. Based on this, they are able to forecast the
“age” of their maintained snapshot; i.e., the number of transaction writesets that have been applied at the primary replica but have not yet been applied in the
local database replica. When a transaction B is started
with a k value lower than the local forecast snapshot
age, a forced update propagation is requested from
the primary replica. Transaction B is not started until
such update propagation is completed.
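Under the same assumptions, the forced refresh triggered when a transaction's allowance is lower than the forecast snapshot age could be sketched as:

# Hypothetical sketch: before starting transaction B, compare its allowance k
# with the forecast age (missing writesets) of the local snapshot and force a
# refresh from the primary when the local copy is too stale.

def begin_transaction(txn_k: int, forecast_age: int, request_refresh) -> None:
    """Start a transaction with allowance txn_k on this replica. `request_refresh`
    is assumed to block until the primary has pushed all pending writesets."""
    if forecast_age > txn_k:
        request_refresh()      # forced update propagation from the primary replica
    # the transaction can now proceed on a snapshot at most txn_k versions old

# Example: a transaction with k = 2 on a replica estimated to be 5 writesets behind
begin_transaction(2, 5, lambda: print("refreshing from primary..."))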
3 MODEL
The system model, shown in Figure 1, is divided into:
a) a set of client applications; b) a metadata manager
(MM) that holds the system state (stored in the meta-
data repository) and orchestrates the communication
with both clients and replicas in the replication clus-
ters; and, c) a set of replication clusters (RC), each
storing one data partition.
Client applications interact with the system by us-
ing a library, which acts as a wrapper for the man-
agement of connections with both the MM and the
replicas that serve the transactions. In order to submit
a transaction, the client library sends a request with a
non-critical data threshold to the MM. With this information, the MM determines the partition (or partitions) involved in the transaction, selects one of the replicas of the replication cluster storing each such partition and sends the address of the selected replica(s) to the client. Then,
the client directly submits the transaction operations
to the indicated replica(s). The client library main-
tains a cache with replica addresses to avoid perform-
ing a request for each transaction.
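A minimal sketch of this client-side routing, assuming a hypothetical MM interface exposing a route(items, k) call (not specified in the paper), could be:

# Hypothetical sketch of the client library: ask the MM which replica serves
# the transaction (sending the non-critical data threshold k) and cache the answer.

class ClientLibrary:
    def __init__(self, metadata_manager):
        self.mm = metadata_manager      # assumed to expose route(items, k)
        self.cache = {}                 # (frozenset(items), k) -> replica address(es)

    def replica_for(self, items, k):
        key = (frozenset(items), k)
        if key not in self.cache:       # only contact the MM on a cache miss
            self.cache[key] = self.mm.route(items, k)
        return self.cache[key]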
Figure 1: System model. Client applications (with their client library) send requests to the Metadata Manager, which comprises a workload manager, a transaction manager, a replication manager and the metadata repository; data flows between clients and the replication clusters, each running a replication protocol on top of a group communication system (GCS), while monitoring information flows back to the MM.

The MM module is in charge of maintaining the metadata repository, which contains the following information: a mapping between each data item, its
criticality and the partition it is stored in; a mapping between each data partition (critical/non-critical) and the set of replicas belonging to the RC that handles the partition, together with their respective hierarchy (for critical partitions, the replication protocol running at the core level is also recorded); and the status and metrics of each replica.
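A possible, purely illustrative shape for these metadata repository entries (the names below are our own choices, not the system's actual schema) is sketched here:

# Hypothetical sketch of the metadata repository contents described above.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ReplicaStatus:
    address: str
    level: int                          # position in the RC hierarchy (0 = core)
    alive: bool = True
    metrics: Dict[str, float] = field(default_factory=dict)

@dataclass
class PartitionEntry:
    partition_id: str
    critical: bool
    core_protocol: str = ""             # only meaningful for critical partitions
    replicas: List[ReplicaStatus] = field(default_factory=list)

@dataclass
class MetadataRepository:
    item_to_partition: Dict[str, str] = field(default_factory=dict)   # item -> partition id
    item_is_critical: Dict[str, bool] = field(default_factory=dict)   # item -> criticality
    partitions: Dict[str, PartitionEntry] = field(default_factory=dict)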
This information has stringent consistency and
availability requirements. For this reason, the MM
can be replicated among a small set of nodes to pro-
vide fault tolerance while ensuring consistency with
a Paxos algorithm. The information of the metadata
repository is used and updated by the following com-
ponents of the MM:
- Workload manager: it monitors the set of active
replicas in the system. Every active replica must peri-
odically send a heartbeat message to let the workload
manager know that it is alive. These messages also
include information regarding the status of the sending
replica. This component is also responsible for deter-
mining the partitioning scheme and deciding when a
replica should be upgraded or downgraded in the hi-
erarchy.
- Transaction manager: it assigns partitions to repli-
cas and synchronizes data when a transaction accesses
data items stored in several partitions.
- Replication manager: it chooses the replication protocol that best fits each replication cluster and determines the hierarchy level that corresponds to each replica.
Each RC of Figure 1 consists of a set of replicas
organized as a hierarchy of levels. Recall that inside
each partition, data is split into critical and non-critical data. Both types of data have their own hierarchy of levels.

Figure 2: Critical (blue) and non-critical (green) data management and epidemic propagation of updates in our system.

Let us start with the critical data: the core level comprises a group of replicas that propagate updates among themselves by means of a traditional distributed replication protocol (Wiesmann and Schiper,
2005) that makes use of a group communication sys-
tem to handle the messages among replicas and mon-
itor the set of replicas belonging to the group (blue
filled circles in Figure 2). The core level of each RC
may be managed by a different replication protocol,
which will be determined by the replication manager
of the MM depending on the current workload char-
acteristics. On the other hand, the replicas that do not belong to the core level can be distributed into several levels forming a tree whose root is the aforementioned core level: a replica that belongs to a given level acts as a backup for a replica of its immediately upper level and may also act as a primary for one or more replicas of its lower level (circles outlined in green and blue in Figure 2). These replicas communicate with their respective primaries using reliable point-to-point channels. On the non-critical data side, the MM chooses for each RC one replica along the hierarchy as the core for these data items (the green filled circle in Figure 2), while the rest of the replicas constitute the non-critical hierarchy tree of that RC. With the aim of exploiting
the advantages of in-memory approaches, we assume
that every replica keeps all its data in main memory.
4 CONTRIBUTIONS
We are currently developing a partitioning algorithm (Curino et al., 2010) adapted to our system. We have successfully included the definition of the proper replication protocol and the composition of the replication hierarchy tree. We plan to extend it so that it can infer which data should be considered non-critical (e.g., data that is hardly ever updated). However, we also consider that this information can be set in advance by the application. Once the MM establishes the replica that interacts with the client for a given partition, this replica may manage non-critical transactions. Update transactions are ruled by COLUP (Irún-Briz et al., 2003), so these transactions are executed at the primary and lazily propagated to the other replicas. We plan to adapt it to our system by way of three different variants: read the data stored locally (pure lazy approach); ask the primary copy for the latest version; or follow a hybrid approach. The hybrid approach works as follows: along with the propagation of updates (either critical or non-critical), the current version of the non-critical data can be piggybacked. Meanwhile, the user can set a threshold at the beginning of the transaction stating that it is fine to read data locally if it is not older than k versions; otherwise, the transaction needs to connect to the primary and retrieve the most recent versions.
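A minimal sketch of this read-variant decision, assuming hypothetical read_local/read_primary helpers and that the primary's version number is learned from the piggybacked update propagation, could be:

# Hypothetical sketch of the three read variants for non-critical data:
# pure lazy, always-ask-primary, and the hybrid k-threshold approach.

def read_item(item, local_version: int, primary_version: int, k: int,
              read_local, read_primary, mode: str = "hybrid"):
    """`read_local` and `read_primary` are assumed callables returning the item value;
    `primary_version` would be learned from versions piggybacked on update propagation."""
    if mode == "lazy":
        return read_local(item)            # pure lazy: whatever is stored locally
    if mode == "primary":
        return read_primary(item)          # always fetch the latest version
    # hybrid: read locally unless the local copy is more than k versions old
    if primary_version - local_version <= k:
        return read_local(item)
    return read_primary(item)              # too stale: go to the primary copy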
ACKNOWLEDGEMENTS
This work has been funded by the Spanish Govern-
ment and European FEDER under research grants
TIN2009-14460-C03 and TIN2012-37719-C03.
REFERENCES
Arrieta-Salinas, I., Armendáriz-Iñigo, J., and Navarro, J.
(2012). Classic replication techniques on the cloud.
In Availability, Reliability and Security (ARES), 2012
Seventh International Conference on, pages 268–273.
Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil,
E., and O’Neil, P. (1995). A critique of ANSI SQL
isolation levels. SIGMOD Rec., 24(2):1–10.
Curino, C., Zhang, Y., Jones, E. P. C., and Madden,
S. (2010). Schism: a workload-driven approach
to database replication and partitioning. PVLDB,
3(1):48–57.
Elnikety, S., Zwaenepoel, W., and Pedone, F. (2005).
Database replication using generalized snapshot isola-
tion. In SRDS, pages 73–84. IEEE Computer Society.
Irún-Briz, L., Muñoz-Escoí, F. D., and Bernabéu-Aubán,
J. M. (2003). An improved optimistic and fault-
tolerant replication protocol. Lecture Notes in Com-
puter Science, 2822:188–200.
Wiesmann, M. and Schiper, A. (2005). Comparison of
database replication techniques based on total order
broadcast. IEEE Trans. Knowl. Data Eng., 17(4):551–
566.
CLOSER2013-3rdInternationalConferenceonCloudComputingandServicesScience
528