RELAXING CORRECTNESS CRITERIA IN DATABASE REPLICATION WITH SI REPLICAS

J. E. Armendáriz-Íñigo, J. R. González de Mendívil, J. R. Garitagoitia, J. R. Juárez-Rodríguez

Universidad Pública de Navarra, 31006 Pamplona, Spain

F. D. Muñoz-Escoí, L. Irún-Briz

Instituto Tecnológico de Informática, 46022 Valencia, Spain

Keywords:

Database replication, distributed databases, snapshot isolation, read one write all, correctness criteria, formal proofs.

Abstract:

The concept of Generalized Snapshot Isolation (GSI) has recently been proposed as a suitable extension of conventional Snapshot Isolation (SI) for replicated databases. In GSI, transactions may use older snapshots instead of the latest snapshot required in SI, which provides better performance without significantly increasing the abort rate when write/write conflicts among transactions are rare. We study and formally prove a sufficient condition that replication protocols with SI replicas following the deferred update technique must obey to achieve GSI: they must provide global atomicity and commit update transactions in the very same order at all sites. However, as this is only a sufficient condition, it is possible to obtain GSI by relaxing certain assumptions about the commit ordering of certain update transactions.

1 INTRODUCTION

Snapshot Isolation (SI) is the isolation level provided by several commercial database systems, such as Oracle, PostgreSQL, Microsoft SQL Server or InterBase. Transactions executed under SI read from the last committed snapshot and, hence, read operations are never blocked and never conflict with any update transaction. In order to prevent the lost update phenomenon (Berenson et al., 1995), concurrent update transactions modifying the same data item apply the first-committer-wins rule (read-only transactions are always committed): only the first transaction that commits is allowed to proceed; the remainder are aborted. This turns out to be a nice feature because it provides sufficient data consistency (though not serializability (Fekete et al., 2005; Elnikety et al., 2005)) for non-critical applications while maintaining good performance, since read-only transactions are never delayed, blocked or aborted and they never cause update transactions to block or abort. This behavior is important for workloads dominated by read-only transactions, such as those resulting from dynamic content Web servers (Plattner et al., 2008).

Many enterprise applications demand high availability since they have to provide continuous service to their users. This implies replicating the information being used; i.e., managing replicated databases. The concept of Generalized Snapshot Isolation (GSI) has been recently proposed (Elnikety et al., 2005) in order to provide a suitable extension of conventional SI for replicated databases based on multiversion concurrency control; concurrently, a similar definition denoted 1-copy-SI was proposed in (Lin et al., 2005). In GSI, transactions may use older snapshots instead of the latest snapshot required in SI (setting up the latest snapshot in a distributed setting is not trivial). Actually, the authors of (Elnikety et al., 2005) outline an impossibility result which justifies the use of GSI in database replication: “there is no non-blocking implementation of SI in an asynchronous system, even if databases never fail”, which has been formally justified in (González de Mendívil et al., 2007).

E. Armendáriz-Íñigo J., R. González de Mendívil J., R. Garitagoitia J., R. Juárez-Rodríguez J., D. Muñoz-Escoí F. and Irún-Briz L. (2008).

RELAXING CORRECTNESS CRITERIA IN DATABASE REPLICATION WITH SI REPLICAS.

In Proceedings of the Third International Conference on Software and Data Technologies - ISDM/ABF, pages 45-53

DOI: 10.5220/0001877700450053

SciTePress

The deferred update technique (Pedone, 1999) consists in executing transactions at their delegate replicas (obtaining their corresponding snapshot) and setting up a commit ordering for update transactions, which is mainly done by means of total order broadcast (Chockler et al., 2001). When a transaction requests its commitment (read-only transactions are committed right away), its updates are collected and broadcast (using the total order primitive) to the rest of replicas. Upon delivery at each replica, a validation test is performed to detect conflicts with other concurrent transactions in the system; namely, a certification test (Wiesmann and Schiper, 2005) that applies the distributed first-committer-wins rule (Elnikety et al., 2005; Lin et al., 2005) in the same way at all replicas and ensures the same order of the commit process of transactions. The main advantage of these replication protocols is that transactions can start at any time without restriction or delay.
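The certification step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the protocol of any of the cited works: the `Replica` class, its `begin` and `deliver` methods, the `run` helper, and the use of the number of applied writesets as a snapshot identifier are all hypothetical choices introduced here. It assumes that writesets reach every replica in the same total order, so that each replica takes the same commit or abort decision.

```python
class Replica:
    """Hypothetical replica that certifies writesets delivered in total order."""

    def __init__(self):
        # Writesets of committed update transactions, in delivery order.
        self.committed = []

    def begin(self):
        # Assumption: a snapshot is identified by how many writesets
        # this replica has applied when the transaction starts.
        return len(self.committed)

    def deliver(self, snapshot, writeset):
        """Certify a delivered writeset; return True on commit, False on abort."""
        # First-committer-wins: abort if any transaction that committed after
        # the snapshot was taken wrote an item this transaction also writes.
        for other in self.committed[snapshot:]:
            if other & writeset:
                return False
        self.committed.append(frozenset(writeset))
        return True


def run(messages, replicas):
    """Deliver the same totally ordered messages to every replica."""
    return [[r.deliver(s, ws) for s, ws in messages] for r in replicas]
```

Since every replica processes the same sequence of `(snapshot, writeset)` messages, the lists of decisions returned by `run` are identical for all replicas, which is the property the certification test relies on.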

In this paper, we formalize the requirements for achieving GSI over SI replicas using non-blocking protocols. The criteria for implementing GSI are: (i) each transaction submitted to the system either commits or aborts at all sites (atomicity); (ii) all update transactions are committed in the same total order at every site (total order of committed transactions). Total order ensures that all replicas see the same sequence of transactions and are thus able to provide the same snapshots to transactions, independently of their starting replica; i.e., it gives the logical vision of a one copy scheduler (1-Copy-GSI). Atomicity, in turn, guarantees that all replicas take the same actions regarding each transaction, so their states should be consistent once each transaction has terminated.

One may think that these assumptions are rather intuitive, but they constitute the cornerstone of the contribution of this paper, which consists in relaxing the assumption of the total order of committed transactions. If a protocol is not careful about that order, transactions without write/write conflicts might be applied in different orders at different replicas, so transactions would be able to read different versions at different replicas. However, this optimization is important, since processing messages serially, as is assumed for replication protocols deployed over a group communication system (Chockler et al., 2001), would result in significantly lower throughput rates. A relaxed assumption has already been presented in (Lin et al., 2005): still using total order broadcast, it lets validated transactions be applied (and committed) concurrently as long as their respective updates do not intersect. However, this protocol needs to block the execution of the first operation of any starting transaction until the concurrent application of transactions finishes. Thus, it is easy to see that there are multiple approaches to obtain GSI at the price of imposing certain restrictions; in particular, the need to block the start of transactions to obtain a globally consistent snapshot. Finally, we discuss how to relax this last restriction, which is actually too strong, for deploying non-blocking GSI protocols.

The rest of the paper is organized as follows. Section 2 introduces the concept of multiversion histories based on (Bernstein et al., 1987). Sections 3 and 4 give the concepts of SI and GSI respectively. In Section 5, the structure of deferred update replication protocols is introduced. Conditions for 1-Copy-GSI are introduced in Section 6. We take a look at how to relax the conditions for 1-Copy-GSI in Section 7. Finally, conclusions end the paper.

2 MULTIVERSION HISTORIES

In the following, we define the concept of multiversion history for committed transactions using the theory provided in (Bernstein et al., 1987). The properties studied in our paper only require dealing with committed transactions. To this end, we first define the basic building blocks for our formalizations, and then present the different definitions and properties.

A database (DB) is a collection of data items,

which may be concurrently accessed by transactions.

A history represents an overall partial ordering of the

different operations concurrently executed within the

context of their corresponding transactions. Thus, a

multiversion history generalizes a history where the

database items are versioned.

To formalize this definition, each transaction submitted to the system is denoted by $T_i$. A transaction is a sequence of read and write operations on database items ended by a commit or abort operation. Each write operation of $T_i$ on item $X$ is denoted $W_i(X_i)$. A read operation on item $X$ is denoted $R_i(X_j)$, stating that $T_i$ reads the version of $X$ installed by $T_j$. Finally, $C_i$ and $A_i$ denote the commit and abort operations of $T_i$ respectively. We assume that a transaction does not read an item $X$ after it has written it, and each item is read and written at most once. Avoiding redundant operations simplifies the presentation. The results for this kind of transactions are seamlessly extensible to more general models. In any case, redundant operations can be removed using local variables in the program of the transaction (Papadimitriou, 1986).

Each version of a data item $X$ contained in the database is denoted by $X_i$, where the subscript stands for the identifier of the transaction that installed that version in the DB. The readset and writeset (denoted by $RS_i$ and $WS_i$ respectively) express the sets of items read (written) by a transaction $T_i$. Thus, $T_i$ is a read-only transaction if $WS_i = \emptyset$ and it is an update one otherwise.

Footnote 1: Due to space constraints, the reader is referred to (González de Mendívil et al., 2007) for a thorough explanation of the correctness proof.

ICSOFT 2008 - International Conference on Software and Data Technologies

Let $T = \{T_1, \ldots, T_n\}$ be a set of committed transactions, where the operations of $T_i$ are ordered by $\prec_i$. The last operation of a transaction is the commit operation. To process operations from a transaction $T_i \in T$, a multiversion scheduler must translate $T_i$'s operations on data items into operations on specific versions of those data items. That is, there is a function $h$ that maps each $W_i(X)$ into $W_i(X_i)$, and each $R_i(X)$ into $R_i(X_j)$ for some $T_j \in T$.

Definition 1. A Complete Committed Multiversion (CCMV) history $H$ over $T$ is a partial order with order relation $\prec$ such that:

(1) $H = h(\bigcup_{T_i \in T} T_i)$ for some translation function $h$.

(2) $\prec \supseteq \bigcup_{T_i \in T} \prec_i$.

(3) If $R_i(X_j) \in H$, $i \neq j$, then $W_j(X_j) \in H$ and $C_j \prec R_i(X_j)$.

In Definition 1, condition (1) indicates that each operation submitted by a transaction is mapped into an appropriate multiversion operation. Condition (2) states that the CCMV history preserves all orderings stipulated by transactions. Condition (3) establishes that if a transaction reads a concrete version of a data item, it was written by a transaction that committed before the item was read.

Definition 1 is more specific than the one stated in (Bernstein et al., 1987), since the former only includes committed transactions and explicitly indicates that a new version may not be read until the transaction that installed the new version has committed. In the rest of the paper, we use the following conventions: (i) $T = \{T_1, \ldots, T_n\}$ is the set of committed transactions for every defined history; and, (ii) any history $H$ is a CCMV history over $T$. Note that these conventions will also be applicable when a superscript is used to denote the site of the database where the history is generated.

In general, two histories $(H, \prec)$ and $(H', \prec')$ over the same set of transactions are view equivalent (Bernstein et al., 1987), denoted as $H \equiv H'$, if they contain the same operations, have the same reads-from relations, and produce the same final writes. The notion of equivalence of CCMV histories reduces to the simple condition $H = H'$ if the following reads-from relation is used: $T_i$ reads $X$ from $T_j$ in a CCMV history $(H, \prec)$ if and only if $R_i(X_j) \in H$.

3 SNAPSHOT ISOLATION

In SI, reading from a snapshot means that a transaction $T_i$ sees all the updates done by transactions that committed before $T_i$ started its first operation. The results of its writes are installed when the transaction commits. However, a transaction $T_i$ will successfully commit if and only if there is no concurrent transaction $T_j$ that has already committed and has written some of the items also written by $T_i$. From our point of view, histories generated by a given concurrency control providing SI may be interpreted as multiversion histories with time restrictions.

Definition 2. Let $(H, \prec)$ be a history and $t : H \to \mathbb{R}^+$ a mapping that assigns to each operation $op \in H$ its real time occurrence $t(op) \in \mathbb{R}^+$. The schedule $H_t$ of the history $(H, \prec)$ verifies:

(1) If $op, op' \in H$ and $op \prec op'$ then $t(op) < t(op')$.

(2) If $t(op) = t(op')$ and $op, op' \in H$ then $op = op'$.

The mapping $t()$ totally orders all operations of $(H, \prec)$. Condition (1) states that the total order $<$ is compatible with the partial order $\prec$. Condition (2) establishes, for the sake of simplicity, the assumption that different operations have different times. We are interested in operating with schedules since this facilitates the work, but only with the ones that derive from CCMV histories over a concrete set of transactions $T$. Note that an arbitrary time-labeled sequence of versioned operations, e.g. $(R_i(X_j), t_1)$, $(W_k(X_k), t_2)$ and so on, is not necessarily a schedule of a history. Thus, we need some restrictions to make sure that we really work with schedules corresponding to possible histories.

Property 1. Let $S_t$ be a time-labeled sequence of versioned operations over a set of transactions $T$. $S_t$ is a schedule of a history over $T$ if and only if it verifies the following conditions:

(1) there exists a mapping $h$ such that $S = h(\bigcup_{T_i \in T} T_i)$.

(2) if $op, op' \in T_i$ and $op \prec_i op'$ then $t(op) < t(op')$ in $S_t$.

(3) if $R_i(X_j) \in S$ and $i \neq j$ then $W_j(X_j) \in S$ and $t(C_j) < t(R_i(X_j))$.

(4) if $t(op) = t(op')$ and $op, op' \in S$ then $op = op'$.

The proof of this fact can be inferred trivially. In the following, we use an additional convention: (iii) a schedule $H_t$ is a schedule of a history $(H, \prec)$. Note that every schedule $H_t$ may be represented by writing the operations in the total order ($<$) induced by $t()$. We define the “commit time” ($c_i$) and “begin time” ($b_i$) for each transaction $T_i \in T$ in a schedule $H_t$ as $c_i = t(C_i)$ and $b_i = t(\text{first operation of } T_i)$, where $b_i < c_i$ holds by definition of $t()$ and $\prec_i$. In the following, we formalize the concept of snapshot of the database. Intuitively, it comprises the latest version of each data item. First, we illustrate this with an example:

Example 1. Let us consider the following transactions $T_1$, $T_2$ and $T_3$: $T_1 = \{R_1(X), W_1(X), c_1\}$, $T_2 = \{R_2(Z), R_2(X), W_2(Y), c_2\}$, $T_3 = \{R_3(Y), W_3(X), c_3\}$. A possible schedule of these transactions might be the following one: $b_1\ R_1(X_0)\ W_1(X_1)\ c_1\ b_2\ R_2(Z_0)\ b_3\ R_3(Y_0)\ W_3(X_3)\ c_3\ R_2(X_1)\ W_2(Y_2)\ c_2$. As this example shows, each transaction is able to include in its snapshot (and read from it) the latest committed version of each existing item at the time such transaction was started. Thus $T_2$ has read version 1 of item $X$ since $T_1$ has generated such version and it had already committed when $T_2$ started. But it only reads version 0 of item $Z$ since no update of such item is seen by $T_2$. This is true even though transactions $T_2$ and $T_3$ are concurrent and $T_3$ updates $X$ before $T_2$ reads such item, because the snapshot taken for $T_2$ is previous to the commit of $T_3$.

This example provides the basis for defining what a snapshot is. For that purpose, we first need to define the set of installed versions of a data item $X$ in a schedule $H_t$ as the set $Ver(X, H) = \{X_j : W_j(X_j) \in H\} \cup \{X_0\}$, $X_0$ being its initial version.

Definition 3. The snapshot of the database DB at time $\tau \in \mathbb{R}^+$ for a schedule $H_t$ is defined as:

$Snapshot(DB, H_t, \tau) = \bigcup_{X \in DB} latestVer(X, H_t, \tau)$

where the latest version of each item $X \in DB$ at time $\tau$ is the set:

$latestVer(X, H_t, \tau) = \{X_j \in Ver(X, H) : (\nexists X_k \in Ver(X, H) : c_j < c_k \leq \tau)\}$

From the previous definition, it is easy to show that a snapshot is modified each time an update transaction commits. If $\tau = c_i$ and $X_i \in Ver(X, H)$, then $latestVer(X, H_t, c_i) = \{X_i\}$. In order to formalize the concept of SI-schedule, we utilize a slight variation of the predicate impacts for update transactions presented in (Elnikety et al., 2005). Two transactions $T_i, T_j \in T$ impact at time $\tau \in \mathbb{R}^+$ in a schedule $H_t$, denoted $T_j$ impacts $T_i$ at $\tau$, if the following predicate holds: $WS_j \cap WS_i \neq \emptyset \wedge \tau < c_j < c_i$.

Definition 4. A schedule $H_t$ is a SI-schedule if and only if for each $T_i \in T$:

(1) if $R_i(X_j) \in H$ then $X_j \in Snapshot(DB, H_t, b_i)$; and,

(2) for each $T_j \in T$: $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$.

Condition (1) states that all the versions read by a transaction $T_i$ are obtained from $Snapshot(DB, H_t, b_i)$; that is, versions are obtained from the snapshot of the database DB at the time the transaction starts its first operation. Condition (2) states that any pair of transactions $T_i$ and $T_j$ writing over some common data items cannot overlap their time intervals $[b_i, c_i]$ and $[b_j, c_j]$. In other words, they have to be executed serially. Other equivalent definitions of SI have been provided in the literature (Berenson et al., 1995; Kemme, 2000; Lin et al., 2005; Fekete et al., 2005; Elnikety et al., 2005).
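The two conditions of Definition 4 can be checked mechanically on a finite schedule. The sketch below is illustrative only: the function names, the dictionary-based encoding of a schedule (`reads`, `writeset`, `begin`, `commit`) and the convention that identifier 0 denotes an initial version are assumptions made here, and the snapshot computation assumes that only versions committed by the snapshot time are visible.

```python
def latest_ver(item, writeset, commit, tau):
    """Id of the latest version of `item` committed by time tau (0 = initial)."""
    done = [j for j in commit if item in writeset[j] and commit[j] <= tau]
    return max(done, key=commit.get, default=0)


def impacts(j, i, tau, writeset, commit):
    """T_j impacts T_i at tau: common written item and tau < c_j < c_i."""
    return bool(writeset[j] & writeset[i]) and tau < commit[j] < commit[i]


def is_si_schedule(reads, writeset, begin, commit):
    """reads[i] maps each item read by T_i to the id j of the version X_j."""
    for i in commit:
        # Condition (1): every version read is in the snapshot at b_i.
        for item, j in reads.get(i, {}).items():
            if latest_ver(item, writeset, commit, begin[i]) != j:
                return False
        # Condition (2): no transaction impacts T_i at b_i.
        if any(impacts(j, i, begin[i], writeset, commit)
               for j in commit if j != i):
            return False
    return True
```

A schedule in which a transaction reads a version older than the one in the snapshot at its begin time, or in which two transactions with intersecting writesets overlap in time, makes `is_si_schedule` return `False`.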

4 THE GSI LEVEL

The concept of Generalized Snapshot Isolation (GSI, for short) was first applied to database replication in (Elnikety et al., 2005). A hypothetical concurrency control algorithm could have stored some past snapshots. A transaction may receive a snapshot that happened in the system before the time of its first operation (instead of its current snapshot as in a SI concurrency control algorithm). The algorithm may commit the transaction if no other transaction impacts it from that past snapshot onwards. Thus, a transaction can observe an older snapshot of the DB but the write operations of the transaction are still valid update operations for the DB at commit time. These ideas define the concept of GSI.

Definition 5. A schedule $H_t$ is a GSI-schedule if and only if for each $T_i \in T$ there exists a value $s_i \in \mathbb{R}^+$ such that $s_i \leq b_i$ and:

(1) if $R_i(X_j) \in H$ then $X_j \in Snapshot(DB, H_t, s_i)$; and,

(2) for each $T_j \in T$: $\neg(T_j \text{ impacts } T_i \text{ at } s_i)$.

Condition (1) states that every item read by a transaction belongs to the same (possibly past) snapshot. Condition (2) establishes that the time intervals $[s_i, c_i]$ and $[s_j, c_j]$ do not overlap for any pair of write/write conflicting transactions $T_i$ and $T_j$. If for all $T_i \in T$ conditions (1) and (2) hold for $s_i = b_i$, then $H_t$ is a SI-schedule. Thus, Definition 5 includes Definition 4 as a particular case. Another observation is that if there exists a transaction $T_i \in T$ such that conditions (1) and (2) are only verified for a value $s_i < b_i$, then there is an item $X \in RS_i$ for which $latestVer(X, H_t, s_i) \neq latestVer(X, H_t, b_i)$. That is, the transaction $T_i$ has not seen the latest version of $X$ at the begin time $b_i$: there was a transaction $T_k$ with $W_k(X_k) \in H$ such that $s_i < c_k < b_i$. This can be best seen in the next example.

Example 2. The following is an example of a GSI-schedule: $b_1\ R_1(X_0)\ W_1(X_1)\ c_1\ b_2\ R_2(Z_0)\ R_2(X_0)\ W_2(Y_2)\ c_2$. In this schedule, transaction $T_2$ reads $X_0$ after the commit of $T_1$ appears. This would not be correct for a SI-schedule (since the read version of $X$ is not the latest one), but it is perfectly valid for a GSI-schedule, taking the time point of the snapshot provided to $T_2$ (i.e. $s_2$) previous to the commit of $T_1$, as shown here: $b_1\ R_1(X_0)\ s_2\ W_1(X_1)\ c_1\ b_2\ R_2(Z_0)\ R_2(X_0)\ W_2(Y_2)\ c_2$. The intuition behind this schedule in a distributed system is that the message containing the modifications of $T_1$ (the write operation on $X$) would not yet have arrived at the site at the time transaction $T_2$ began. This may be the reason for $T_2$ to see this previous version of item $X$. The fact that GSI captures these delays in schedules makes its usage attractive in distributed environments.
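Definition 5 asks for the existence of a snapshot time $s_i \leq b_i$ for every transaction. For a finite schedule this search can be sketched in Python as follows. The sketch is illustrative and rests on assumptions introduced here: the function names and the dictionary-based encoding are hypothetical, identifier 0 denotes an initial version, all times are positive, and only versions committed by the candidate time are considered visible. Since snapshots only change at commit times, the sketch tries $b_i$, every commit time not later than $b_i$, and time 0, preferring the most recent candidate.

```python
def latest_ver(item, writeset, commit, tau):
    """Id of the latest version of `item` committed by time tau (0 = initial)."""
    done = [j for j in commit if item in writeset[j] and commit[j] <= tau]
    return max(done, key=commit.get, default=0)


def gsi_snapshot_times(reads, writeset, begin, commit):
    """Return {i: s_i} witnessing Definition 5, or None if no choice works."""
    chosen = {}
    for i in commit:
        # Candidate snapshot times for T_i, all of them <= b_i.
        candidates = {0.0, begin[i]} | {c for c in commit.values()
                                        if c <= begin[i]}
        for s in sorted(candidates, reverse=True):
            # Condition (1): every version read belongs to the snapshot at s.
            reads_ok = all(latest_ver(x, writeset, commit, s) == j
                           for x, j in reads.get(i, {}).items())
            # Condition (2): no conflicting transaction commits in (s, c_i).
            impacted = any(j != i and writeset[j] & writeset[i]
                           and s < commit[j] < commit[i] for j in commit)
            if reads_ok and not impacted:
                chosen[i] = s
                break
        else:
            return None  # no valid s_i exists for T_i
    return chosen
```

When the function returns a value `s_i` strictly smaller than `begin[i]` for some transaction, that transaction has observed an older snapshot, which is exactly the situation that separates GSI from SI.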

The value $s_i$ in Definition 5 plays the same role as $b_i$ in Definition 4. Thus, it is possible to think that if the operations in the GSI-schedule obtained from the history $H$ had been ‘on time’, then the schedule would have been a SI-schedule.

Example 3. Let us use Example 2 to show how a GSI-schedule can be transformed into a SI-schedule. To turn that GSI-schedule into a SI-schedule, it is only necessary to move the beginning of $T_2$ back to $s_2$; consequently, the resulting schedule will be a SI-schedule: $b_1\ R_1(X_0)\ b_2\ W_1(X_1)\ c_1\ R_2(Z_0)\ R_2(X_0)\ W_2(Y_2)\ c_2$. However, this schedule does not fit the definition of $b_i$, which was described as the time of the first operation a transaction performs. Thus, the first operation of transaction $T_2$ must also be moved in the SI-schedule, resulting in the following: $b_1\ R_1(X_0)\ b_2\ R_2(Z_0)\ W_1(X_1)\ c_1\ R_2(X_0)\ W_2(Y_2)\ c_2$.

The following property describes the previous transformation in a formal way:

Property 2. Let $H_t$ be a GSI-schedule. There is a mapping $t' : H \to \mathbb{R}^+$ such that $H_{t'}$ is a SI-schedule.

This last property states that if $H_t$ is a GSI-schedule, there exists a schedule $H_{t'}$ that is actually a SI-schedule and verifies $H_t \equiv H_{t'}$ (in the sense of view-equivalence).

5 THE DEFERRED UPDATE TECHNIQUE

The GSI concept is particularly interesting in replicated databases, since many replication protocols execute each transaction initially in a delegate replica, propagating its updates later to the rest of replicas (Lin et al., 2005; Elnikety et al., 2005; Armendáriz-Íñigo et al., 2007). This means that transaction writesets cannot be applied immediately and simultaneously in all replicas and, due to this, the snapshot being used in a transaction might be “previous” to the one that (regarding physical time in a hypothetical centralized system) would have been assigned to it. In this section we consider a distributed system that consists of $m$ sites, $I_m = \{1..m\}$ being the set of site identifiers. Sites communicate with each other by reliable message passing. We make no assumptions about the time it takes for sites to execute and for messages to be transmitted. We assume a system free of failures. Each site $k$ runs an instance of the database management system and maintains a copy of the database DB. We will assume that each database copy, denoted $DB^k$ with $k \in I_m$, provides SI (Berenson et al., 1995).

Footnote 2 (on the failure-free assumption): Otherwise, writes will only be applied on the available replicas, but all our discussion is orthogonal to failures and can be seamlessly extended to a system where failures might arise.

We use the transaction model of Section 2. Let $T = \{T_i : i \in I_n\}$ be the set of transactions submitted to the system, where $I_n = \{1..n\}$ is the set of transaction identifiers.

The deferred update technique defines, for each transaction $T_i \in T$, the set of transactions $\{T_i^k : k \in I_m\}$ in which there is only one, denoted $T_i^{site(i)}$, verifying $RS_i^{site(i)} = RS_i$ and $WS_i^{site(i)} = WS_i$; for the rest of the transactions, $T_i^k$ with $k \neq site(i)$, $RS_i^k = \emptyset$ and $WS_i^k = WS_i$. $T_i^{site(i)}$ determines the local transaction of $T_i$, i.e., the transaction executed at its delegate replica or site, whilst $T_i^k$, $k \neq site(i)$, is a remote transaction of $T_i$, i.e., the updates of the transaction executed at a remote site. An update transaction reads at one site and writes at every site, while a read-only transaction only exists at its local site. In the rest of the paper, we consider the general case of update transactions with non-empty sets.

Let $T^k = \{T_i^k : i \in I_n\}$ be the set of transactions submitted at each site $k \in I_m$ for the set $T$. Some of these transactions are local at $k$ while others are remote ones. Assumption 1, stated next, implies that each transaction submitted to the system either commits at all replicas or at none of them. Thus, the updates applied in a delegate replica by a given transaction are also applied in the rest of replicas. Obviously, we consider a fully-replicated system. Since only committed transactions are relevant, the histories being generated at each site should be histories over $T^k$, as defined above.

Assumption 1 (Atomicity). $H^k$ is a CCMV history over $T^k$ for all sites $k \in I_m$.

In the considered distributed system there is no common clock or similar synchronization mechanism. However, we can use a real time mapping $t : \bigcup_{k \in I_m}(H^k) \to \mathbb{R}^+$ that totally orders all operations of the system. This mapping is compatible with each partial order $\prec^k$ defined for $H^k$ for each site $k \in I_m$. In the following, we consider that each $DB^k$ provides SI-schedules under the previous time mapping.

Assumption 2 (SI Replicas). $H_t^k$ is a SI-schedule of the history $H^k$ for all sites $k \in I_m$.

In order to study the level of consistency implemented by this kind of non-blocking protocols, it is necessary to define the one copy schedule (1C-schedule) obtained from the schedules at each site. In the following definitions, properties and theorems we use this notation: for each transaction $T_i$, $i \in I_n$, $C_i^{min(i)}$ denotes the commit operation of the transaction $T_i$ at site $min(i) \in I_m$ such that $c_i^{min(i)} = \min_{k \in I_m}\{c_i^k\}$ under the considered mapping $t()$.

Definition 6 (1C-schedule). Let $T = \{T_i : i \in I_n\}$ be the set of transactions submitted to a replicated database system with a non-blocking deferred update strategy that verifies Assumption 1 and Assumption 2. Let $S = \bigcup_{k \in I_m}(H^k)$ be the set formed by the union of the histories $H^k$ over $T^k = \{T_i^k : i \in I_n\}$. And let $t : S \to \mathbb{R}^+$ be the mapping that totally orders the operations in $S$. The 1C-schedule, $H_{t'} = (H, t' : H \to \mathbb{R}^+)$, is built from $S$ and $t()$ as follows. For each $i \in I_n$ and $k \in I_m$:

(1) Remove from $S$ operations such that: $W_i(X_i)^k$, with $k \neq site(i)$, or $C_i^k$, with $k \neq min(i)$.

(2) $H$ is obtained with the rest of operations in $S$ after step (1), applying the renaming: $W_i(X_i) = W_i(X_i)^{site(i)}$; $R_i(X_j) = R_i(X_j)^{site(i)}$; and, $C_i = C_i^{min(i)}$.

(3) Finally, $t'()$ is obtained from $t()$ as follows: $t'(W_i(X_i)) = t(W_i(X_i)^{site(i)})$; $t'(R_i(X_j)) = t(R_i(X_j)^{site(i)})$; and, $t'(C_i) = t(C_i^{min(i)})$.
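The three steps of Definition 6 can be sketched as a small Python function. This is an illustrative sketch only: the function name `one_copy_schedule`, the tuple layout of an operation and the `site_of` dictionary are assumptions introduced here, and the sketch relies on the convention that different operations have different times.

```python
def one_copy_schedule(ops, site_of):
    """Build the 1C-schedule as a time-ordered list of operations.

    ops: sequence of tuples (time, kind, txn, item, version, site) with kind
    in {"R", "W", "C"}; commits use item=None and version=None.
    site_of: maps each transaction id to its delegate site, site(i).
    """
    # min(i): the earliest commit time of each transaction over all sites.
    first_commit = {}
    for time, kind, txn, _item, _version, _site in ops:
        if kind == "C":
            first_commit[txn] = min(time, first_commit.get(txn, time))

    schedule = []
    for time, kind, txn, item, version, site in ops:
        if kind in ("R", "W") and site == site_of[txn]:
            # Steps (1)-(2): keep reads and writes of the local transaction.
            schedule.append((time, kind, txn, item, version))
        elif kind == "C" and time == first_commit[txn]:
            # Steps (1)-(2): keep only the first commit, C_i = C_i^min(i).
            schedule.append((time, kind, txn, None, None))
    # Step (3): t' inherits the times of the kept operations.
    return sorted(schedule, key=lambda op: op[0])
```

Remote writes and later commits of the same transaction are dropped, so each transaction appears exactly once in the result, committed at the time of its first commit in the system.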

As $t'()$ receives its values from $t()$, we write $H_t$ instead of $H_{t'}$. In the 1C-schedule $H_t$, $b_i < c_i$ is trivially verified for each transaction $T_i$ because this technique guarantees that for all $k \neq site(i)$, $b_i^{site(i)} < b_i^k$.

The 1C history $H$, which is formed by the operations over the logical DB, is also a history over $T$. We prove this fact informally. By the renaming (2) in Definition 6, each transaction $T_i$ has its operations over the data items in $RS_i$ and $WS_i$, and $\prec_i$ is trivially maintained in a partial order $\prec$ for $H$, because $H^{site(i)}$ contains the local operations of $T_i^{site(i)}$. $H$ is also formed by committed transactions, under Assumption 1; for each $T_i$, $C_i \in H$. Finally, if $R_i(X_j) \in H$, then $R_i(X_j)^{site(i)} \in H^{site(i)}$. As $H^{site(i)}$ is a history over $T^{site(i)}$, then $C_j^{site(i)} \prec R_i(X_j)^{site(i)}$. By defining $C_j^{min(j)} \prec C_j^{site(i)}$ in $S$, then $C_j^{min(j)} \prec R_i(X_j)^{site(i)}$ and so $C_j \prec R_i(X_j)$. Thus $H$ can be defined as a history over $T$.

Transformation (2) in Definition 6 ensures that a transaction is committed as soon as it has been committed at the first replica. Finally, no restriction about the beginning of a transaction is imposed in this definition. Hence, this definition is valid for the most general case of non-blocking protocols. Although Assumptions 1 and 2 are included in Definition 6, they do not guarantee that the obtained 1C-schedule is a SI-schedule. This is best illustrated in the following example, which also shows how the 1C-schedule may be built from the SI-schedules of each site.

Example 4. In this example two sites (A, B) and the following set of transactions $T_1$, $T_2$, $T_3$, $T_4$ are considered: $T_1 = \{R_1(Y), W_1(X)\}$, $T_2 = \{R_2(Z), W_2(X)\}$, $T_3 = \{R_3(X), W_3(Z)\}$, $T_4 = \{R_4(X), R_4(Z), W_4(Y)\}$. Figure 1 illustrates the mapping described in Definition 6 for building a 1C-schedule from the SI-schedules seen in the different nodes $I_m$. $T_1$ and $T_2$ are locally executed at site A ($RS_1^B = \emptyset$ and $RS_2^B = \emptyset$) whilst $T_3$ and $T_4$ are executed at site B. The writesets are afterwards applied at the remote sites. Schedules obtained at both sites are SI-schedules, i.e. transactions read the latest version of the committed data at each site. The 1C-schedule is obtained from Definition 6. For example, the commit of $T_1$ occurs in the 1C-schedule at the minimum of $c_1^A$ and $c_1^B$, and so on for the remaining transactions. In the 1C-schedule of Figure 1, $T_4$ reads $X_1$ and $Z_3$ but the $X_2$ version exists between both (since $X_2$ was installed at site A). $T_1$ and $T_2$, satisfying $WS_1 \cap WS_2 \neq \emptyset$, are executed at both sites in the same order. As $T_2$ and $T_3$ are not executed in the same order with regard to $T_4$, the obtained 1C-schedule is neither SI nor GSI.

6 1-COPY-GSI SCHEDULES

The 1C-schedule $H_t$ obtained in Definition 6 will be a GSI-schedule if it verifies the conditions given in Definition 5. The question is what conditions the local SI-schedules, $H_t^k$, have to verify in order to guarantee that $H_t$ is a GSI-schedule. Taking into account the ordering of conflicting transactions in GSI-equivalence, we consider the kind of protocols that guarantee the same total order of the commit operations for the transactions with write/write conflicts at every site. However, the execution of write/write conflicting transactions in the same order at all sites offers neither SI nor GSI, as shown in Example 4. Therefore, it is also necessary to consider the need of reading from a consistent snapshot from the notion of GSI-equivalence; i.e. all update transactions must be committed in the very same order at all sites. As a result, since all replicas generate SI-schedules and their local snapshots have received the same sequence of updates, transactions starting at any site are able to read a particular snapshot that perhaps is not the latest one, but that is consistent with those of other replicas.

Assumption 3 (Total Order of Committing Transactions). For each pair $T_i, T_j \in T$, a unique order relation $c_i^k < c_j^k$ holds for all SI-schedules $H_t^k$ with $k \in I_m$.
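Assumption 3 can be checked on recorded commit times with a short Python sketch. The function name `same_commit_order` and the nested-dictionary layout (site identifier to a map from transaction identifier to commit time) are hypothetical choices made for this illustration.

```python
def same_commit_order(commit_times_by_site):
    """True if every site commits the transactions in the same order.

    commit_times_by_site maps a site id k to a dict {i: c_i^k} holding the
    commit time of each transaction T_i at that site.
    """
    # Sort the transaction ids of each site by their local commit time.
    orders = [sorted(times, key=times.get)
              for times in commit_times_by_site.values()]
    return all(order == orders[0] for order in orders)
```

For example, `same_commit_order({"A": {1: 3.0, 2: 5.0}, "B": {1: 4.0, 2: 6.0}})` holds, whereas swapping the two commit times at site B violates the assumption.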

The SI-schedules $H_t^k$ have the same total order of committed transactions. Without loss of generality, we consider the following total order in the rest of this section: $c_1^k < c_2^k < \ldots < c_n^k$ for every $k \in I_m$. In the next property we verify that, thanks to the total order, the versions of items read by a transaction belong to the same snapshot in a given time interval. This interval is determined for each transaction $T_i$ by two commit times, denoted $c_{i_0}$ and $c_{i_1}$. The former corresponds to the commit time of a transaction $T_{i_0}$ such that $T_i$ reads from $T_{i_0}$ for the last time and from then on performs no other read operation. The latter corresponds to the commit time of a transaction $T_{i_1}$ that is the first transaction, after $T_{i_0}$, that verifies $WS_{i_1} \cap RS_i \neq \emptyset$, hence modifying the snapshot of the transaction $T_i$. In case $T_{i_1}$ does not exist, the correctness interval for $T_i$ extends from $c_{i_0}$ to $b_i$.

Figure 1: Replicated one-copy execution providing neither CSI nor GSI.

Property 3. Let $H_t$ be a 1C-schedule verifying Assumption 3. For each $T_i \in T$, if $R_i(X_j) \in H$ then $X_j \in Snapshot(DB, H_t, \tau)$ and $\tau \in \mathbb{R}^+$ satisfies $c_{i_0} \leq \tau < c_{i_1} \leq b_i$.

The aim of the next theorem is to prove that the 1C-schedules generated by any deferred update protocol that verifies Assumption 3 are actually GSI-schedules; i.e., they comply with all conditions stated in Definition 5. Whilst proving that a transaction always reads from the same snapshot in a particular time interval is easy, it is not trivial to prove that for a given transaction $T_i$ there has not been any other transaction $T_j$ that has impacted $T_i$ and that has been committed whilst $T_i$ was being executed. However, due to the total commit order an induction proof is possible, showing that the obtained 1C-schedule verifies all conditions in order to be a GSI-schedule.

Theorem 1. Under Assumption 3, the 1C-schedule H

is a GSI-schedule.

This theorem formally justifies the correctness of such protocols and establishes that their resulting isolation level is GSI; its proof is given in (González de Mendívil et al., 2007). Additionally, it is worth noting that Assumption 3 is a sufficient, but not necessary, condition for obtaining GSI. Nevertheless, replication protocols that comply with this assumption are easy to implement. In the next section, we analyze how to relax this assumption while still obtaining GSI-schedules with non-blocking protocols.
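As an illustration of what Assumption 3 demands, the following minimal sketch checks whether a set of per-site commit logs shows the very same commit order everywhere. The function name, the site names and the transaction identifiers are hypothetical and only serve the example; a real protocol would enforce this order (e.g., through total order broadcast) rather than check it after the fact.

```python
from typing import Dict, List


def same_total_commit_order(site_orders: Dict[str, List[str]]) -> bool:
    """Return True when every site committed the same transactions in the same order.

    `site_orders` maps a (hypothetical) site name to the list of transaction
    identifiers in the order in which that site committed them.
    """
    orders = list(site_orders.values())
    # Every local commit sequence must equal the first one.
    return all(order == orders[0] for order in orders[1:])


# Hypothetical commit logs for two replicas (not taken from the paper).
logs_ok = {"site1": ["T1", "T2", "T3"], "site2": ["T1", "T2", "T3"]}
logs_bad = {"site1": ["T1", "T2", "T3"], "site2": ["T1", "T3", "T2"]}
```

With these logs, `same_total_commit_order(logs_ok)` holds, while `logs_bad` violates the assumption because the second site swaps the last two commits.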

7 RELAXING ASSUMPTIONS

Assumption 3 (Total order of committing transactions) is very strong. It forces every replica to install the same snapshots in the same order. Thus, Theorem 1 guarantees that the 1C-schedule $H$ is a GSI-schedule. In contrast, a total order of conflicting transactions alone is not enough to guarantee SI or GSI (see Example 4); a stronger condition is required: the snapshot obtained by a transaction at its delegate replica must match the 1C-schedule, the latter actually being a GSI-schedule. However, this does not necessarily oblige each replica to install the same snapshots as in the 1C-schedule. That is, if $R_i(X_j)$ belongs to $H$ then $X_j \in (Snapshot(DB, H, b_i) \cap Snapshot(DB, H^{site(i)}, b_i))$. From the above, it is clear that relaxing Assumption 3 requires some property that relates the reads-from relationship of a transaction in the 1C-schedule to the reads-from relationship of that transaction in the local schedule at its delegate site. In the following, we provide more relaxed assumptions that still yield a 1C-schedule providing GSI.

Assumption 4. For each pair $T_i, T_j \in T$ with $WS_i \cap WS_j \neq \emptyset$, a unique order relation $c_i^k < c_j^k$ holds for all SI-schedules $H^k$ with $k \in I_m$; and, if there is some transaction $T_l \in T$ such that $c_i^k < c_l^k < c_j^k$ holds for some site $k \in I_m$, then it holds for every $k \in I_m$.

This assumption states that the commit ordering of two conflicting transactions is the same at every site. Moreover, it also states that the same subset of transactions is committed between both of them at every site, no matter the order in which those transactions occur.

Example 5. Let us suppose that there are

two replicas and the next set of transactions:

, T

} with WS

∩ WS

0 and the rest do not conﬂict among

each other. At the ﬁrst site you can ﬁnd the following

local SI-schedule: c

< c

whilst at the second site the derived SI-schedule can

be: c

< c

. In the latter, the


commit ordering of transactions T

and T

is different

from the scheduling of the former.
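Assumption 4 can also be checked mechanically on recorded commit orders. The sketch below is only an illustration under stated assumptions: every transaction is committed at every site (global atomicity), each local schedule is given as a list of transaction identifiers in commit order, and the transaction names and writesets are hypothetical data chosen for the example, not values taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Set


def satisfies_assumption_4(
    site_orders: Dict[str, List[str]], writesets: Dict[str, Set[str]]
) -> bool:
    """Check that conflicting pairs commit in the same order at every site and
    that the same set of transactions is committed between them everywhere."""
    for ti, tj in combinations(sorted(writesets), 2):
        if not (writesets[ti] & writesets[tj]):
            continue  # non-conflicting pairs may commit in any order
        observed = set()
        for order in site_orders.values():
            pi, pj = order.index(ti), order.index(tj)
            first, second = (ti, tj) if pi < pj else (tj, ti)
            lo, hi = min(pi, pj), max(pi, pj)
            # Record the relative order and the set committed in between.
            observed.add((first, second, frozenset(order[lo + 1:hi])))
        if len(observed) > 1:
            return False
    return True


# Hypothetical data: only T1 and T4 conflict (both write item "x").
writesets = {"T1": {"x"}, "T2": {"y"}, "T3": {"z"}, "T4": {"x"}}
reordered_ok = {"s1": ["T1", "T2", "T3", "T4"], "s2": ["T1", "T3", "T2", "T4"]}
different_between = {"s1": ["T1", "T2", "T3", "T4"], "s2": ["T2", "T1", "T3", "T4"]}
```

In `reordered_ok` the two sites disagree only on the order of non-conflicting transactions committed between the conflicting pair, which the assumption allows. In `different_between` the second site commits one of them before the conflicting pair, so the sets committed in between differ and the check fails.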

As may be inferred, Assumption 4 becomes Assumption 3 whenever the pattern of transactions does not allow the commits of transactions to be reordered. In

Example 5, it cannot happen without violating As-

sumption 4 the following: c

< c

. On the other hand,

taking Assumption 4 to the extreme, if no transactions conflict with each other, any commit order can be obtained at each site. To prevent such situations, it is necessary

to force each transaction to read from the same snapshot; that is, each pair of transactions $T_i, T_j \in T$ with $WS_j \cap RS_i \neq \emptyset$ must verify that if $c_j < b_i$ in $H$ then $c_j^{site(i)} < c_i^{site(i)}$ in $H^{site(i)}$, which is stated in the next assumption.

Assumption 5 (Compatible Snapshot Read). Let $H$ be a 1C-schedule; for each $T_i \in T$ there exists $s_i \le b_i$ such that if $R_i(X_j) \in H$ then $X_j \in (Snapshot(DB, H, s_i) \cap Snapshot(DB, H^{site(i)}, b_i))$.

This last assumption means that each transaction reads data items that belong to a valid global snapshot of the 1C-schedule, although its delegate site does not install the same snapshot version. On the other hand, given Assumption 4, it seems clear that a 1C-schedule serializes the execution of conflicting transactions.
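The compatible snapshot read condition can be illustrated with a small sketch. It assumes a simplified reading of the snapshot notion: the snapshot at time `tau` holds, for each item, the version written by the last transaction committed at or before `tau`, with a hypothetical initial version `T0`. The data layout, the function names and the sample schedules are assumptions made for this example only.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A committed transaction: (commit time, transaction id, writeset).
Commit = Tuple[float, str, FrozenSet[str]]
INITIAL = "T0"  # hypothetical writer of the initial database state


def snapshot(schedule: List[Commit], tau: float) -> Dict[str, str]:
    """Map each item to the transaction that wrote its latest version by time tau."""
    versions: Dict[str, str] = {}
    for commit_time, txn, writeset in sorted(schedule, key=lambda e: e[0]):
        if commit_time <= tau:
            for item in writeset:
                versions[item] = txn
    return versions


def compatible_snapshot_time(
    reads: Dict[str, str],
    global_schedule: List[Commit],
    local_schedule: List[Commit],
    begin: float,
) -> Optional[float]:
    """Return a witness s <= begin such that every version read belongs both to
    the global snapshot at s and to the local snapshot at begin, else None."""
    local = snapshot(local_schedule, begin)
    # Snapshots only change at commit times, so these candidates suffice.
    candidates = {0.0} | {c for c, _, _ in global_schedule if c <= begin}
    for s in sorted(candidates, reverse=True):
        glob = snapshot(global_schedule, s)
        if all(glob.get(x, INITIAL) == v == local.get(x, INITIAL)
               for x, v in reads.items()):
            return s
    return None


# Hypothetical schedules: the delegate site applies T1 later than T2.
global_h = [(1.0, "T1", frozenset({"x"})), (2.0, "T2", frozenset({"y"}))]
local_h = [(2.5, "T2", frozenset({"y"})), (4.0, "T1", frozenset({"x"}))]
```

A transaction beginning at time 3.0 that reads only `y` from `T2` finds a compatible global snapshot. One that reads `y` from `T2` together with the initial version of `x` does not, because no global snapshot contains `T2`'s update without `T1`'s.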

Property 4. Under Assumption 4, the 1C-schedule $H$ verifies that for each pair $T_i, T_j \in T$: $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$.

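Proof: By Assumption 2, at any site $k \in I_m$, for each pair $T_i, T_j \in T$: $\neg(T_j \text{ impacts } T_i \text{ at } b_i^k)$. That is, $WS_j \cap WS_i = \emptyset \lor \neg(b_i^k < c_j^k < c_i^k)$.

(1) If $WS_j \cap WS_i = \emptyset$, by definition of $T_i$ and $T_j$, $WS_j \cap WS_i = \emptyset$ in $H$. Then, $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$.

(2) Let $WS_j \cap WS_i \neq \emptyset$. Again, by definition of $T_i$ and $T_j$, $WS_j \cap WS_i \neq \emptyset$ in $H$. Hence, either $\neg(T_j \text{ impacts } T_i \text{ at } b_i^k)$ or $\neg(T_i \text{ impacts } T_j \text{ at } b_j^k)$. Thus, $c_j^k < b_i^k$ or $c_i^k < b_j^k$ holds. By Assumption 4, the commit order of both transactions is the same at every site; assume $c_i^k < c_j^k$ for all sites $k \in I_m$. Thus, $c_i^k < b_j^k$ for all $k \in I_m$. In particular, $c_i^{site(j)} < b_j^{site(j)}$. By definition of $H$: $c_i < c_j$ and $c_i \le c_i^{site(j)} < b_j$ hold in $H$. Suppose that $T_j$ impacts $T_i$ at $b_i$ in $H$. That is, $WS_j \cap WS_i \neq \emptyset$ and $b_i < c_j < c_i$. A contradiction with $c_i < c_j$ is obtained. Therefore, $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$. Analogously, suppose that $T_i$ impacts $T_j$ at $b_j$ in $H$. That is, $WS_i \cap WS_j \neq \emptyset$ and $b_j < c_i < c_j$. A contradiction with $c_i < b_j$ is obtained again, and therefore, $\neg(T_i \text{ impacts } T_j \text{ at } b_j)$.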
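The impacts relation used throughout this proof can be written down directly from its expansion (intersecting writesets and a commit falling strictly between a given time and the commit of the other transaction). The following sketch assumes that reading of the relation; the `Txn` record and the sample transactions are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Txn:
    name: str
    begin: float
    commit: float
    writeset: FrozenSet[str]


def impacts(tj: Txn, ti: Txn, t: float) -> bool:
    """T_j impacts T_i at time t: writesets intersect and t < c_j < c_i."""
    return bool(tj.writeset & ti.writeset) and t < tj.commit < ti.commit


def no_impact_at_begin(txns: Iterable[Txn]) -> bool:
    """Check that no transaction impacts another one at the latter's begin time."""
    txns = list(txns)
    return not any(
        impacts(tj, ti, ti.begin) for ti in txns for tj in txns if tj is not ti
    )


# Hypothetical transactions: t1 and t2 overlap in time and both write "x".
t1 = Txn("T1", begin=0.0, commit=2.0, writeset=frozenset({"x"}))
t2 = Txn("T2", begin=1.0, commit=3.0, writeset=frozenset({"x"}))
t3 = Txn("T3", begin=2.5, commit=4.0, writeset=frozenset({"x"}))
```

Here `t1` impacts `t2` at `t2.begin`, so a schedule committing both would not satisfy the property, whereas `t1` and `t3` are serialized and pass the check.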

The next theorem proves that the 1C-schedules generated by deferred update protocols following Assumption 4 and Assumption 5 verify Definition 5; i.e., they are GSI-schedules.

Theorem 2. Under Assumption 4 and Assumption 5,

the 1C-schedule H

is a GSI-schedule.

Proof: Firstly, notice that Assumption 4 implies a total order of conflicting transactions. Given this total order of conflicting transactions, by Property 4 the 1C-schedule $H$ verifies, for each $T_i \in T$, that $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$ for every $T_j \in T$. Additionally, by Assumption 5, for each $T_i \in T$, if $R_i(X_j) \in H$ then $X_j \in (Snapshot(DB, H, s_i) \cap Snapshot(DB, H^{site(i)}, b_i^{site(i)}))$ with $s_i \in \mathbb{R}^+$ and $s_i \le b_i$ (recall that $b_i = b_i^{site(i)}$). This fact makes Condition (1) in Definition 5 true. Therefore, if $s_i = b_i$ for every $T_i \in T$ then Condition (2) in Definition 5 trivially holds. We need to prove Condition (2) in general. Thus, consider $s_i < b_i$; there must be a transaction $T_j \in T$ such that $s_i < c_j < b_i$ and $WS_j \cap RS_i \neq \emptyset$. Let $T_j$ be the first transaction in $H$ verifying such condition. Therefore, by Assumption 1 and Assumption 2 ($H^{site(i)}$ is an SI-schedule), $b_i^{site(i)} < c_j^{site(i)}$ holds. As $c_j < b_i$, then $c_j < c_i$ also holds. Assume that $WS_j \cap WS_i \neq \emptyset$; if $c_i^{site(i)} < c_j^{site(i)}$ then, by Assumption 4 and the construction of $H$, $c_i < c_j$, leading to a contradiction. So, $b_i^{site(i)} < c_j^{site(i)} < c_i^{site(i)}$. This implies, by Assumption 2, that $WS_j \cap WS_i = \emptyset$, and $T_j$ verifies Condition (2) in Definition 5 ($\neg(T_j \text{ impacts } T_i \text{ at } s_i)$).

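Every transaction $T_l \in T$ such that $s_i < c_l < c_j < b_i$ verifies that $WS_l \cap RS_i = \emptyset$, since $T_j$ is the first one such that $WS_j \cap RS_i \neq \emptyset$. So, if $WS_l \cap WS_i \neq \emptyset$ then you can find $s_i' \in \mathbb{R}^+$: $s_i < c_l < s_i' < c_j < b_i$. At $s_i'$, Assumption 5 is verified again for $T_i$ by Definition 3 of snapshot.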

Furthermore, if c

< b

then c

site(i)

< b

site(i)

< c

site(i)

due to Assumption 4 and construction of H

(recall

that c

min

k∈I

} after renaming c

in( j) for all T

< c

in H

that is a contradiction with the ini-

tial supposition of c

< c

< b

. Thus, WS

∩WS

and Condition (2) in Deﬁnition 5 is veriﬁed for every

transaction. The 1C-schedule is a GSI-schedule under

the given assumptions.

From everything discussed throughout this section, one can infer that a replication protocol that respects Assumption 4 and Assumption 5 will provide GSI to its executed transactions without needing to block transactions. The simplest and most straightforward solution is to define conflict classes (Patiño-Martínez et al., 2005; Amza et al., 2003), with each site responsible for one (or several) conflict classes. Thus, transactions belonging to different conflict classes may commit in any order at remote replicas, while conflicting transactions belonging to the same conflict class are managed by the underlying DBMS of their delegate replica. Of course, this solution has its own pros and cons: we assume that each transaction belongs to exactly one conflict class (i.e., there are no compound conflict classes) and that it only reads and writes data belonging to that class. However, this is highly application dependent, and the granularity of the conflict class is left undefined: it can range from coarse (table level) to fine (row level).
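A minimal sketch of the conflict class idea follows, assuming table-level granularity and a simple round-robin assignment of classes to sites. Both choices, as well as the function names and the sample tables, are assumptions made for illustration; the cited works define their own policies.

```python
from typing import Dict, Iterable, List


def assign_classes(classes: Iterable[str], sites: List[str]) -> Dict[str, str]:
    """Assign each conflict class to exactly one responsible site (round-robin)."""
    return {cls: sites[i % len(sites)] for i, cls in enumerate(sorted(classes))}


def delegate_site(
    tables: Iterable[str], table_to_class: Dict[str, str], owner: Dict[str, str]
) -> str:
    """Route a transaction to the site responsible for its single conflict class."""
    classes = {table_to_class[t] for t in tables}
    if len(classes) != 1:
        # Compound conflict classes are excluded, as assumed in the text.
        raise ValueError("transaction must belong to exactly one conflict class")
    return owner[classes.pop()]


# Hypothetical schema: two conflict classes over three tables.
table_to_class = {"orders": "orders", "order_lines": "orders", "customers": "customers"}
owner = assign_classes(set(table_to_class.values()), ["siteA", "siteB"])
```

With this assignment, every transaction touching `orders` or `order_lines` is delegated to the same site, so conflicts inside that class are resolved by the local SI DBMS, while transactions of the `customers` class go to the other site.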

8 CONCLUSIONS

We have formalized sufficient conditions to achieve 1-copy-GSI for non-blocking replication protocols with SI replicas that follow the deferred update technique and exclusively broadcast the writesets of transactions. They consist in providing global atomicity and applying (and committing) transactions in the very same order at all replicas. Since these conditions are sufficient but not necessary, there are other means to provide GSI in a replicated setting: some come at the cost of blocking the start of transactions (Lin et al., 2005), which goes against the non-blocking nature of SI (Berenson et al., 1995), while others relax the total order of committed transactions given here. In particular, the relaxation requires that the same set of non-conflicting transactions be committed between two conflicting transactions, and that transactions started while these writesets are being applied in a different order read data items that belong to globally valid versions. To sum up, all the properties formalized in this paper seem to be assumed in some previous works, but none of them carefully identified or formalized such properties. As a result, we have provided a sound theoretical basis for designing and developing future replication protocols with GSI.

ACKNOWLEDGEMENTS

This work has been supported by the EU FEDER and

Spanish MEC under grant TIN2006-14738-C02.

REFERENCES

Amza, C., Cox, A. L., and Zwaenepoel, W. (2003). Conflict-aware scheduling for dynamic content applications. In USENIX.

Armendáriz-Iñigo, J. E., Juárez-Rodríguez, J. R., de Mendívil, J. R. G., Decker, H., and Muñoz-Escoí, F. D. (2007). K-bound GSI: a flexible database replication protocol. In SAC, pages 556–560. ACM.

Berenson, H., Bernstein, P. A., Gray, J., Melton, J., O’Neil, E. J., and O’Neil, P. E. (1995). A critique of ANSI SQL isolation levels. In SIGMOD, pages 1–10.

Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1987). Concurrency Control and Recovery in Database Systems. Addison Wesley.

Chockler, G., Keidar, I., and Vitenberg, R. (2001). Group communication specifications: a comprehensive study. ACM Comput. Surv., 33(4):427–469.

Elnikety, S., Pedone, F., and Zwaenepoel, W. (2005). Database replication using generalized snapshot isolation. In SRDS, pages 73–84. IEEE-CS.

Fekete, A., Liarokapis, D., O’Neil, E., O’Neil, P., and Shasha, D. (2005). Making snapshot isolation serializable. ACM TODS, 30(2):492–528.

González de Mendívil, J. R., Armendáriz-Iñigo, J. E., Muñoz-Escoí, F. D., Irún-Briz, L., Garitagoitia, J. R., and Juárez-Rodríguez, J. R. (2007). Non-blocking ROWA protocols implement GSI using SI replicas. Technical Report ITI-ITE-07/10, ITI.

Kemme, B. (2000). Database Replication for Clusters of Workstations (Nr. 13864). PhD thesis, ETHZ.

Lin, Y., Kemme, B., Patiño-Martínez, M., and Jiménez-Peris, R. (2005). Middleware based data replication providing snapshot isolation. In SIGMOD, pages 419–430. ACM.

Papadimitriou, C. (1986). The Theory of Database Concurrency Control. Computer Science Press.

Patiño-Martínez, M., Jiménez-Peris, R., Kemme, B., and Alonso, G. (2005). Consistent database replication at the middleware level. ACM TOCS, 23(4):375–423.

Pedone, F. (1999). The database state machine and group communication issues (N. 2090). PhD thesis, EPFL.

Plattner, C., Alonso, G., and Özsu, M. T. (2008). Extending DBMSs with satellite databases. VLDB J., Accepted for publication.

Wiesmann, M. and Schiper, A. (2005). Comparison of database replication techniques based on total order broadcast. IEEE TKDE, 17(4):551–566.
