RELAXING CORRECTNESS CRITERIA IN DATABASE
REPLICATION WITH SI REPLICAS
J. E. Armendáriz-Íñigo, J. R. González de Mendívil, J. R. Garitagoitia, J. R. Juárez-Rodríguez
Universidad Pública de Navarra, 31006 Pamplona, Spain
F. D. Muñoz-Escoí, L. Irún-Briz
Instituto Tecnológico de Informática, 46022 Valencia, Spain
Keywords:
Database replication, distributed databases, snapshot isolation, read one write all, correctness criteria, formal
proofs.
Abstract:
The concept of Generalized Snapshot Isolation (GSI) has been recently proposed as a suitable extension of conventional Snapshot Isolation (SI) for replicated databases. In GSI, transactions may use older snapshots instead of the latest snapshot required in SI, which provides better performance without significantly increasing the abort rate when write/write conflicts among transactions are rare. We study and formally prove a sufficient condition that replication protocols with SI replicas following the deferred update technique must obey to achieve GSI: they must provide global atomicity and commit update transactions in the very same order at all sites. However, as this is only a sufficient condition, it is possible to obtain GSI while relaxing certain assumptions about the commit ordering of certain update transactions.
1 INTRODUCTION
Snapshot Isolation (SI) is the isolation level provided
by several commercial database systems, such as Or-
acle, PostgreSQL, Microsoft SQL Server or InterBase.
Transactions executed under SI are allowed to read from
the last committed snapshot and, hence, read oper-
ations are never blocked nor conflict with any other
update transaction. In order to prevent the lost update
phenomenon (Berenson et al., 1995), concurrent up-
date transactions (read-only transactions are always
committed) modifying the same data item apply the
first-committer-wins rule: only the first transaction
that commits is allowed to proceed; the remainder are
aborted. This turns out to be a nice feature because it
provides sufficient data consistency (though not seri-
alizable (Fekete et al., 2005; Elnikety et al., 2005))
for non-critical applications while it maintains a good
performance, since read-only transactions are neither
delayed, blocked nor aborted and they never cause
update transactions to block or abort. This behavior
is important for workloads dominated by read-only
transactions, such as those resulting from dynamic
content Web servers (Plattner et al., 2008).
Many enterprise applications demand high availability since they have to provide continuous service to their users. This implies replicating the information being used; i.e., managing replicated databases. The concept of Generalized Snapshot Isolation (GSI) has been recently proposed (Elnikety et al., 2005) in order to provide a suitable extension of conventional SI for replicated databases based on multiversion concurrency control; a similar definition, denoted 1-copy-SI, was proposed concurrently in (Lin et al., 2005). In GSI, transactions may use older snapshots instead of the latest snapshot required in SI (setting up the latest snapshot in a distributed setting is not trivial). Actually, the authors of (Elnikety et al., 2005) outline an impossibility result which justifies the use of GSI in database replication: there is no non-blocking implementation of SI in an asynchronous system, even if databases never fail. This result has been formally justified in (González de Mendívil et al., 2007).
E. Armendáriz-Íñigo J., R. González de Mendívil J., R. Garitagoitia J., R. Juárez-Rodríguez J., D. Muñoz-Escoí F. and Irún-Briz L. (2008).
RELAXING CORRECTNESS CRITERIA IN DATABASE REPLICATION WITH SI REPLICAS.
In Proceedings of the Third International Conference on Software and Data Technologies - ISDM/ABF, pages 45-53
DOI: 10.5220/0001877700450053
Copyright © SciTePress
The deferred update technique (Pedone, 1999) consists in executing transactions at their delegate replicas (obtaining their corresponding snapshot) and setting up a commit ordering for update transactions, which is mainly done thanks to total order broadcast (Chockler et al., 2001). When a transaction requests its commitment (read-only transactions are committed right away), its updates are collected and broadcast (using the total order primitive) to the rest of replicas. Upon delivery at each replica, a validation test is performed to detect conflicts with other concurrent transactions in the system; namely, a certification test (Wiesmann and Schiper, 2005) that applies the distributed first-committer-wins rule (Elnikety et al., 2005; Lin et al., 2005) in the same way at all replicas and ensures the same commit order of transactions. The main advantage of these replication protocols is that transactions can start at any time without restriction or delay.
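The certification step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each replica keeps the committed writesets in delivery order and that every transaction carries the number of commits its snapshot had already seen (`start`). The names `Replica`, `deliver` and `start` are illustrative and are not taken from any of the cited protocols.

```python
class Replica:
    """Toy replica that certifies writesets delivered in total order."""

    def __init__(self):
        self.log = []  # committed writesets, in commit order

    def deliver(self, start, writeset):
        """Certify a writeset whose transaction saw the first `start` commits.

        Returns True (commit) or False (abort). First-committer-wins: abort
        if a transaction that committed after the snapshot wrote a common item.
        """
        for committed in self.log[start:]:
            if committed & writeset:
                return False
        self.log.append(frozenset(writeset))
        return True


# The same messages, delivered in the same total order at two replicas.
messages = [(0, {"X"}), (0, {"X", "Y"}), (0, {"Z"}), (1, {"X"})]
site_a, site_b = Replica(), Replica()
decisions_a = [site_a.deliver(start, ws) for start, ws in messages]
decisions_b = [site_b.deliver(start, ws) for start, ws in messages]
print(decisions_a, decisions_a == decisions_b, site_a.log == site_b.log)
```

Because the test is deterministic and both replicas process the same sequence, they reach the same commit/abort decisions and end with the same log, which is the behaviour the paragraph attributes to certification over total order broadcast.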
In this paper, we formalize the requirements for achieving GSI over SI replicas using non-blocking protocols. The criteria for implementing GSI are: (i) each transaction submitted to the system either commits or aborts at all sites (atomicity); (ii) all update transactions are committed in the same total order at every site (total order of committed transactions). Total order ensures that all replicas see the same sequence of transactions, and are thus able to provide the same snapshots to transactions independently of their starting replica; i.e., giving the logical vision of a one-copy scheduler (1-Copy-GSI). Atomicity, in turn, guarantees that all replicas take the same actions regarding each transaction, so their states are consistent once each transaction has terminated.
One might think that these assumptions are rather intuitive, but they constitute the cornerstone of the contribution of this paper, which consists in relaxing the assumption of the total order of committed transactions. If a protocol is not careful about that, transactions without write/write conflicts might be applied in different orders at different replicas, so transactions would be able to read different versions at different replicas. However, this optimization is important, since processing messages serially, as is assumed for replication protocols deployed over a group communication system (Chockler et al., 2001), would result in significantly lower throughput. A relaxed assumption has already been presented in (Lin et al., 2005): still using total order broadcast, it lets validated transactions be applied (and committed) concurrently as long as their respective updates do not intersect. However, this protocol needs to block the execution of the first operation of any starting transaction until the concurrent application of transactions finishes. Thus, there are multiple approaches to obtain GSI at the price of imposing certain restrictions; in particular, the need to block the start of transactions to obtain a globally consistent snapshot. Finally, we discuss how to relax this last restriction, which is actually too strong, for deploying non-blocking GSI protocols.
The rest of the paper is organized as follows¹. Section 2 introduces the concept of multiversion histories based on (Bernstein et al., 1987). Sections 3 and 4 give the concepts of SI and GSI respectively. In Section 5, the structure of deferred update replication protocols is introduced. Conditions for 1-Copy-GSI are introduced in Section 6. We take a look at how to relax the conditions for 1-Copy-GSI in Section 7. Finally, conclusions end the paper.
2 MULTIVERSION HISTORIES
In the following, we define the concept of multiversion history for committed transactions using the theory provided in (Bernstein et al., 1987). The properties studied in our paper only require dealing with committed transactions. To this end, we first define the basic building blocks for our formalization, and then present the different definitions and properties.
A database (DB) is a collection of data items,
which may be concurrently accessed by transactions.
A history represents an overall partial ordering of the
different operations concurrently executed within the
context of their corresponding transactions. Thus, a
multiversion history generalizes a history where the
database items are versioned.
To formalize this definition, each transaction submitted to the system is denoted by $T_i$. A transaction is a sequence of read and write operations on database items ended by a commit or abort operation. Each write operation of $T_i$ on item $X$ is denoted $W_i(X_i)$. A read operation on item $X$ is denoted $R_i(X_j)$, stating that $T_i$ reads the version of $X$ installed by $T_j$. Finally, $C_i$ and $A_i$ denote the commit and abort operations of $T_i$, respectively. We assume that a transaction does not read an item $X$ after it has written it, and that each item is read and written at most once. Avoiding redundant operations simplifies the presentation. The results for this kind of transactions are seamlessly extensible to more general models. In any case, redundant operations can be removed using local variables in the program of the transaction (Papadimitriou, 1986).
Each version of a data item $X$ contained in the database is denoted by $X_i$, where the subscript stands for the identifier of the transaction that installed that version in the DB. The readset and writeset (denoted by $RS_i$ and $WS_i$ respectively) express the sets of items read (written) by a transaction $T_i$. Thus, $T_i$ is a read-only transaction if $WS_i = \emptyset$ and it is an update one, otherwise.
¹ Due to space constraints, the reader is referred to (González de Mendívil et al., 2007) for a thorough explanation of the correctness proof.
ICSOFT 2008 - International Conference on Software and Data Technologies
Let $T = \{T_1, \ldots, T_n\}$ be a set of committed transactions, where the operations of $T_i$ are ordered by $\prec_{T_i}$. The last operation of a transaction is the commit operation. To process operations from a transaction $T_i \in T$, a multiversion scheduler must translate the operations of $T_i$ on data items into operations on specific versions of those data items. That is, there is a function $h$ that maps each $W_i(X)$ into $W_i(X_i)$, and each $R_i(X)$ into $R_i(X_j)$ for some $T_j \in T$.
Definition 1. A Complete Committed Multiversion (CCMV) history $H$ over $T$ is a partial order with order relation $\prec$ such that:
(1) $H = h(\bigcup_{T_i \in T} T_i)$ for some translation function $h$.
(2) $\prec \supseteq \bigcup_{T_i \in T} \prec_{T_i}$.
(3) If $R_i(X_j) \in H$, $i \neq j$, then $W_j(X_j) \in H$ and $C_j \prec R_i(X_j)$.
In Definition 1, condition (1) indicates that each operation submitted by a transaction is mapped into an appropriate multiversion operation. Condition (2) states that the CCMV history preserves all orderings stipulated by transactions. Condition (3) establishes that if a transaction reads a concrete version of a data item, it was written by a transaction that committed before the item was read.
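As an illustration of condition (3), the following sketch checks it over a list of operations written in some total order compatible with $\prec$. The tuple encoding of operations and the convention that version 0 denotes an initial version (installed by no transaction in $T$) are assumptions made for this example only.

```python
def reads_only_committed(history):
    """Check condition (3) of Definition 1 on a totally ordered history.

    Operations are tuples: ("W", i, item), ("R", i, item, j) meaning that
    T_i reads the version of item installed by T_j, and ("C", i).
    """
    written = set()    # (j, item) pairs for every W_j(item_j) seen so far
    committed = set()  # identifiers of transactions committed so far
    for op in history:
        if op[0] == "W":
            written.add((op[1], op[2]))
        elif op[0] == "C":
            committed.add(op[1])
        elif op[0] == "R":
            _, i, item, j = op
            if j in (0, i):
                continue  # initial version (assumed) or i = j: not constrained
            if (j, item) not in written or j not in committed:
                return False
    return True


ok = [("W", 1, "X"), ("C", 1), ("R", 2, "X", 1), ("C", 2)]
bad = [("W", 1, "X"), ("R", 2, "X", 1), ("C", 1), ("C", 2)]
print(reads_only_committed(ok), reads_only_committed(bad))
```

In `bad`, $T_2$ reads $X_1$ before $C_1$, so the sequence cannot be a linearization of a CCMV history.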
Definition 1 is more specific than the one stated in (Bernstein et al., 1987), since the former only includes committed transactions and explicitly indicates that a new version may not be read until the transaction that installed the new version has committed. In the rest of the paper, we use the following conventions: (i) $T = \{T_1, \ldots, T_n\}$ is the set of committed transactions for every defined history; and, (ii) any history $H$ is a CCMV history over $T$. Note that these conventions are also applicable when a superscript is used to denote the site of the database where the history is generated.
In general, two histories $(H, \prec)$ and $(H', \prec')$ over the same set of transactions are view equivalent (Bernstein et al., 1987), denoted as $H \equiv H'$, if they contain the same operations, have the same reads-from relations, and produce the same final writes. The notion of equivalence of CCMV histories reduces to the simple condition $H = H'$ if the following reads-from relation is used: $T_i$ reads $X$ from $T_j$ in a CCMV history $(H, \prec)$ if and only if $R_i(X_j) \in H$.
3 SNAPSHOT ISOLATION
In SI, reading from a snapshot means that a transaction $T_i$ sees all the updates done by transactions that committed before the transaction started its first operation. The results of its writes are installed when the transaction commits. However, a transaction $T_i$ will successfully commit if and only if there is no concurrent transaction $T_k$ that has already committed and has written some of the items also written by $T_i$. From our point of view, histories generated by a given concurrency control providing SI may be interpreted as multiversion histories with time restrictions.
Definition 2. Let $(H, \prec)$ be a history and $t : H \to \mathbb{R}^+$ a mapping that assigns to each operation $op \in H$ its real time occurrence $t(op) \in \mathbb{R}^+$. The schedule $H_t$ of the history $(H, \prec)$ verifies:
(1) If $op, op' \in H$ and $op \prec op'$ then $t(op) < t(op')$.
(2) If $t(op) = t(op')$ and $op, op' \in H$ then $op = op'$.
The mapping $t()$ totally orders all operations of $(H, \prec)$. Condition (1) states that the total order $<$ is compatible with the partial order $\prec$. Condition (2) establishes, for the sake of simplicity, the assumption that different operations have different times. We are interested in operating with schedules since it facilitates the work, but only with the ones that derive from CCMV histories over a concrete set of transactions $T$. One can note that an arbitrary time-labeled sequence of versioned operations, e.g. $(R_i(X_j), t_1)$, $(W_i(X_k), t_2)$ and so on, is not necessarily a schedule of a history. Thus, we need to impose some restrictions to make sure that we really work with schedules corresponding to possible histories.
Property 1. Let $S_t$ be a time-labeled sequence of versioned operations over a set of transactions $T$. $S_t$ is a schedule of a history over $T$ if and only if it verifies the following conditions:
(1) there exists a mapping $h$ such that $S = h(\bigcup_{T_i \in T} T_i)$.
(2) if $op, op' \in T_i$ and $op \prec_{T_i} op'$ then $t(op) < t(op')$ in $S_t$.
(3) if $R_i(X_j) \in S$ and $i \neq j$ then $W_j(X_j) \in S$ and $t(C_j) < t(R_i(X_j))$.
(4) if $t(op) = t(op')$ and $op, op' \in S$ then $op = op'$.
The proof of this fact can be inferred trivially. In the following, we use an additional convention: (iii) a schedule $H_t$ is a schedule of a history $(H, \prec)$. Note that every schedule $H_t$ may be represented by writing the operations in the total order ($<$) induced by $t()$. We define the “commit time” ($c_i$) and “begin time” ($b_i$) for each transaction $T_i \in T$ in a schedule $H_t$ as $c_i = t(C_i)$ and $b_i = t(\text{first operation of } T_i)$, holding $b_i < c_i$ by definition of $t()$ and $\prec_{T_i}$. In the following, we formalize the concept of snapshot of the database. Intuitively, it comprises the latest version of each data item. First, we will see an example of this:
Example 1. Let us consider the following transactions $T_1$, $T_2$ and $T_3$: $T_1 = \{R_1(X), W_1(X), c_1\}$, $T_2 = \{R_2(Z), R_2(X), W_2(Y), c_2\}$, $T_3 = \{R_3(Y), W_3(X), c_3\}$. A sample of a possible schedule of these transactions might be the following one: $b_1\, R_1(X_0)\, W_1(X_1)\, c_1\, b_2\, R_2(Z_0)\, b_3\, R_3(Y_0)\, W_3(X_3)\, c_3\, R_2(X_1)\, W_2(Y_2)\, c_2$. As this example shows, each transaction is able to include in its snapshot (and read from it) the latest committed version of each existing item at the time such transaction was started. Thus $T_2$ has read version 1 of item $X$, since $T_1$ has generated such version and had already committed when $T_2$ started. But it only reads version 0 of item $Z$, since no update of such item is seen by $T_2$. Note that $T_2$ reads $X_1$ even though transactions $T_2$ and $T_3$ are concurrent and $T_3$ updates $X$ before $T_2$ reads such item, because the snapshot taken for $T_2$ is previous to the commit of $T_3$.
This example provides the basis for defining what a snapshot is. For that purpose, we first need to define the set of installed versions of a data item $X$ in a schedule $H_t$ as the set $Ver(X, H) = \{X_j : W_j(X_j) \in H\} \cup \{X_0\}$, $X_0$ being its initial version.
Definition 3. The snapshot of the database $DB$ at time $\tau \in \mathbb{R}^+$ for a schedule $H_t$ is defined as: $Snapshot(DB, H_t, \tau) = \bigcup_{X \in DB} latestVer(X, H_t, \tau)$, where the latest version of each item $X \in DB$ at time $\tau$ is the set: $latestVer(X, H_t, \tau) = \{X_p \in Ver(X, H) : (\nexists X_k \in Ver(X, H) : c_p < c_k \le \tau)\}$
From the previous definition, it is easy to show that a snapshot is modified each time an update transaction commits. If $\tau = c_m$ and $X_m \in Ver(X, H)$, then $latestVer(X, H_t, c_m) = \{X_m\}$. In order to formalize the concept of SI-schedule, we utilize a slight variation of the predicate impacts for update transactions presented in (Elnikety et al., 2005). Two transactions $T_j, T_i \in T$ impact at time $\tau \in \mathbb{R}^+$ in a schedule $H_t$, denoted $T_j$ impacts $T_i$ at $\tau$, if the following predicate holds: $WS_j \cap WS_i \neq \emptyset \wedge \tau < c_j < c_i$.
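The snapshot of Definition 3 and the impacts predicate can be sketched in a few lines. The sketch takes the position of each operation in the schedule of Example 1 as its time, represents a version by the identifier of the transaction that installed it (0 for the initial version), and assumes that only versions already committed at $\tau$ are eligible ($c_p \le \tau$, with the initial version committed at time 0). All function and variable names are illustrative.

```python
def latest_ver(item, installers, commit, tau):
    """Identifier of the transaction whose version of `item` is latest at tau.

    `installers[item]` lists the transactions that wrote the item; 0 stands
    for the initial version. Assumption of this sketch: only versions already
    committed at tau are eligible (c_p <= tau), with commit[0] = 0.
    """
    visible = [p for p in [0] + installers.get(item, []) if commit[p] <= tau]
    return max(visible, key=lambda p: commit[p])


def snapshot(items, installers, commit, tau):
    """Snapshot(DB, H_t, tau) as a mapping item -> installing transaction."""
    return {x: latest_ver(x, installers, commit, tau) for x in items}


def impacts(ws_j, c_j, ws_i, c_i, tau):
    """T_j impacts T_i at tau: common written item and tau < c_j < c_i."""
    return bool(ws_j & ws_i) and tau < c_j < c_i


# Example 1, taking the position of each operation as its time:
# R1(X0)=1 W1(X1)=2 c1=3 R2(Z0)=4 R3(Y0)=5 W3(X3)=6 c3=7 R2(X1)=8 W2(Y2)=9 c2=10
items = ["X", "Y", "Z"]
installers = {"X": [1, 3], "Y": [2]}
commit = {0: 0, 1: 3, 2: 10, 3: 7}
snap_b2 = snapshot(items, installers, commit, 4)  # snapshot at b_2
snap_c3 = snapshot(items, installers, commit, 7)  # snapshot at c_3
print(snap_b2, snap_c3)
```

At $b_2$ the snapshot contains $X_1$, $Y_0$ and $Z_0$, which are the versions $T_2$ reads in Example 1; at $c_3$ the latest version of $X$ becomes $X_3$.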
Definition 4. A schedule $H_t$ is a SI-schedule if and only if for each $T_i \in T$:
(1) if $R_i(X_j) \in H$ then $X_j \in Snapshot(DB, H_t, b_i)$; and,
(2) for each $T_j \in T$: $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$.
Condition (1) states that all the versions read by a transaction $T_i$ are obtained from $Snapshot(DB, H_t, b_i)$; that is, versions are obtained from the snapshot of the database $DB$ at the time the transaction starts its first operation. Condition (2) states that any pair of transactions $T_j$ and $T_i$ writing over some common data items cannot overlap their time intervals $[b_i, c_i]$ and $[b_j, c_j]$. In other words, they have to be executed serially. Other equivalent definitions of SI have been provided in the literature (Berenson et al., 1995; Kemme, 2000; Lin et al., 2005; Fekete et al., 2005; Elnikety et al., 2005).
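Definition 4 can be turned into a small checker. In this sketch each transaction is a dictionary with its begin time `b`, commit time `c`, the versions it read (`reads`, mapping an item to the identifier of the installing transaction, 0 for the initial version) and its writeset `ws`. This encoding, the use of operation positions as times, and the restriction of the snapshot to versions committed at or before $\tau$ are assumptions of the example.

```python
def snapshot_version(item, txns, tau):
    """Installer of the latest version of `item` committed at or before tau.

    Returns 0 for the initial version. Restricting to c_j <= tau is the
    reading of latestVer assumed in this sketch.
    """
    latest, latest_c = 0, 0
    for j, t in txns.items():
        if item in t["ws"] and latest_c < t["c"] <= tau:
            latest, latest_c = j, t["c"]
    return latest


def impacts(tj, ti, tau):
    """T_j impacts T_i at tau."""
    return bool(tj["ws"] & ti["ws"]) and tau < tj["c"] < ti["c"]


def is_si_schedule(txns):
    """Definition 4: reads come from the snapshot at b_i and nobody impacts."""
    for ti in txns.values():
        for item, version in ti["reads"].items():
            if snapshot_version(item, txns, ti["b"]) != version:
                return False
        if any(impacts(tj, ti, ti["b"]) for tj in txns.values()):
            return False
    return True


# Schedule: R1(X0) W1(X1) c1 R2(Z0) R3(Y0) W3(X3) c3 R2(X1) W2(Y2) c2
with_latest_reads = {
    1: {"b": 1, "c": 3, "reads": {"X": 0}, "ws": {"X"}},
    2: {"b": 4, "c": 10, "reads": {"Z": 0, "X": 1}, "ws": {"Y"}},
    3: {"b": 5, "c": 7, "reads": {"Y": 0}, "ws": {"X"}},
}
# Schedule: R1(X0) W1(X1) c1 R2(X0) R2(Z0) R3(Y0) W3(X3) c3 W2(Y2) c2
with_stale_read = {
    1: {"b": 1, "c": 3, "reads": {"X": 0}, "ws": {"X"}},
    2: {"b": 4, "c": 10, "reads": {"X": 0, "Z": 0}, "ws": {"Y"}},
    3: {"b": 6, "c": 8, "reads": {"Y": 0}, "ws": {"X"}},
}
print(is_si_schedule(with_latest_reads), is_si_schedule(with_stale_read))
```

The first schedule is the one of Example 1 and passes both conditions. The second differs only in that $T_2$ reads $X_0$ after $T_1$ has committed, which violates condition (1).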
4 THE GSI LEVEL
The concept of Generalized Snapshot Isolation (or GSI, for short) was first applied to database replication in (Elnikety et al., 2005). A hypothetical concurrency control algorithm could have stored some past snapshots. A transaction may receive a snapshot that happened in the system before the time of its first operation (instead of its current snapshot, as in a SI concurrency control algorithm). The algorithm may commit the transaction if no other transaction impacts it from that past snapshot onwards. Thus, a transaction can observe an older snapshot of the DB, but the write operations of the transaction are still valid update operations for the DB at commit time. These ideas define the concept of GSI.
Definition 5. A schedule $H_t$ is a GSI-schedule if and only if for each $T_i \in T$ there exists a value $s_i \in \mathbb{R}^+$ such that $s_i \le b_i$ and:
(1) if $R_i(X_j) \in H$ then $X_j \in Snapshot(DB, H_t, s_i)$; and,
(2) for each $T_j \in T$: $\neg(T_j \text{ impacts } T_i \text{ at } s_i)$.
Condition (1) states that every item read by a transaction belongs to the same (possibly past) snapshot. Condition (2) also establishes that the time intervals $[s_i, c_i]$ and $[s_j, c_j]$ do not overlap for any pair of write/write conflicting transactions $T_i$ and $T_j$. If for all $T_i \in T$ conditions (1) and (2) hold for $s_i = b_i$ then $H_t$ is a SI-schedule. Thus, Definition 5 includes Definition 4 as a particular case. Another observation is that if there exists a transaction $T_i \in T$ such that conditions (1) and (2) are only verified for a value $s_i < b_i$, then there is an item $X \in RS_i$ for which $latestVer(X, H_t, s_i) \neq latestVer(X, H_t, b_i)$. That is, the transaction $T_i$ has not seen the latest version of $X$ at the begin time $b_i$: there was a transaction $T_k$ with $W_k(X_k) \in H$ such that $s_i < c_k < b_i$. This can be best seen in the next example.
Example 2. The following is an example of a GSI-schedule: $b_1\, R_1(X_0)\, W_1(X_1)\, c_1\, b_2\, R_2(X_0)\, R_2(Z_0)\, b_3\, R_3(Y_0)\, W_3(X_3)\, c_3\, W_2(Y_2)\, c_2$. In this schedule, transaction $T_2$ reads $X_0$ after the commit of $T_1$ appears. This would not be correct for a SI-schedule (since the version of $X$ read is not the latest one), but it is perfectly valid for a GSI-schedule, taking the time point of the snapshot provided to $T_2$ (i.e. $s_2$) to be previous to the commit of $T_1$, as shown: $b_1\, R_1(X_0)\, s_2\, W_1(X_1)\, c_1\, b_2\, R_2(X_0)\, R_2(Z_0)\, b_3\, R_3(Y_0)\, W_3(X_3)\, c_3\, W_2(Y_2)\, c_2$. The intuition behind this schedule in a distributed system is that the message containing the modifications of $T_1$ (the write operation on $X$) had not yet arrived at the site at the time transaction $T_2$ began. This may be the reason for $T_2$ to see this previous version of item $X$. The fact that GSI captures these delays in schedules makes its use attractive in distributed environments.
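The existence of the values $s_i$ required by Definition 5 can be checked mechanically. The sketch below relies on the observation that snapshots only change at commit times, so it is enough to try $s_i = 0$ (standing for any time before the first commit) and every commit time $c_k \le b_i$. The dictionary encoding of transactions, the use of operation positions as times, and the restriction of the snapshot to versions committed at or before $\tau$ are assumptions of the example.

```python
def snapshot_version(item, txns, tau):
    """Installer of the latest version of `item` committed at or before tau."""
    latest, latest_c = 0, 0
    for j, t in txns.items():
        if item in t["ws"] and latest_c < t["c"] <= tau:
            latest, latest_c = j, t["c"]
    return latest


def impacts(tj, ti, tau):
    """T_j impacts T_i at tau."""
    return bool(tj["ws"] & ti["ws"]) and tau < tj["c"] < ti["c"]


def gsi_snapshot_times(txns):
    """Return {i: s_i} witnessing Definition 5, or None if no witness exists.

    For each transaction the latest candidate time that satisfies both
    conditions is reported.
    """
    witness = {}
    for i, ti in txns.items():
        candidates = [0] + sorted(
            t["c"] for t in txns.values() if t["c"] <= ti["b"]
        )
        valid = [
            s for s in candidates
            if all(snapshot_version(x, txns, s) == v
                   for x, v in ti["reads"].items())
            and not any(impacts(tj, ti, s) for tj in txns.values())
        ]
        if not valid:
            return None
        witness[i] = valid[-1]
    return witness


# Example 2: R1(X0) W1(X1) c1 R2(X0) R2(Z0) R3(Y0) W3(X3) c3 W2(Y2) c2
example2 = {
    1: {"b": 1, "c": 3, "reads": {"X": 0}, "ws": {"X"}},
    2: {"b": 4, "c": 10, "reads": {"X": 0, "Z": 0}, "ws": {"Y"}},
    3: {"b": 6, "c": 8, "reads": {"Y": 0}, "ws": {"X"}},
}
# Two concurrent writers of X that both commit: not even GSI.
lost_update = {
    1: {"b": 1, "c": 3, "reads": {"X": 0}, "ws": {"X"}},
    2: {"b": 2, "c": 4, "reads": {"X": 0}, "ws": {"X"}},
}
print(gsi_snapshot_times(example2), gsi_snapshot_times(lost_update))
```

For Example 2 the witness found for $T_2$ is a time before $c_1$, matching the position of $s_2$ in the example, while $T_3$ can use $c_1$ itself because $T_1$ no longer impacts it from that point on.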
The value $s_i$ in Definition 5 plays the same role as $b_i$ in Definition 4. Thus, it is possible to think that if the operations in the GSI-schedule obtained from the history $H$ had been ‘on time’ then the schedule would have been a SI-schedule.
Example 3. Let us use Example 2 to show how a GSI-schedule can be transformed into a SI-schedule. To turn that GSI-schedule into a SI-schedule, it is only necessary to move the beginning of $T_2$ back to $s_2$, and consequently the resulting schedule would be a SI-schedule: $b_1\, R_1(X_0)\, b_2\, W_1(X_1)\, c_1\, R_2(X_0)\, R_2(Z_0)\, b_3\, R_3(Y_0)\, W_3(X_3)\, c_3\, W_2(Y_2)\, c_2$. However, this schedule does not fit the definition of $b_i$, which was described as the time of the first operation a transaction performs. Thus, the first operation of transaction $T_2$ must also be moved in the SI-schedule, resulting in the following: $b_1\, R_1(X_0)\, b_2\, R_2(X_0)\, W_1(X_1)\, c_1\, R_2(Z_0)\, b_3\, R_3(Y_0)\, W_3(X_3)\, c_3\, W_2(Y_2)\, c_2$.
The following property describes the previous transformation in a formal way:
Property 2. Let $H_t$ be a GSI-schedule. There is a mapping $t' : H \to \mathbb{R}^+$ such that $H_{t'}$ is a SI-schedule.
This last property states that if $H_t$ is a GSI-schedule, there exists a schedule $H_{t'}$, which is actually a SI-schedule, that verifies $H_t \equiv H_{t'}$ (in the sense of view-equivalence).
5 THE DEFERRED UPDATE
TECHNIQUE
The GSI concept is particularly interesting in replicated databases, since many replication protocols execute each transaction initially in a delegate replica, propagating later its updates to the rest of replicas (Lin et al., 2005; Elnikety et al., 2005; Armendáriz-Íñigo et al., 2007). This means that transaction writesets cannot be immediately applied in all replicas at the same time and, due to this, the snapshot being used in a transaction might be “previous” to the one that (regarding physical time in a hypothetical centralized system) would have been assigned to it. In this section we consider a distributed system that consists of $m$ sites, $I_m = \{1..m\}$ being the set of site identifiers. Sites communicate with each other by reliable message passing. We make no assumptions about the time it takes for sites to execute and for messages to be transmitted. We assume a system free of failures². Each site $k$ runs an instance of the database management system and maintains a copy of the database $DB$. We will assume that each database copy, denoted $DB^k$ with $k \in I_m$, provides SI (Berenson et al., 1995).
² Otherwise, writes will only be applied on the available replicas, but all our discussion is orthogonal to failures and can be seamlessly extended to a system where failures might arise.
We use the transaction model of Section 2. Let $T = \{T_i : i \in I_n\}$ be the set of transactions submitted to the system, where $I_n = \{1..n\}$ is the set of transaction identifiers.
The deferred update technique defines for each transaction $T_i \in T$ the set of transactions $\{T_i^k : k \in I_m\}$ in which there is only one, denoted $T_i^{site(i)}$, verifying $RS_i^{site(i)} = RS_i$ and $WS_i^{site(i)} = WS_i$; for the rest of the transactions, $T_i^k$, $k \neq site(i)$, $RS_i^k = \emptyset$ and $WS_i^k = WS_i$. $T_i^{site(i)}$ determines the local transaction of $T_i$, i.e., the transaction executed at its delegate replica or site, whilst $T_i^k$, $k \neq site(i)$, is a remote transaction of $T_i$, i.e., the updates of the transaction executed at a remote site. An update transaction reads at one site and writes at every site, while a read-only transaction only exists at its local site. In the rest of the paper, we consider the general case of update transactions with non-empty sets.
Let $T^k = \{T_i^k : i \in I_n\}$ be the set of transactions submitted at each site $k \in I_m$ for the set $T$. Some of these transactions are local at $k$ while others are remote ones. Assumption 1 below implies that each transaction submitted to the system either commits at all replicas or at none of them. Thus, the updates applied in a delegate replica by a given transaction are also applied in the rest of replicas. Obviously, we consider a fully-replicated system. Since only committed transactions are relevant, the histories being generated at each site should be histories over $T^k$, as defined above.
Assumption 1 (Atomicity). $H^k$ is a CCMV history over $T^k$ for all sites $k \in I_m$.
In the considered distributed system there is no common clock or similar synchronization mechanism. However, we can use a real time mapping $t : \bigcup_{k \in I_m} (H^k) \to \mathbb{R}^+$ that totally orders all operations of the system. This mapping is compatible with each partial order $\prec^k$ defined for $H^k$ for each site $k \in I_m$. In the following, we consider that each $DB^k$ provides SI-schedules under the previous time mapping.
Assumption 2 (SI Replicas). $H_t^k$ is a SI-schedule of the history $H^k$ for all sites $k \in I_m$.
In order to study the level of consistency implemented by this kind of non-blocking protocols, it is necessary to define the one copy schedule (1C-schedule) obtained from the schedules at each site. In the following definitions, properties and theorems we use the following notation: for each transaction $T_i$, $i \in I_n$, $C_i^{min(i)}$ denotes the commit operation of the transaction $T_i$ at site $min(i) \in I_m$ such that $c_i^{min(i)} = \min_{k \in I_m} \{c_i^k\}$ under the considered mapping $t()$.
Definition 6 (1C-schedule). Let $T = \{T_i : i \in I_n\}$ be the set of transactions submitted to a replicated database system with a non-blocking deferred update strategy that verifies Assumption 1 and Assumption 2. Let $S = \bigcup_{k \in I_m} (H^k)$ be the set formed by the union of the histories $H^k$ over $T^k = \{T_i^k : i \in I_n\}$. And let $t : S \to \mathbb{R}^+$ be the mapping that totally orders the operations in $S$. The 1C-schedule, $H_{t'} = (H, t' : H \to \mathbb{R}^+)$, is built from $S$ and $t()$ as follows. For each $i \in I_n$ and $k \in I_m$:
(1) Remove from $S$ operations such that: $W_i(X_i)^k$, with $k \neq site(i)$, or $C_i^k$, with $k \neq min(i)$.
(2) $H$ is obtained with the rest of operations in $S$ after step (1), applying the renaming: $W_i(X_i) = W_i(X_i)^{site(i)}$; $R_i(X_j) = R_i(X_j)^{site(i)}$; and, $C_i = C_i^{min(i)}$.
(3) Finally, $t'()$ is obtained from $t()$ as follows: $t'(W_i(X_i)) = t(W_i(X_i)^{site(i)})$; $t'(R_i(X_j)) = t(R_i(X_j)^{site(i)})$; and, $t'(C_i) = t(C_i^{min(i)})$.
As $t'()$ receives its values from $t()$, we write $H_t$ instead of $H_{t'}$. In the 1C-schedule $H_t$, for each transaction $T_i$, $b_i < c_i$ is trivially verified because this technique guarantees that for all $k \neq site(i)$, $b_i^{site(i)} < b_i^k$. The 1C history $H$, which is formed by the operations over the logical DB, is also a history over $T$. We prove this fact informally. By the renaming (2) in Definition 6, each transaction $T_i$ has its operations over the data items in $RS_i$ and $WS_i$, and $\prec_{T_i}$ is trivially maintained in a partial order $\prec$ for $H$, because $H_t$ contains the local operations of $T_i^{site(i)}$. $H$ is also formed by committed transactions, under Assumption 1; for each $T_i$, $C_i \in H$. Finally, if $R_i(X_j) \in H$, then $R_i(X_j)^{site(i)} \in H^{site(i)}$. As $H^{site(i)}$ is a history over $T^{site(i)}$ then $C_j^{site(i)} \prec R_i(X_j)^{site(i)}$. By defining $C_j^{min(j)} \preceq C_j^{site(i)}$ in $S$ then $C_j^{min(j)} \prec R_i(X_j)^{site(i)}$ and so $C_j \prec R_i(X_j)$. Thus $H$ can be defined as a history over $T$.
Transformation (2) in Definition 6 ensures that a transaction is considered committed as soon as it has been committed at the first replica. Finally, no restriction about the beginning of a transaction is imposed in this definition. Hence, this definition is valid for the most general case of non-blocking protocols. Although Assumptions 1 and 2 are included in Definition 6, they do not guarantee that the obtained 1C-schedule is a SI-schedule. This is best illustrated in the following example, which also shows how the 1C-schedule may be built from the SI-schedules of each site.
Example 4. In this example two sites (A, B) and the following set of transactions $T_1$, $T_2$, $T_3$, $T_4$ are considered: $T_1 = \{R_1(Y), W_1(X)\}$, $T_2 = \{R_2(Z), W_2(X)\}$, $T_3 = \{R_3(X), W_3(Z)\}$, $T_4 = \{R_4(X), R_4(Z), W_4(Y)\}$. Figure 1 illustrates the mapping described in Definition 6 for building a 1C-schedule from the SI-schedules seen in the different nodes $I_m$. $T_2$ and $T_3$ are locally executed at site A ($RS_2 \neq \emptyset$ and $RS_3 \neq \emptyset$), whilst $T_1$ and $T_4$ are locally executed at site B. The writesets are afterwards applied at the remote sites. Schedules obtained at both sites are SI-schedules, i.e. transactions read the latest version of the committed data at each site. The 1C-schedule is obtained from Definition 6. For example, the commit of $T_1$ occurs in the 1C-schedule at the minimum of the times of $C_1^A$ and $C_1^B$, and so on for the remaining transactions. In the 1C-schedule of Figure 1, $T_4$ reads $X_1$ and $Z_3$ but the $X_2$ version exists between both (since $X_2$ was installed at site A). $T_1$ and $T_2$, satisfying $WS_1 \cap WS_2 \neq \emptyset$, are executed at both sites in the same order. As $T_1$ and $T_2$ are not executed in the same order with regard to $T_3$ at both sites, the obtained 1C-schedule is neither SI nor GSI.
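The construction of Definition 6 can be sketched for the two sites of Example 4. The exact times of Figure 1 are not available here, so the timestamps below are hypothetical; they only respect the order of operations at each site described in the example. The tuple layout and the names `build_one_copy` and `site_of` are likewise illustrative.

```python
def build_one_copy(ops, site_of):
    """Definition 6 on a list of (time, site, kind, txn, item, version).

    kind is "R", "W" or "C" (item and version are None for commits).
    Reads and writes are kept only from the delegate site site_of[txn];
    the commit is kept from the site where it happened first.
    """
    first_commit = {}
    for time, _site, kind, txn, _item, _version in ops:
        if kind == "C":
            first_commit[txn] = min(time, first_commit.get(txn, time))
    schedule = []
    for time, site, kind, txn, item, version in ops:
        if kind == "C":
            keep = time == first_commit[txn]
        else:
            keep = site == site_of[txn]
        if keep:
            schedule.append((time, kind, txn, item, version))
    return sorted(schedule, key=lambda op: op[0])


A, B = "A", "B"
ops = [  # hypothetical global times, distinct for every operation
    (1, B, "R", 1, "Y", 0), (2, B, "W", 1, "X", 1), (3, B, "C", 1, None, None),
    (4, A, "W", 1, "X", 1), (5, A, "C", 1, None, None),
    (6, A, "R", 2, "Z", 0), (7, A, "W", 2, "X", 2), (8, A, "C", 2, None, None),
    (9, A, "R", 3, "X", 2), (10, A, "W", 3, "Z", 3), (11, A, "C", 3, None, None),
    (12, B, "W", 3, "Z", 3), (13, B, "C", 3, None, None),
    (14, B, "R", 4, "X", 1), (15, B, "R", 4, "Z", 3), (16, B, "W", 4, "Y", 4),
    (17, B, "C", 4, None, None),
    (18, B, "W", 2, "X", 2), (19, B, "C", 2, None, None),
    (20, A, "W", 4, "Y", 4), (21, A, "C", 4, None, None),
]
site_of = {1: B, 2: A, 3: A, 4: B}
one_copy = build_one_copy(ops, site_of)
commits = {txn: time for time, kind, txn, _, _ in one_copy if kind == "C"}
t4_reads = [(item, version) for _, kind, txn, item, version in one_copy
            if kind == "R" and txn == 4]
print(commits, t4_reads)
```

With these times, $T_4$ reads $Z_3$, so any snapshot it uses must be taken at or after $c_3$, yet it also reads $X_1$ although $X_2$ was committed earlier than $Z_3$ in the 1C-schedule ($c_2 < c_3$). No single snapshot time fits both reads, which is the anomaly the example describes.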
6 1-COPY-GSI SCHEDULES
The 1C-schedule $H_t$ obtained in Definition 6 will be a GSI-schedule if it verifies the conditions given in Definition 5. The question is what conditions the local SI-schedules, $H_t^k$, have to verify in order to guarantee that $H_t$ is a GSI-schedule. Taking into account the ordering of conflicting transactions in GSI-equivalence, we consider the kind of protocols that guarantee the same total order of the commit operations for the transactions with write/write conflicts at every site. However, the execution of write/write conflicting transactions in the same order at all sites offers neither SI nor GSI, as has been shown in Example 4. Therefore, it is also necessary to consider the need of reading from a consistent snapshot from the notion of GSI-equivalence; i.e. all update transactions must be committed in the very same order at all sites. As a result, since all replicas generate SI-schedules and their local snapshots have received the same sequence of updates, transactions starting at any site are able to read a particular snapshot that perhaps is not the latest one, but that is consistent with those of other replicas.
Assumption 3 (Total Order of Committing Transactions). For each pair $T_i, T_j \in T$, a unique order relation $c_i^k < c_j^k$ holds for all SI-schedules $H_t^k$ with $k \in I_m$.
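The difference between Assumption 3 and the weaker requirement of ordering only write/write conflicting transactions can be made concrete with a short sketch. The commit times are hypothetical values for the two sites of Example 4 that respect the per-site commit order described there; the function names are illustrative.

```python
from itertools import combinations


def same_total_commit_order(commit_times):
    """Assumption 3: every site commits all transactions in the same order.

    `commit_times[k][i]` is the commit time c_i^k of T_i at site k.
    """
    orders = [sorted(c, key=c.get) for c in commit_times.values()]
    return all(order == orders[0] for order in orders)


def same_order_for_conflicts(commit_times, writesets):
    """Weaker check: only write/write conflicting pairs must agree."""
    for i, j in combinations(writesets, 2):
        if writesets[i] & writesets[j]:
            before = {c[i] < c[j] for c in commit_times.values()}
            if len(before) > 1:
                return False
    return True


# Hypothetical commit times for Example 4.
commit_times = {
    "A": {1: 5, 2: 8, 3: 11, 4: 21},
    "B": {1: 3, 3: 13, 4: 17, 2: 19},
}
writesets = {1: {"X"}, 2: {"X"}, 3: {"Z"}, 4: {"Y"}}
print(same_order_for_conflicts(commit_times, writesets),
      same_total_commit_order(commit_times))
```

Example 4 satisfies the weaker check, since $T_1$ and $T_2$ commit in the same order at both sites, but not Assumption 3, because $T_2$ and $T_3$ commit in opposite orders at sites A and B.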
The SI-schedules $H_t^k$ have the same total order of committed transactions. Without loss of generality, we consider the following total order in the rest of this section: $c_1^k < c_2^k < \ldots < c_n^k$ for every $k \in I_m$. In the next property we are going to verify that, thanks to the total order, versions of items read by a transaction belong to the same snapshot in a given time interval. This interval is determined for each transaction $T_i$ by two commit times, denoted $c_{i_0}$ and $c_{i_1}$. The former corresponds to the commit time of a transaction $T_{i_0}$ such that $T_i$ reads from $T_{i_0}$ for the last time and from then it performs no other read operation. The latter corresponds to the commit time of a transaction $T_{i_1}$, so that it is the first transaction, after $T_{i_0}$, that verifies $WS_{i_1} \cap RS_i \neq \emptyset$, hence modifying the snapshot of the transaction $T_i$. In case that $T_{i_1}$ does not exist, the correctness interval for $T_i$ will extend from $c_{i_0}$ to $b_i$.
[Figure 1: timeline diagram, not reproduced. Recoverable content, per row as extracted from left to right:
Site A: $W_1(X_1)\,C_1$; $R_2(Z_0)\,W_2(X_2)\,C_2$; $R_3(X_2)\,W_3(Z_3)\,C_3$; $W_4(Y_4)\,C_4$.
Site B: $R_1(Y_0)\,W_1(X_1)\,C_1$; $W_3(Z_3)\,C_3$; $R_4(X_1)\,R_4(Z_3)\,W_4(Y_4)\,C_4$; $W_2(X_2)\,C_2$.
1C-schedule (1CS): $R_1(Y_0)\,W_1(X_1)\,C_1$; $R_2(Z_0)\,W_2(X_2)\,C_2$; $R_3(X_2)\,W_3(Z_3)\,C_3$; $R_4(X_1)\,R_4(Z_3)\,W_4(Y_4)\,C_4$.]
Figure 1: Replicated one-copy execution providing neither CSI nor GSI.
Property 3. Let $H_t$ be a 1C-schedule verifying Assumption 3. For each $T_i \in T$, if $R_i(X_j) \in H_t$ then $X_j \in \mathit{Snapshot}(DB, H_t, \tau)$ and $\tau \in \mathbb{R}^+$ satisfies $c_{i_0} \le \tau < c_{i_1} \le b_i$.
The aim of the next theorem is to prove that the 1C-schedules generated by any deferred update protocol that verifies Assumption 3 are actually GSI-schedules; i.e., they comply with all conditions stated in Definition 5. Whilst proving that a transaction always reads from the same snapshot in a particular time interval is easy, it is not trivial to prove that, for a given transaction $T_i$, there has not been any other transaction $T_j$ that has impacted $T_i$ and that has been committed whilst $T_i$ was being executed. However, due to the total commit order, an induction proof is possible, showing that the obtained 1C-schedule verifies all conditions required to be a GSI-schedule.
Theorem 1. Under Assumption 3, the 1C-schedule $H_t$ is a GSI-schedule.
This theorem formally justifies the correctness of such protocols and establishes that their resulting isolation level is GSI; its proof is given in (González de Mendívil et al., 2007). Additionally, it is worth noting that Assumption 3 is a sufficient, but not necessary, condition for obtaining GSI. Even so, replication protocols that comply with this assumption are easy to implement. In the next section, we analyze how to relax this assumption while still obtaining GSI schedules with non-blocking protocols.
7 RELAXING ASSUMPTIONS
Assumption 3 (Total order of committing transactions) is very strong. It forces every replica to install the same snapshots in the same order. Thus, Theorem 1 guarantees that the 1C-schedule $H_t$ is a GSI-schedule. In contrast, the total order of conflicting transactions alone is not enough to guarantee SI or GSI (see Example 4); a stronger condition is required: the snapshot obtained by a transaction at its delegate replica must match the 1C-schedule, the latter actually being a GSI-schedule. However, this does not necessarily oblige each replica to install the same snapshots as in the 1C-schedule. That is, if $R_i(X_j)$ belongs to $H_t$ then $X_j \in (\mathit{Snapshot}(DB, H_t, b_i) \cap \mathit{Snapshot}(DB, H_t^{site(i)}, b_i))$. From the above, it is clear that relaxing Assumption 3 requires some property that relates the reads-from relationship of a transaction in the 1C-schedule to the reads-from relationship of that transaction in the local schedule at its delegate site. In the following, we provide more relaxed assumptions that still yield a 1C-schedule providing GSI.
Assumption 4. For each pair $T_i, T_j \in T$ with $WS_i \cap WS_j \neq \emptyset$, a unique order relation $c_i^k < c_j^k$ holds for all SI-schedules $H_t^k$ with $k \in I_m$; and, if there is some transaction $T_p \in T$ such that $c_i^k < c_p^k < c_j^k$ holds for some site $k \in I_m$, then it holds for every $k \in I_m$.
This assumption states that the commit order of two conflicting transactions is the same at every site. Moreover, it also states that the same subset of transactions is committed between both transactions at every site, no matter the order in which they occur.
Example 5. Let us suppose that there are two replicas and the following set of transactions: $\{T_1, T_2, T_3, T_4, T_5, T_6, T_7\}$ with $WS_1 \cap WS_4 \neq \emptyset$, $WS_3 \cap WS_7 \neq \emptyset$, and the rest do not conflict with each other. At the first site you can find the following local SI-schedule: $c_1^1 < c_2^1 < c_3^1 < c_4^1 < c_5^1 < c_6^1 < c_7^1$, whilst at the second site the derived SI-schedule can be: $c_1^2 < c_2^2 < c_3^2 < c_4^2 < c_6^2 < c_5^2 < c_7^2$. In the latter, the
commit ordering of transactions $T_5$ and $T_6$ is different from the scheduling of the former.
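The condition of Assumption 4 can be checked mechanically on Example 5. The sketch below is only illustrative: the function name and the concrete data items in the writesets are assumptions (the example only states which writesets intersect), and each site is represented by its list of transactions sorted by local commit time.

```python
from itertools import combinations
from typing import Dict, List, Set


def satisfies_assumption_4(site_orders: Dict[str, List[str]],
                           writesets: Dict[str, Set[str]]) -> bool:
    """Check Assumption 4 over per-site commit orders."""
    orders = list(site_orders.values())
    for ti, tj in combinations(writesets, 2):
        if not (writesets[ti] & writesets[tj]):
            continue  # only write/write conflicting pairs are constrained
        views = set()
        for order in orders:
            lo, hi = sorted((order.index(ti), order.index(tj)))
            # Record which of the two commits first and the set committed in between.
            views.add((order[lo], frozenset(order[lo + 1:hi])))
        if len(views) > 1:
            return False  # some site disagrees on the order or on the set in between
    return True


# Hypothetical writesets: only T1/T4 and T3/T7 share an item, as in Example 5.
writesets = {"T1": {"x"}, "T2": {"a"}, "T3": {"y"}, "T4": {"x"},
             "T5": {"b"}, "T6": {"c"}, "T7": {"y"}}

example_5 = {
    "site1": ["T1", "T2", "T3", "T4", "T5", "T6", "T7"],
    "site2": ["T1", "T2", "T3", "T4", "T6", "T5", "T7"],
}
print(satisfies_assumption_4(example_5, writesets))  # True

# Committing T4 before T3 at the second site changes the set between T1 and T4.
reordered = dict(example_5, site2=["T1", "T2", "T4", "T3", "T6", "T5", "T7"])
print(satisfies_assumption_4(reordered, writesets))  # False
```

The check compares sets, not sequences, for the transactions committed between a conflicting pair, which is why swapping $T_5$ and $T_6$ is accepted.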
As may be inferred, Assumption 4 becomes Assumption 3 whenever the pattern of transactions does not allow the commits to be reordered. In Example 5, $c_4^2 < c_3^2$ cannot happen without violating Assumption 4. On the other hand, taking Assumption 4 to the extreme, if no transactions conflict with each other, any commit order can be obtained at each site. To rule out such situations, each transaction must be forced to read from the same snapshot; that is, each pair of transactions $T_i, T_j \in T$ with $WS_j \cap RS_i \neq \emptyset$ must verify that if $c_j < b_i$ in $H_t$ then $c_j^{site(i)} < c_i^{site(i)}$ in $H_t^{site(i)}$, which is stated in the next assumption.
Assumption 5 (Compatible Snapshot Read). Let $H_t$ be a 1C-schedule; for each $T_i \in T$ there exists $s_i \le b_i$ such that if $R_i(X_j) \in H_t$ then $X_j \in (\mathit{Snapshot}(DB, H_t, s_i) \cap \mathit{Snapshot}(DB, H_t^{site(i)}, b_i))$.
This last assumption means that each transaction reads data items that belong to a valid global snapshot of the 1C-schedule, although its delegate site does not install the same snapshot version. On the other hand, regarding Assumption 4, it seems clear that a 1C-schedule serializes the execution of conflicting transactions.
Property 4. Under Assumption 4, the 1C-schedule $H_t$ verifies that for each pair $T_i, T_j \in T$: $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$.
Proof: By Assumption 2, at any site $k \in I_m$, for each pair $T_j^k, T_i^k \in T^k$: $\neg(T_j^k \text{ impacts } T_i^k \text{ at } b_i^k)$. That is, $WS_j^k \cap WS_i^k = \emptyset \lor \neg(b_i^k < c_j^k < c_i^k)$.
(1) If $WS_j^k \cap WS_i^k = \emptyset$, by definition of $T_j$ and $T_i$, $WS_j \cap WS_i = \emptyset$. Then, $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$.
(2) Let $WS_j^k \cap WS_i^k \neq \emptyset$. Again, by definition of $T_j$ and $T_i$, $WS_j \cap WS_i \neq \emptyset$. Hence, both $\neg(T_j^k \text{ impacts } T_i^k \text{ at } b_i^k)$ and $\neg(T_i^k \text{ impacts } T_j^k \text{ at } b_j^k)$ hold. Thus, $c_i^k < b_j^k$ or $c_j^k < b_i^k$ holds. By Assumption 4, $c_i^k < c_j^k$ for all sites $k \in I_m$. Thus, $c_i^k < b_j^k$ for all $k \in I_m$. In particular, $c_i^{site(j)} < b_j^{site(j)}$. By definition of $H_t$: $c_i < c_j$ and $c_i \le c_i^{site(j)} < b_j$ hold in $H_t$. Suppose that $T_j$ impacts $T_i$ at $b_i$ in $H_t$. That is, $WS_j \cap WS_i \neq \emptyset$ and $b_i < c_j < c_i$. A contradiction with $c_i < c_j$ is obtained. Therefore, $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$. Analogously, suppose that $T_i$ impacts $T_j$ at $b_j$ in $H_t$. That is, $WS_j \cap WS_i \neq \emptyset$ and $b_j < c_i < c_j$. A contradiction with $c_i < b_j$ is obtained again, and therefore, $\neg(T_i \text{ impacts } T_j \text{ at } b_j)$.
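The impacts relation used throughout this proof can be written down directly from the way the proof unfolds it: $T_j$ impacts $T_i$ at a time $\tau$ when their writesets intersect and $\tau < c_j < c_i$. The following minimal sketch encodes that reading; the `Txn` record, its field names and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Txn:
    name: str
    begin: float              # b_i
    commit: float             # c_i
    writeset: FrozenSet[str]  # WS_i


def impacts(tj: Txn, ti: Txn, at: float) -> bool:
    """T_j impacts T_i at time `at`: writesets intersect and at < c_j < c_i."""
    return bool(tj.writeset & ti.writeset) and at < tj.commit < ti.commit


# Hypothetical transactions writing the same item x.
ti = Txn("Ti", begin=1.0, commit=5.0, writeset=frozenset({"x"}))
concurrent = Txn("Tj", begin=2.0, commit=3.0, writeset=frozenset({"x"}))
earlier = Txn("Tj'", begin=0.1, commit=0.5, writeset=frozenset({"x"}))

print(impacts(concurrent, ti, ti.begin))  # True: Tj commits inside (b_i, c_i)
print(impacts(earlier, ti, ti.begin))     # False: Tj' committed before b_i
```

Property 4 states that, under Assumption 4, the first situation never arises in the 1C-schedule for any pair of transactions.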
The next theorem proves that 1C-schedules generated by deferred update protocols following Assumption 4 and Assumption 5 verify Definition 5; i.e., they generate GSI schedules.
Theorem 2. Under Assumption 4 and Assumption 5, the 1C-schedule $H_t$ is a GSI-schedule.
Proof: Firstly, notice that Assumption 4 implies total order of conflicting transactions. Given this total order of conflicting transactions, by Property 4 the 1C-schedule $H_t$ verifies for each $T_i \in T$ that $\neg(T_j \text{ impacts } T_i \text{ at } b_i)$ for every $T_j \in T$. Additionally, by Assumption 5, for each $T_i \in T$, if $R_i(X_j) \in H_t$ then $X_j \in (\mathit{Snapshot}(DB, H_t, s_i) \cap \mathit{Snapshot}(DB, H_t^{site(i)}, b_i^{site(i)}))$ with $s_i \in \mathbb{R}^+$ and $s_i \le b_i$ (recall that $b_i = b_i^{site(i)}$). This fact makes Condition (1) in Definition 5 true. Therefore, if $s_i = b_i$ for every $T_i \in T$ then Condition (2) in Definition 5 trivially holds. We need to prove Condition (2) in general. Thus, consider $s_i < b_i$; there must be a transaction $T_m \in T$ such that $s_i < c_m < b_i$ and $WS_m \cap RS_i \neq \emptyset$. Let $T_m$ be the first transaction in $H_t$ verifying such condition. Therefore, by Assumption 1 and Assumption 2 ($H_t^{site(i)}$ is an SI-schedule), $b_i^{site(i)} < c_m^{site(i)}$ holds. As $c_m < b_i$, then $c_m < c_i$ also holds. Assume that $WS_m \cap WS_i \neq \emptyset$; if $c_i^{site(i)} < c_m^{site(i)}$ then, by Assumption 4 and construction of $H_t$, $c_i < c_m$, leading to a contradiction. So, $b_i^{site(i)} < c_m^{site(i)} < c_i^{site(i)}$. This implies, by Assumption 2, that $WS_m \cap WS_i = \emptyset$ and $T_m$ verifies Condition (2) in Definition 5 ($\neg(T_m \text{ impacts } T_i \text{ at } s_i)$).
Every transaction $T_p \in T$ such that $s_i < c_p < c_m < b_i$ verifies that $WS_p \cap RS_i = \emptyset$, since $T_m$ is the first one such that $WS_m \cap RS_i \neq \emptyset$. So, if $WS_p \cap WS_i \neq \emptyset$ then you can find $s_i' \in \mathbb{R}^+$: $s_i < c_p < s_i' < c_m < b_i$. At $s_i'$, Assumption 5 is verified again for $T_i$ by Definition 3 of snapshot. Furthermore, if $c_p < b_i$ then $c_p^{site(i)} < b_i^{site(i)} < c_i^{site(i)}$ due to Assumption 4 and construction of $H_t$ (recall that $c_j = \min_{k \in I_m} \{c_j^k\}$ after renaming $c_j^{\min(j)}$ for all $T_j$), $c_p < c_m < c_i$ in $H_t$, which is a contradiction with the initial supposition of $c_m < c_p < b_i$. Thus, $WS_p \cap WS_i = \emptyset$ and Condition (2) in Definition 5 is verified for every transaction. The 1C-schedule is a GSI-schedule under the given assumptions.
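The proof above recalls that the commit time of a transaction in the 1C-schedule is the earliest of its local commit times, $c_j = \min_{k \in I_m} \{c_j^k\}$. A minimal sketch of that construction follows; the function name, site labels and timestamps are hypothetical.

```python
from typing import Dict


def one_copy_commit_times(site_commit_times: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """For each transaction, keep the minimum commit time observed over all sites."""
    result: Dict[str, float] = {}
    for local_times in site_commit_times.values():
        for txn, commit_time in local_times.items():
            result[txn] = min(commit_time, result.get(txn, commit_time))
    return result


# Hypothetical local commit times c_j^k at two replicas.
local = {
    "site1": {"T1": 1.0, "T2": 4.0},
    "site2": {"T1": 2.5, "T2": 3.0},
}
print(one_copy_commit_times(local))  # {'T1': 1.0, 'T2': 3.0}
```

With this construction, an order $c_i^k < c_j^k$ that holds at every site carries over to $c_i < c_j$ in $H_t$, which is the step the proof uses when it appeals to Assumption 4 and the construction of $H_t$.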
From everything discussed throughout this section, one can infer that a replication protocol that respects Assumption 4 and Assumption 5 will provide GSI to its executed transactions without needing to block transactions. The simplest and most straightforward solution is to define conflict classes (Patiño-Martínez et al., 2005; Amza et al., 2003), with each site being responsible for one (or several) conflict classes. Thus, transactions belonging to different conflict classes will commit in any order at remote replicas, while conflicting transactions belonging to the same conflict class are managed by the underlying DBMS of their delegate replica. Of course, this solution has its own pros and cons: we
assume that each transaction belongs exclusively to one conflict class, i.e. there are no compound conflict classes, and that it reads and writes only data belonging to that class. However, this is highly application dependent and the granularity of the conflict class is undefined: it can range from coarse (at table level) to fine (at row level) granularity.
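The conflict class idea can be pictured with a small sketch. Everything in it is an assumption made for illustration: the class names, the table-level granularity, the owner map and both helper functions are hypothetical and do not describe any particular protocol.

```python
from typing import Dict

# Hypothetical assignment: each conflict class (here, one table) is owned by one site.
CLASS_OWNER: Dict[str, str] = {"orders": "site1", "customers": "site2"}


def delegate_site(conflict_class: str) -> str:
    """A transaction is executed at the site that owns its single conflict class."""
    return CLASS_OWNER[conflict_class]


def may_commit_in_any_order(class_a: str, class_b: str) -> bool:
    """Transactions of different classes touch disjoint data, so remote replicas
    may apply their writesets in any relative order."""
    return class_a != class_b


print(delegate_site("orders"))                         # site1
print(may_commit_in_any_order("orders", "customers"))  # True
print(may_commit_in_any_order("orders", "orders"))     # False
```

Transactions of the same class are ordered by the DBMS at the owning site, so the only ordering that remote replicas need to preserve is the one within each class.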
8 CONCLUSIONS
We have formalized sufficient conditions to achieve 1-copy-GSI for non-blocking replication protocols with SI replicas that follow the deferred update technique and exclusively broadcast the writesets of transactions. These conditions consist of providing global atomicity and applying (and committing) transactions in the very same order at all replicas. Since they are sufficient but not necessary, there are other means to provide GSI in a replicated setting: some come at the cost of blocking the start of transactions (Lin et al., 2005) (which goes against the non-blocking nature of SI (Berenson et al., 1995)), while others relax the total order of committed transactions given here. In particular, it suffices that the same set of non-conflicting transactions is committed between two conflicting transactions at every replica, and that transactions started while these writesets are being applied in different orders read data items that belong to globally valid versions. To sum up, all the properties that have been formalized in our paper seem to be assumed in some previous works, but none of them carefully identified or formalized such properties. As a result, we have provided a sound theoretical basis for designing and developing future replication protocols with GSI.
ACKNOWLEDGEMENTS
This work has been supported by the EU FEDER and
Spanish MEC under grant TIN2006-14738-C02.
REFERENCES
Amza, C., Cox, A. L., and Zwaenepoel, W. (2003). Conflict-aware scheduling for dynamic content applications. In USENIX.
Armendáriz-Iñigo, J. E., Juárez-Rodríguez, J. R., de Mendívil, J. R. G., Decker, H., and Muñoz-Escoí, F. D. (2007). K-bound GSI: a flexible database replication protocol. In SAC, pages 556–560. ACM.
Berenson, H., Bernstein, P. A., Gray, J., Melton, J., O’Neil, E. J., and O’Neil, P. E. (1995). A critique of ANSI SQL isolation levels. In SIGMOD, pages 1–10.
Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1987). Concurrency Control and Recovery in Database Systems. Addison Wesley.
Chockler, G., Keidar, I., and Vitenberg, R. (2001). Group communication specifications: a comprehensive study. ACM Comput. Surv., 33(4):427–469.
Elnikety, S., Pedone, F., and Zwaenepoel, W. (2005). Database replication using generalized snapshot isolation. In SRDS, pages 73–84. IEEE-CS.
Fekete, A., Liarokapis, D., O’Neil, E., O’Neil, P., and Shasha, D. (2005). Making snapshot isolation serializable. ACM TODS, 30(2):492–528.
González de Mendívil, J. R., Armendáriz-Iñigo, J. E., Muñoz-Escoí, F. D., Irún-Briz, L., Garitagoitia, J. R., and Juárez-Rodríguez, J. R. (2007). Non-blocking ROWA protocols implement GSI using SI replicas. Technical Report ITI-ITE-07/10, ITI.
Kemme, B. (2000). Database Replication for Clusters of Workstations (Nr. 13864). PhD thesis, ETHZ.
Lin, Y., Kemme, B., Patiño-Martínez, M., and Jiménez-Peris, R. (2005). Middleware based data replication providing snapshot isolation. In SIGMOD, pages 419–430. ACM.
Papadimitriou, C. (1986). The Theory of Database Concurrency Control. Computer Science Press.
Patiño-Martínez, M., Jiménez-Peris, R., Kemme, B., and Alonso, G. (2005). Consistent database replication at the middleware level. ACM TOCS, 23(4):375–423.
Pedone, F. (1999). The database state machine and group communication issues (N. 2090). PhD thesis, EPFL.
Plattner, C., Alonso, G., and Özsu, M. T. (2008). Extending DBMSs with satellite databases. VLDB J., Accepted for publication.
Wiesmann, M. and Schiper, A. (2005). Comparison of database replication techniques based on total order broadcast. IEEE TKDE, 17(4):551–566.