To ensure data consistency in a replication protocol, the number of nodes contacted by read operations, denoted as R, plus the number of nodes to which write operations are replicated, denoted as W, must always be strictly larger than n (Ahamad and Ammar, 1989). For example, we can set R = W = ⌈(n + 1)/2⌉ for all operations. This gives us f = ⌊(n − 1)/2⌋ as the number of nodes allowed to fail before there is a risk of data loss or inconsistencies. To optimize the performance of read operations, a common replication method, used by e.g. Redis (https://redis.io) and Spread (http://www.spread.org), is to simply broadcast all updates to all nodes in the system, either directly or via an elected master. This corresponds to the case where W = n and R = 1, which obviously satisfies R + W > n.
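The quorum arithmetic above can be illustrated with a short Python sketch (our own illustration with hypothetical helper names, not code from this work). It computes the majority quorums R = W = ⌈(n + 1)/2⌉ with f = ⌊(n − 1)/2⌋, and checks that both this configuration and the broadcast configuration W = n, R = 1 satisfy R + W > n:

```python
import math

def quorum_sizes(n: int) -> tuple[int, int, int]:
    """Majority quorums R = W = ceil((n + 1) / 2) and the resulting
    failure tolerance f = floor((n - 1) / 2)."""
    r = w = math.ceil((n + 1) / 2)
    f = (n - 1) // 2
    return r, w, f

def satisfies_condition(n: int, r: int, w: int) -> bool:
    """The consistency condition R + W > n."""
    return r + w > n

for n in (3, 5, 7):
    r, w, f = quorum_sizes(n)
    assert satisfies_condition(n, r, w)      # majority quorums
    assert satisfies_condition(n, r=1, w=n)  # broadcast: W = n, R = 1
    print(f"n={n}: R=W={r}, f={f}")
```

For n = 5 this gives R = W = 3 and f = 2: any read quorum of 3 nodes must overlap any write quorum of 3 nodes, so a read always sees the latest write.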
Broadcasting the operations allows the client to use any one of the n nodes to perform the required operations, which is critical for functionality such as shopping carts. An item added to a shopping cart in a request to one web server node should still be there when another request, routed to another web server node, adds a second item to the cart. However, as n increases, the bandwidth required for data replication with existing methods also increases, on the order of kn per node.
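The growth of the per-node replication traffic can be made concrete with a toy message count (our own illustration, not the analysis of this work; we assume k denotes the number of updates each node ingests). Under full broadcast, every node receives every other node's updates:

```python
from collections import Counter

def broadcast_traffic(n: int, k: int) -> Counter:
    """Count messages received per node when each of n nodes
    broadcasts each of its k updates to all other nodes."""
    received = Counter()
    for sender in range(n):
        for _ in range(k):
            for receiver in range(n):
                if receiver != sender:
                    received[receiver] += 1
    return received

# each node receives k * (n - 1) messages, i.e. traffic on the order of kn
print(broadcast_traffic(4, k=10))
```

Doubling n roughly doubles the replication traffic each node must handle, even though its own ingest rate k is unchanged.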
Despite their popularity, we will not consider web servers in this work. Instead we will focus on the requirements for a replication protocol as it would be used by a store-and-forward system, a software architecture which provides a buffer between producers and consumers of data (Eugster et al., 2003). This architecture decouples producers and consumers in time, thereby allowing them to work at different paces. It also makes it possible to dynamically add and remove consumers in response to varying loads from the producers.
Store-and-forward systems have an important trait compared to more general data storage: there are no external readers. Once a data tuple has been received and stored by the system, it is up to the system itself to select which tuple is going to be forwarded next, and by which node. The case of a reader accessing a random node to request the value of a particular data tuple simply does not arise in these systems.
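This trait can be sketched in a few lines (a toy of our own, not the design of this work): the node holds tuples in a local queue and itself decides which tuple to forward next, so no read path for arbitrary tuples is ever needed.

```python
import heapq

# (arrival_time, payload) pairs; the node orders its own queue
queue: list[tuple[float, str]] = []

def store(t: float, payload: str) -> None:
    """Accept a tuple from a producer into the local queue."""
    heapq.heappush(queue, (t, payload))

def forward_next() -> str:
    """The node, not an external reader, selects the next tuple."""
    return heapq.heappop(queue)[1]

store(2.0, "b")
store(1.0, "a")
assert forward_next() == "a"  # oldest tuple first, chosen internally
```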
Ensuring a consistent relative order between the data tuples is now the only remaining reason to broadcast read and write operations to a majority of the nodes. If this ordering requirement can be disregarded, as is the case for data tuples representing independent or commutative operations (Shapiro et al., 2011), we can fundamentally change the replication logic. Under these conditions, a store-and-forward system can freely choose any f + 1 nodes for the storage of each data tuple, and any subset of f nodes can still fail without risking data loss.
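The claim that any f + 1 replicas tolerate any f crashes follows from a simple counting argument, which can be checked exhaustively for a small system (our own illustration, with example node names):

```python
from itertools import combinations

nodes = [f"node{i}" for i in range(1, 6)]  # n = 5
f = 2
replicas = set(nodes[:f + 1])  # any choice of f + 1 nodes will do

# enumerate every possible set of f simultaneous crashes
for failed in combinations(nodes, f):
    # at most f of the f + 1 replicas can be among the failed nodes,
    # so at least one copy of the tuple always survives
    assert replicas - set(failed)
```

Because at most f of the f + 1 chosen nodes can be in any failed set, the set difference is never empty, regardless of which nodes were picked.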
This work aims to define a data replication protocol specifically tailored for store-and-forward systems handling independent data tuples. The bandwidth required for the replication should be less than the order of kn per node, to avoid spending most of the bandwidth on replicating the data tuples rather than delivering them. Ideally, the replication overhead should be on the order of kf per node. Each data tuple should be forwarded only once, though a minuscule number of duplications is acceptable in exceptional circumstances. We do not state this requirement simply as “at least once”, because that would allow all data tuples to be forwarded repeatedly, which would break the bandwidth requirement.
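The gap between the kn and kf overheads can be quantified with a back-of-the-envelope ratio (our own assumption for illustration, not a result of this work): if f is a fixed failure tolerance while the system grows, the potential saving is n / f, independent of k.

```python
def saving_factor(n: int, f: int) -> float:
    """Ratio of broadcast overhead (order k*n per node) to the
    target overhead (order k*f per node); the k terms cancel."""
    return n / f

f = 2  # a fixed failure tolerance while the system grows
for n in (5, 20, 100):
    print(f"n={n}: potential saving ~ {saving_factor(n, f):.1f}x")
```

With f = 2, a 100-node system would spend roughly fifty times less replication bandwidth per node under the kf target than under full broadcast.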
The replication protocol we will describe in this work satisfies the aforementioned requirements, and allows replicated writes in all network partitions with at least f + 1 nodes. We claim the following contributions in relation to this protocol.
1. A high-level description of its functionality.
2. An open-sourced proof-of-concept implementation of the data replication parts of the protocol.
3. A performance analysis of throughput and latency, both when deployed within a local network and for a geo-distributed system configuration.
Following this introduction is a description of the assumptions we have made about our system model, and a sample application context. Section 2 describes the proposed protocol. Section 3 describes the experiment conducted to evaluate its performance, with the results presented in Section 4 and discussed in Section 5. Finally, Section 6 discusses related work, and Section 7 holds conclusions and possible future work.
1.1 System Model
Our system model comprises a collection of n nodes, named node_1, node_2, ..., node_n. Each node can exchange data with any other node, and may join and leave the system at any time. The nodes are crash-recovery, so they may also rejoin after crashing. Our model is asynchronous, as the nodes may be geographically distant from each other.
In accordance with the store-and-forward architecture, we have a set of producers and consumers, each one connected to a subset of the system nodes. Data tuples are received from producers, stored in a local queue and subsequently forwarded to one of the consumers, after which they are removed from
Superlinear and Bandwidth Friendly Geo-replication for Store-and-forward Systems