Predicting Read- and Write-Operation Availabilities of Quorum Protocols based on Graph Properties

Robert Schadek, Oliver Kramer and Oliver Theel

Department of Computer Science,

Carl von Ossietzky University of Oldenburg, Germany

Keywords: Distributed Systems, Fault Tolerance, Data Replication, Quorum Protocols, Operation Availability Prediction, K-nearest Neighbors.

Abstract: Highly available services can be implemented by means of quorum protocols. Unfortunately, using real-world physical networks as the underlying communication medium for quorum protocols turns out to be difficult, since efficient quorum protocols often depend on a particular graph structure imposed on the replicas they manage. Mapping the replicas of the quorum protocol to the vertices of the real-world physical network usually decreases the availability of the operations provided by the quorum protocol. Therefore, finding mappings with little decrease in operation availability is the desired goal. The mapping with the smallest decrease in operation availability can be found by iterating over all mappings. This approach has a runtime complexity of O(N!), where N is the number of vertices in the graph structure. Finding the optimal mapping with this approach therefore quickly becomes infeasible. We present an approach to predict the operation availability of the best mapping based on graph properties such as degree or betweenness centrality. This prediction can then be used to decide whether it is worth executing the O(N!) algorithm to find the best possible mapping. We test this new approach by cross-validating its predictions of the operation availability against the operation availability of the best mapping.

1 INTRODUCTION

Providing highly available access to a data object is a core problem in the field of computer science. Relying on a single replica of a data object greatly limits the availability of this data. This problem can be mitigated by creating multiple replicas of the same data. Using multiple replicas increases the availability of the data object, as it can be accessed using different replicas. But replicating the data object also introduces the need for synchronization. Let the data object be replicated on five replicas, as shown in Figure 1. If the data object located on replica 0 is updated with a new value, then reading the data object from replica 4 does not yield the up-to-date value. Usually, this is not the intended behavior of a read operation. An additional problem is that two write operations can be executed concurrently on different replicas of the same data object. For example, value a is written to the data object on replica 2 and, at the same time, value b is written to the data object on replica 3. This raises the following question: Which value is the correct one? Both examples show that simply creating multiple replicas of a data object does not necessarily lead to the expected results. Usually, the goal is that all operations behave as they would on a non-replicated data object. More formally, this non-replicated behavior can be achieved by a control protocol that guarantees one-copy serializability (1SR) (Bernstein et al., 1987). Many quorum protocols (QPs) implement such behavior. In general, QPs provide highly available access to data by means of replication and at the same time maintain 1SR. In most cases, QPs provide a read and a write operation. QPs manage a set of replicas and use read quorums (RQs) and write quorums (WQs) to execute the desired operation. Quorums are specific subsets of the set of all replicas. Commonly, a read operation reads the data of all replicas of an RQ and identifies an up-to-date replica, for example, by means of version IDs. Write operations, on the other hand, use an atomic commit protocol, like the Two-Phase Commit Protocol (Bernstein et al., 1987), to write the new data to all replicas of a WQ, thereby providing a new highest version ID.

Figure 1: Five replicas of a data object.

Schadek, R., Kramer, O. and Theel, O.

Predicting Read- and Write-Operation Availabilities of Quorum Protocols based on Graph Properties.

DOI: 10.5220/0006645705500558

In Proceedings of the 10th International Conference on Agents and Artificial Intelligence (ICAART 2018) - Volume 2, pages 550-558

ISBN: 978-989-758-275-2

One possibility to achieve 1SR with a QP is to have all RQs intersect with all WQs, and to have all WQs intersect with all other WQs. Additionally, replicas contained in a WQ are locked for writing, and replicas contained in an RQ are locked for reading. At any point in time, a locked replica is locked either for reading or for writing, never for both. To execute a read (write) operation, all replicas in the RQ (WQ) have to be locked for the desired operation. As an example, consider a WQ consisting of three out of the five replicas of Figure 1. The write operation using this WQ writes the value c with version ID 12 to the three replicas 0, 1, and 2. Then, a read operation using an RQ, again with three out of five replicas, is executed. This read operation reads at least one replica that hosts the last written data c with version ID 12, no matter which three replicas are chosen for the RQ. This QP is commonly known as the Majority Consensus Protocol (MCS) (Thomas, 1979), and is shown in more detail in Section 3.1.
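The intersection argument for the three-out-of-five example can be checked by brute force. The following Python sketch is only an illustration and not part of the original work; the helper names `all_quorums` and `always_intersect` are hypothetical.

```python
from itertools import combinations

def all_quorums(replicas, size):
    """Return every subset of `replicas` with exactly `size` members."""
    return [frozenset(c) for c in combinations(replicas, size)]

def always_intersect(quorums_a, quorums_b):
    """True if every quorum in quorums_a shares a replica with every quorum in quorums_b."""
    return all(a & b for a in quorums_a for b in quorums_b)

replicas = range(5)                     # replicas 0..4, as in Figure 1
majorities = all_quorums(replicas, 3)   # three-out-of-five quorums
pairs = all_quorums(replicas, 2)        # too small: two-out-of-five quorums

print(always_intersect(majorities, majorities))  # any two majorities overlap
print(always_intersect(pairs, majorities))       # a pair can miss a majority
```

Any two sets of three replicas out of five must share at least one replica, which is why the read in the example always sees version ID 12. Quorums of only two replicas would not give this guarantee.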

In order for read (write) operations to work, the data of the operations needs to be sent to and received by the replicas of the read (write) quorum used. Most QPs implicitly rely on a completely connected graph structure (GS) as the communication medium between their replicas (Schadek and Theel, 2017). We call this assumed GS a logical network topology (LNT). A physical network topology (PNT) is a GS that actually arranges and connects the replicas in the real world. QPs have no influence on the PNT. In (Schadek and Theel, 2017), it is shown that the PNT used as the communication medium between the replicas has to be considered in the cost and availability analyses of a QP to improve the accuracy of those analyses. Given a QP with N replicas and a PNT with N vertices, there are N! possibilities to place these replicas on the N vertices of the GS of the PNT. One such assignment is called a mapping. Which mapping is chosen as the best one can depend on different criteria. For example, the best mapping can be the mapping for which the difference between the availability of the QP and the availability of the QP mapped to a PNT is the smallest. The selected criterion then has to be evaluated for all N! mappings to find the best mapping. For a growing number of replicas, finding the best mapping becomes increasingly hard, due to the factorial nature of the problem.
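To give a feeling for the factorial growth, the following Python sketch enumerates all assignments of replicas to vertices for a small N and prints N! for a few larger values. It is an illustration only; the function name `all_mappings` is hypothetical.

```python
from itertools import permutations
from math import factorial

def all_mappings(n):
    """Yield every assignment of replicas 0..n-1 to vertices 0..n-1 as a dict."""
    for perm in permutations(range(n)):
        yield dict(enumerate(perm))   # replica i -> vertex perm[i]

count = sum(1 for _ in all_mappings(4))
print(count, factorial(4))            # both count the same set of mappings

for n in (8, 9, 12):
    print(n, factorial(n))            # number of mappings to test for N = n
```

Every one of these mappings has to be scored against the selected criterion, which is what makes the exhaustive search expensive.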

The execution of the costly O(N!) algorithm should be avoided whenever the GS that is supposed to be used as the PNT is not well suited for the given QP.

We show how graph properties such as betweenness centrality (BC) can be used to estimate the operation availability of a QP when mapped to a PNT. These properties are usually much easier to compute than running the computationally expensive O(N!) mapping algorithm. Given the operation availability prediction based on these properties, the user can then decide whether it is worth finding the best mapping. The K-nearest neighbor (kNN) (Cover and Hart, 1967) approach is used for our predictions. We evaluate a number of properties and combinations of these properties for their prediction accuracy.

This paper is structured as follows. In Section 2,

we present the system model used in this paper. In

Section 3, related work and the evaluated properties

are presented. The mapping approach is presented

in Section 4. Section 5 discusses the use of kNN in

the presented prediction approach. This section also

includes an evaluation of the resulting approach. A

conclusion and future work are given in Section 6.

2 SYSTEM MODEL

In order to analyze QPs, their characteristics, their use on PNTs, and the prediction capabilities of the properties, we first define a coherent model.

2.1 Graph Structure

A graph structure $GS = (V, E)$ is a two-tuple of a set of vertices $V$ and a set of edges $E$. Edges connect vertices. $V(GS)$ gives the set of vertices of a GS. $E(GS)$ gives the set of edges of a GS. A vertex $v \in V$ is a three-tuple $v = (i, c_x, c_y)$, where $i \in \mathbb{N}$ is the ID of the vertex and $c_x$ and $c_y$ are the coordinates (i.e., the location of the vertex in the corresponding dimensions). The shorthand notation $v_i$ gives a vertex with ID $i$. $N = |V(GS)|$ represents the number of vertices in a GS.

An edge $e_{i,j} \in E$ is defined as $e_{i,j} := (v_i, v_j)$, where $v_i, v_j \in V$.

A path $\langle v_0, \ldots, v_n \rangle$ between $v_0$ and $v_n$ exists in GS, iff:

$$\forall i, 0 \le i \le n : \exists v_i \in V(GS) \quad \text{and} \tag{1}$$

$$\forall i, 0 \le i < n : \exists e = (v_i, v_{i+1}) \in E(GS) \tag{2}$$

If a path exists, then the two vertices $v_0$ and $v_n$ are called connected. $V(\langle v_0, \ldots, v_n \rangle)$ denotes the set of vertices of a path such that:

$$V(\langle v_0, \ldots, v_n \rangle) := \{v_0, \ldots, v_n\}. \tag{3}$$

$E(\langle v_0, \ldots, v_n \rangle)$ denotes the set of the edges of a path such that:

$$E(\langle v_0, \ldots, v_n \rangle) := \{e_{0,1}, \ldots, e_{n-1,n}\}. \tag{4}$$

The shorthand notation "$\exists \langle v_0, \ldots, v_n \rangle \in GS$" can be used to state that the shown path exists in the GS. Each vertex is assumed to be hosting exactly one replica of the replicated data object.
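Whether two vertices are connected in the sense of conditions (1) and (2) can be tested with a breadth-first search. The Python sketch below is a minimal illustration; it assumes undirected edges given as ID pairs, which is an assumption not stated explicitly in the model, and the function name `path_exists` is hypothetical.

```python
from collections import deque

def path_exists(edges, start, goal):
    """Breadth-first search: True if `goal` is reachable from `start`."""
    neighbors = {}
    for i, j in edges:                        # edges treated as undirected (assumption)
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            return True
        for w in neighbors.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

edges = [(0, 1), (1, 2), (3, 4)]              # two separate components
print(path_exists(edges, 0, 2))               # connected via vertex 1
print(path_exists(edges, 0, 4))               # no path between the components
```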

2.2 Graph Properties

The betweenness centrality (BC) property (Freeman, 1977) describes how often a vertex v occurs on the shortest paths of a GS. It is defined as:

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{5}$$

Here, $\sigma_{st}(v)$ gives the number of shortest paths between the vertices s and t that contain vertex v, and $\sigma_{st}$ represents the number of shortest paths between the vertices s and t. To use this vertex property in a way that describes the complete GS, we compute the minimum, the average, the mode, the median, and the maximum of the BC values of all vertices of the GS.

The second property we evaluate is based on the distances between vertices. We call this property the diameter (Harary, 1969). Let $\varepsilon(v)$ be the length of the longest shortest path from vertex v to any other vertex $v' \in V(GS)$. This is used to compute the minimum, the average, the mode, the median, and the maximum distance based on all vertices $v \in V(GS)$.

The degree deg(v) (Harary, 1969) describes how many edges a vertex v is connected to. Again, the minimum, the average, the mode, the median, and the maximum degree are considered.

Finally, the connectivity (Harary, 1969) is considered. Connectivity is the minimum number of vertices that need to be removed to disconnect one or more vertices from the rest of the vertices of the GS.
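The vertex-based properties and their five summary statistics can be computed with plain breadth-first searches. The following Python sketch is an illustration under stated assumptions: the graph is connected and undirected, the sum in formula (5) runs over unordered pairs {s, t}, and ties for the mode are resolved by first occurrence. All function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations
from statistics import mean, median

def bfs(adj, s):
    """Return (dist, sigma): hop distance from s and number of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w] = dist[v] + 1, 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma

def betweenness(adj):
    """g(v) as in formula (5), summed over unordered pairs {s, t} (assumption)."""
    info = {s: bfs(adj, s) for s in adj}
    g = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        d_s, sig_s = info[s]
        d_t, sig_t = info[t]
        for v in adj:
            if v in (s, t):
                continue
            if d_s[v] + d_t[v] == d_s[t]:     # v lies on a shortest s-t path
                g[v] += sig_s[v] * sig_t[v] / sig_s[t]
    return g

def summarize(values):
    """Minimum, average, mode, median, and maximum of a vertex property."""
    values = list(values)
    return {"min": min(values), "avg": mean(values),
            "mode": Counter(values).most_common(1)[0][0],
            "median": median(values), "max": max(values)}

# Path graph 0 - 1 - 2 - 3 as adjacency sets
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
degree = {v: len(adj[v]) for v in adj}
eccentricity = {v: max(bfs(adj, v)[0].values()) for v in adj}
bc = betweenness(adj)
print(summarize(bc.values()), summarize(degree.values()), summarize(eccentricity.values()))
```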

2.3 Consistency Criterion

In this paper, we discuss QPs that guarantee 1SR. Informally, 1SR states that read and write operations on replicated data objects have the same observable effect as operations on non-replicated data (Bernstein et al., 1987). QPs use quorums in the course of executing operations. Usually, QPs provide a read operation, using an RQ, and a write operation, using a WQ. Let Q be a QP providing read and write operations that upholds the 1SR property. Two quorums a and b intersect if $a \cap b \neq \emptyset$. When 1) every RQ of Q intersects with every WQ of Q, 2) all WQs of Q intersect with each other, and 3) replicas of Q can be locked exclusively for a read operation or a write operation, then 1SR is guaranteed. Q upholds 1SR, since at any time either a single write operation can write all replicas of its WQ, or one or more read operations can read the replicas of their RQs. This quorum intersection approach is used by many QPs to provide 1SR. This also holds for the QPs discussed in this paper.

2.4 Fault Model

Replicas are assumed to exhibit fail-silent behavior. All failures are assumed to be independent of each other. The availability of a replica is described by p, where 0 ≤ p ≤ 1. A p value of 1 means that the replica is available with a probability of 100% at an arbitrary point in time. A p value of 0 means that the replica is available with a probability of 0%. All replicas are assumed to have the same p value. Communication channels, i.e., edges, are assumed to be always available. These simplifications make a feasible analysis possible.

2.5 Read and Write Availability

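The probability that a read or write operation is available for a given QP under the replica availability p is described by $a_r(p)$ and $a_w(p)$, respectively, where $0 \le a_r(p), a_w(p) \le 1$. The minimal average costs per read and write operation are given by $c_r(p)$ and $c_w(p)$. Let N be the total number of replicas.

$$
\begin{aligned}
RQS = \{\{ & (q_1, \{sq_{1,1}, \ldots, sq_{1,m}\}), \ldots, (q_n, \{sq_{n,1}, \ldots, sq_{n,z}\})\} \mid && (6) \\
& q_i \in P(V(GS)) && (7) \\
& \land\ sq_{i,j} \in P(V(GS)) && (8) \\
& \land\ isReadQuorum(q_i) && (9) \\
& \land\ \forall (q_i, sq_{i,j}) : sq_{i,j} \supset q_i && (10) \\
& \land\ \forall q_j,\ i \neq j : q_i \not\supseteq q_j && (11) \\
& \land\ \forall (q_i, sq_{i,n}), (q_j, sq_{j,m}) : sq_{i,n} \neq sq_{j,m} && (12) \\
\} &
\end{aligned}
$$

$$a_r(p) = \sum_{\forall (q,sq) \in RQS} p^{|q|} (1-p)^{N-|q|} + \sum_{\forall (t \in sq \,\land\, (q,sq) \in RQS)} p^{|t|} (1-p)^{N-|t|} \tag{13}$$

$$ct_r(p) = \sum_{\forall (q,sq) \in RQS} |q| \left(p^{|q|} (1-p)^{N-|q|}\right) + \sum_{\forall (t \in sq \,\land\, (q,sq) \in RQS)} |q| \left(p^{|t|} (1-p)^{N-|t|}\right) \tag{14}$$

$$c_r(p) = ct_r(p) / a_r(p) \tag{15}$$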


For some QPs, there exist closed formulas to compute $a_r(p)$, $a_w(p)$, $c_r(p)$, and $c_w(p)$, respectively. In general, QPs allow testing whether a set of replicas is an RQ or a WQ. Formula 9 shows how these tests can be used to compute $a_r(p)$ and $c_r(p)$. The formulas for $a_w(p)$ and $c_w(p)$ are analogous to those for $a_r(p)$ and $c_r(p)$. A read quorum set (RQS) is used to evaluate $a_r(p)$ and $c_r(p)$. It consists of a set of tuples, each consisting of a quorum q and the set of all other quorums that are supersets of q and are not present in any other such superset. Formulas 9 to 12 restrict the form of the set. The form of the sets is restricted in such a way that no quorum appears more than once, as this would erroneously add to the availability mass of the set. P(s) (Devlin, 1979) denotes the power set of a set s, here the set of all replicas of a QP. The value of $a_r(p)$ is then calculated by totaling the probabilities of the quorums being available as well as their supersets. $ct_r(p)$ serves as a temporary in the calculation of $c_r(p)$. $ct_r(p)$ is calculated similarly to $a_r(p)$. For each $q_i$ and each $sq_{i,j}$, the availability is calculated. This availability is then multiplied by the number of replicas of $q_i$ for each element of the RQS. The $ct_r(p)$ value depends on the availability of the complete RQS. To remove this influence, and thereby normalize the result over all p, $ct_r(p)$ is divided by $a_r(p)$. The result of the division is $c_r(p)$. The number of replicas in $q_i$ is also used for the supersets of $q_i$, as it is assumed that QPs use the smallest possible quorum, and the smallest possible quorum is represented by $q_i$. The calculation of $a_w(p)$ and $c_w(p)$ differs only in that it uses the write quorum set (WQS) instead of the RQS. A WQS differs from an RQS in that its elements are WQs instead of RQs for the given QP.
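The idea behind the quorum-set formulas can be sketched by brute force: enumerate every possible set of available replicas once, keep those that contain a quorum, and add up their probabilities. The Python sketch below is an illustration, not the authors' implementation. It assumes that the cost of an available set is the size of the smallest quorum it contains, and `is_quorum` is a hypothetical predicate playing the role of isReadQuorum.

```python
from itertools import combinations

def availability_and_cost(n, is_quorum, p):
    """Enumerate every set of available replicas and accumulate a(p) and c(p)."""
    replicas = range(n)
    a = ct = 0.0
    for size in range(n + 1):
        for up in combinations(replicas, size):
            # size of the smallest quorum inside the available set, if any
            smallest = next((k for k in range(size + 1)
                             for q in combinations(up, k)
                             if is_quorum(frozenset(q))), None)
            if smallest is None:
                continue                       # no operation possible in this state
            prob = p ** size * (1 - p) ** (n - size)
            a += prob                          # each state is counted exactly once
            ct += smallest * prob              # temporary cost, cf. formula (14)
    return a, (ct / a if a > 0 else float("inf"))

def majority(s):
    """Three-out-of-five read quorum from Section 1."""
    return len(s) >= 3

a_r, c_r = availability_and_cost(5, majority, 0.5)
print(a_r, c_r)
```

Counting every available set exactly once mirrors the requirement that no quorum may appear more than once in the RQS. The enumeration is exponential in N and is only meant for small examples.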

3 DISCUSSION OF RELATED WORKS

3.1 The Majority Consensus Protocol

The Majority Consensus Protocol (MCS) (Thomas, 1979) is a QP that reads $\lceil N/2 \rceil$ replicas and writes $\lceil (N+1)/2 \rceil$ replicas, where N is the total number of replicas. The MCS guarantees 1SR. The read availability $a_r(p)$ of the MCS is

$$a_r(p) = \sum_{k=\lceil N/2 \rceil}^{N} \binom{N}{k} p^k (1-p)^{N-k} \tag{16}$$

and the write availability $a_w(p)$ is

$$a_w(p) = \sum_{k=\lceil (N+1)/2 \rceil}^{N} \binom{N}{k} p^k (1-p)^{N-k} \tag{17}$$

(Koch, 1994). The MCS assumes that all vertices hosting the replicas are directly connected (Schadek and Theel, 2017). Therefore, the LNT implicitly used by the MCS is a complete GS (Chartrand et al., 2010). The MCS works in the manner described in Section 1.

Figure 2: (a) A GS used by the Triangular Lattice Protocol. (b) A random GS.
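The closed formulas (16) and (17) are binomial tail sums and can be evaluated directly. The Python sketch below is an illustration; it assumes that the lower summation bounds are the rounded-up quorum sizes ⌈N/2⌉ and ⌈(N+1)/2⌉, which is consistent with the three-out-of-five example in Section 1. The function names are hypothetical.

```python
from math import ceil, comb

def mcs_read_availability(n, p):
    """Probability that at least ceil(n/2) of n replicas are available."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(ceil(n / 2), n + 1))

def mcs_write_availability(n, p):
    """Probability that at least ceil((n+1)/2) of n replicas are available."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(ceil((n + 1) / 2), n + 1))

for n in (5, 8, 9):
    print(n, mcs_read_availability(n, 0.9), mcs_write_availability(n, 0.9))
```

For odd N both quorum sizes coincide, so the read and write availabilities are equal. For even N the write quorum is one replica larger, and the write availability is correspondingly lower.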

3.2 The Triangular Lattice Protocol

The Triangular Lattice Protocol (TLP) (Wu and Belford, 1992) is an example of a QP using an LNT that is not a complete GS. Figure 2a shows an example of a GS used by the TLP. The TLP is a very efficient QP, requiring only $\sqrt{N}$ replicas to read and write in the best case, if the LNT used is a square. Every RQ consists of a complete vertical or a complete horizontal path through the GS. Every WQ consists of a complete vertical and a complete horizontal path through the GS. In Figure 2a, the diagonal path $\langle 0, 4, 8 \rangle$ connecting the replicas represents a minimal path that crosses the GS vertically as well as horizontally. This diagonal path, since it is very short, can therefore be used as a very efficient WQ. Due to the layout of the GS, vertical and horizontal paths always intersect in at least one replica. This way, the TLP guarantees 1SR in the previously described way. As quorums for the TLP are created by finding paths through a GS, no simple closed formula is known yet that calculates the read and write availability. Therefore, we use the formulas presented in Section 2.5 for this purpose.

4 THE MAPPING APPROACH

A mapping is an injection from one GS to another GS. This requires that the number of replicas in the codomain structure is at least equal to the number of replicas of the original structure. Formally, a mapping $M(GS, GS')$ from $GS$ to $GS'$ is defined as:

$$M(GS, GS') = \{(v_i, v'_j), \ldots, (v_k, v'_l)\} \tag{18}$$

$$\forall (v, v') \in M : v \in V, v' \in V' \tag{19}$$

$$\forall (v, v'), (v, v'') \in M : v' = v'' \tag{20}$$

$$\forall (v', v), (v'', v) \in M : v' = v'' \tag{21}$$

(Schadek and Theel, 2017). When a mapping has been defined, the QP works on its LNT. For every replica selected to participate in a quorum by the QP, the mapping selects the mapped replica of the PNT. After the QP has selected all necessary replicas to construct a quorum based on its LNT, the mapping is used to test whether the replicas are connected in the PNT. As mappings are usually not between homomorphic GSs, in the general case, it can be assumed that the replicas of the quorum are not directly connected with one another in the PNT. Consequently, additional vertices have to be added to reestablish the connectedness, and therefore the communication, between the replicas of the quorum. The availability analysis of a mapped QP has to consider the additional replicas required on the PNT level. It is therefore a desired goal to obtain mappings that require only few additional replicas on the PNT level in relation to the LNT level, as this matches the expected availability characteristics of the QP on the LNT level most closely. For that reason, the important question is how to find the best mapping.
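Conditions (19) to (21) can be read as a simple validity check on a set of vertex pairs. The following Python sketch is an illustration only; the function name `is_valid_mapping` and the representation of a mapping as a set of ID pairs are assumptions made for this example.

```python
def is_valid_mapping(mapping, v_src, v_dst):
    """Check conditions (19)-(21) for a collection of (v, v') pairs."""
    pairs = set(mapping)
    # (19): every pair connects a source vertex with a destination vertex
    in_range = all(v in v_src and w in v_dst for v, w in pairs)
    # (20): no source vertex is mapped to two different destination vertices
    functional = len({v for v, _ in pairs}) == len(pairs)
    # (21): no destination vertex is used by two different source vertices
    injective = len({w for _, w in pairs}) == len(pairs)
    return in_range and functional and injective

vertices = set(range(3))
identity = {(0, 0), (1, 1), (2, 2)}
print(is_valid_mapping(identity, vertices, vertices))
print(is_valid_mapping({(0, 1), (1, 1)}, vertices, vertices))   # violates (21)
```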

To find the best mapping, we need to compare mappings. In order to compare mappings, we have to define a comparison criterion. Any number of criteria can be selected depending on the intended use of the QP. Possible criteria are the average write costs, maximal read costs, write availability, etc.

In this paper, we use the average read and write availability value (ARW) as the comparison criterion. The ARW approximates the weighted accumulation of the numerical integration of the read and write availability.

$$ARW = \left(wor \cdot \sum_{p=0}^{100} a_w(p/100)\right) \cdot \left((1 - wor) \cdot \sum_{p=0}^{100} a_r(p/100)\right) \tag{22}$$

$$wor \in [0, 1] \tag{23}$$

The write over read (wor) value is a weighting factor between the read and write availability. A wor equal to 0.5 is used for the rest of the paper, so as not to favor any operation. The higher the ARW, the better the mapping.
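A direct evaluation of the ARW over the 101 sample points can be sketched as follows. This Python sketch rests on two assumptions about formula (22): the two weighted sums are multiplied, and wor weights the write availability while 1 − wor weights the read availability. The function name `arw` and the callables `a_read` and `a_write` are hypothetical.

```python
def arw(a_read, a_write, wor=0.5):
    """ARW under the assumed reading of formula (22)."""
    if not 0.0 <= wor <= 1.0:
        raise ValueError("wor must lie in [0, 1]")
    # sample both availability functions at p = 0.00, 0.01, ..., 1.00
    write_sum = sum(a_write(p / 100) for p in range(101))
    read_sum = sum(a_read(p / 100) for p in range(101))
    return (wor * write_sum) * ((1 - wor) * read_sum)

# Toy availability curves: a single replica serving reads and writes
single = lambda p: p
print(arw(single, single))
```

Because only the ordering of mappings matters for the comparison, the absolute scale of the ARW value is not important in this sketch.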

4.1 Optimal Mapping

In this section, we discuss the OPTIMALMAPPING algorithm. The algorithm finds the optimal mapping under the given ARW measurement criterion. "Optimal" in the scope of this paper means the mapping for which the ARW is the highest.

To get an intuition for the mapping approach, we give the following example. Let the TLP of Figure 2a be mapped to the PNT in Figure 2b with the mapping $M(GS, GS') = \{(0,0), (1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8)\}$. We assume that in the current state of the system, replicas 0, 1, 2, 4, 5, 6, 7, and 8 are available. Let the TLP have identified a WQ consisting of the replicas 0, 4, and 8. This is the currently available WQ with the fewest replicas. None of these replicas are directly connected in the GS in Figure 2b under the current mapping. To reestablish communication between the replicas, we have to reconnect them with additional vertices. The fewer additional vertices, the better. Two additional replicas are needed to reestablish communication between the elements of the WQ, e.g., replicas 3 and 6. The reconnected quorum consisting of five replicas is less likely to be available than the original quorum consisting of three replicas.

Given a PNT, an RQS, and a WQS, the procedure OPTIMALMAPPING, shown in Algorithm 3, finds the optimal mapping. The runtime complexity of the procedure OPTIMALMAPPING is O(N!), where N is the number of vertices in the PNT, as there are N! possible mappings for the N replicas to iterate over. The algorithm does not require any knowledge of the QP used to generate the RQS and WQS, nor any knowledge of the LNT used by the QP. It only requires the RQS and the WQS created by the QP. This makes the algorithm applicable to a wide variety of QPs. The procedure APPLYMAPPING, shown in Algorithm 1, is used for each mapping. APPLYMAPPING finds, for each of the quorums in the RQS and WQS, the smallest set of mapped vertices that reconnects the replicas of the quorum. APPLYMAPPING does this by calling the procedure FINDSMALLEST, shown in Algorithm 2. The loop in line 3 of Algorithm 2 iterates the subsets in increasing order of the number of replicas contained in each subset. The result of APPLYMAPPING is then compared with the currently best known mapping. Depending on the comparison criterion, it is tested whether the currently tested mapping is better than the best known mapping. If it is better, then the current mapping becomes the new best known mapping. After all mappings have been tested, the best known mapping is indeed the optimal mapping.

Algorithm 1: Procedure APPLYMAPPING.

Input: quorums = (RQS, WQS)
       mapping = mapping to use
       GS = the graph structure to use
Result: a modified copy of quorums, where the procedure FINDSMALLEST has been called for each quorum in RQS and WQS

1 a ← {FINDSMALLEST(MAP(q, mapping), GS) | q ∈ RQS}
2 b ← {FINDSMALLEST(MAP(q, mapping), GS) | q ∈ WQS}
3 return (a, b)

Algorithm 2: Procedure FINDSMALLEST.

Input: GS = graph structure in which the verticesToConnect need to be connected
       verticesToConnect = vertices between which a path needs to exist, so that they can communicate
Result: the smallest subset of V(GS) for which the verticesToConnect are connected

1 subsets ← P(V(GS))
2 smallest ← V(GS)
3 forall sbs ∈ subsets do
4     if ∀i, j ∈ verticesToConnect : ∃⟨i, ..., j⟩ ∈ GS with V(⟨i, ..., j⟩) ⊆ sbs ∧ |sbs| ≤ |smallest| then
5         smallest ← sbs
6     end
7 end
8 return smallest
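The interplay of the three procedures can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the PNT is given as an adjacency dictionary, `find_smallest` only tries supersets of the quorum instead of the full power set, and `score` is a hypothetical comparison criterion standing in for the ARW (higher is better).

```python
from collections import deque
from itertools import combinations, permutations

def connected_within(adj, allowed, targets):
    """True if all targets can reach each other using only vertices in `allowed`."""
    targets = set(targets)
    if not targets:
        return True
    if not targets <= allowed:
        return False
    start = next(iter(targets))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in allowed and w not in seen:
                seen.add(w)
                queue.append(w)
    return targets <= seen

def find_smallest(adj, vertices_to_connect):
    """Smallest vertex set that contains the quorum and keeps it connected."""
    targets = set(vertices_to_connect)
    others = [v for v in adj if v not in targets]
    for extra in range(len(others) + 1):          # increasing subset size
        for added in combinations(others, extra):
            candidate = targets | set(added)
            if connected_within(adj, candidate, targets):
                return candidate
    return None                                    # quorum cannot be reconnected

def apply_mapping(quorums, mapping, adj):
    """Map every quorum to the PNT and reconnect it with as few vertices as possible."""
    return [find_smallest(adj, {mapping[r] for r in q}) for q in quorums]

def optimal_mapping(quorums, adj, score):
    """Try all N! mappings and keep the one with the highest score."""
    vertices = sorted(adj)
    best, best_score = None, float("-inf")
    for perm in permutations(vertices):
        mapping = dict(zip(vertices, perm))        # replica i -> vertex perm[i]
        s = score(apply_mapping(quorums, mapping, adj))
        if s > best_score:
            best, best_score = mapping, s
    return best, best_score

# PNT: path graph 0 - 1 - 2 - 3; two toy quorums over replicas 0..2
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
quorums = [{0, 1}, {1, 2}]
score = lambda mapped: -sum(len(s) for s in mapped)   # fewer vertices is better
best, best_score = optimal_mapping(quorums, adj, score)
print(best, best_score)
```

The toy score simply penalizes additional vertices. In the approach described above, the reconnected quorums would instead be fed into the availability formulas of Section 2.5 and compared by their ARW.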

5 THE MACHINE LEARNING APPROACH

In this paper, the aim of the machine learning approach is twofold. The first goal is to find out whether graph properties can be used to predict the read and write availability of QPs when mapped to a particular GS. If the first goal can be achieved, then the second goal is to identify properties or combinations of properties that yield the most accurate predictions.

The K-nearest neighbor (kNN) (Cover and Hart, 1967; Bailey and Jain, 1978) approach is used to achieve these goals.

The basic idea of the approach is as follows. First, we compute the RQS and WQS for a given QP with N replicas. Then, we generate a number of graphs with N vertices that are not isomorphic to each other. For all these graphs, we compute the properties and standardize them (Mohamad and Usman, 2013). In the next step, we compute the optimal mappings of our tested QPs to all graphs. Then, we split the graphs into m equally sized parts in preparation for the cross-validation (CV). For all elements t in the power set P(T), where T is the set of all properties, we execute the CV. During the CV, we select the k nearest neighbors based on t for the currently tested GS and mapping. Based on these k neighbors, we predict the read and write availability of the optimal mapping for the currently tested graph. Finally, we compare the prediction with the actual values by means of the mean squared error (MSE). Comparing different sets of properties based on the MSE allows us to identify properties that are well suited to estimate optimal mappings.

Algorithm 3: Procedure OPTIMALMAPPING.

Input: RQs, WQs = the RQS and WQS created by the unmapped QP
       GS = the graph structure the QP should be mapped to
Result: the best mapping according to the user-supplied comparison criterion

  optMapping ← empty tuple
1 forall i ∈ MAPPINGS(GS) do
2     tmp ← APPLYMAPPING((RQs, WQs), i, GS)
3     if tmp > optMapping then
4         optMapping ← tmp
5     end
6 end
7 return optMapping
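The standardization and neighbor selection steps can be sketched as follows. This Python sketch is an illustration: z-score standardization is assumed as the standardization method, the Euclidean distance is assumed for the neighbor search, and a single scalar target stands in for the availability values. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column of a list of feature vectors (constant columns map to 0)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]       # avoid division by zero
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    return mean(train_y[i] for i in nearest)

# Toy data: two graph properties per graph and one availability-like target
features = [[1, 10], [2, 20], [3, 30], [10, 100]]
targets = [0.9, 0.8, 0.7, 0.1]
z = standardize(features)
prediction = knn_predict(z[1:], targets[1:], z[0], k=2)   # predict for the first graph
print(prediction)
```

Standardizing first keeps a property with a large numeric range, such as a betweenness value, from dominating the distance over a property with a small range, such as the connectivity.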

5.1 Test Data Generation

We generate two sets of random graphs. One set of graphs has eight vertices each, the other set of graphs has nine vertices each. Each set consists of 255 graphs. Having 255 graphs with nine vertices is currently the upper limit of what is possible to simulate in an acceptable time frame. No graph in a set is isomorphic to any other graph in its set. A path exists between every pair of vertices in the same graph. Each vertex is connected to between one and N − 1 other vertices, where N is the number of vertices in the graph.

In our case, the number of cross-validation parts m is equal to 5.


5.2 Implementation

The implementation is as follows. We begin by constructing the RQS and WQS for the MCS for eight and nine vertices. We repeat this process for the TLP with the LNT being a 2×4, a 4×2, and a 3×3 grid. With the help of the OPTIMALMAPPING procedure, we determine the optimal mappings for the MCS and the TLP variants for all graphs in all groups.

Algorithm 5 (FINDBESTESTIMATOR) is the entry point into finding the best estimator or combination of estimators. The graph properties and their variants serve as estimators. The algorithm requires a set of graphs, the estimators to consider, and the RQS and the WQS created by the QP that is to be optimally mapped. In line 3 of Algorithm 5, the algorithm uses OPTIMALMAPPING to compute the optimal mapping for the given graph with the given parameters. The results of OPTIMALMAPPING are used later for the cross-validation. After the optimal mapping has been computed for all given graphs, the algorithm calls FINDBESTESTIMATORIMPL (Algorithm 4) in line 5 of Algorithm 5 to begin the comparison of the estimators. In line 3 of Algorithm 4, the set of graphs is partitioned into m equally sized subsets. This is done in preparation for the cross-validation. Starting in line 4 of Algorithm 4, we begin the estimation process. P(Es) yields the power set of the investigated estimators. In other words, we iterate over all combinations of estimators starting on this line. The variable tmpMSE is used to accumulate the MSE produced by the iterations of the cross-validation starting in line 6 of Algorithm 4. The kNN procedure is called for all graphs in gss and the prediction is stored in the variable est. The procedure COMPUTEMSE then computes the MSE between the estimation est and the actual optimal mapping om. After m executions of the cross-validation, it is checked in line 13 of Algorithm 4 whether the current estimator has a smaller summarized MSE than the currently best one. If that is the case, then the current estimator is taken as the new best estimator. This part of the algorithm is simplified for readability. Technically, two comparisons take place, one for the MSE of the read availability and one for the MSE of the write availability. This procedure continues until all elements of the power set have been tested. Finally, the best estimator and its MSE are returned. The value of k for the kNN algorithm was empirically set to 7. We use five strategies to combine the seven estimations. These strategies are based on the minimum, the average, the median, the mode, and the maximum. For example, the minimum strategy selects, for each of the 101 p values, the minimum read and write availability prediction among all seven neighbors.
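The five combination strategies operate point by point on the availability curves of the selected neighbors. The Python sketch below illustrates this with three short toy curves instead of seven curves with 101 values each. The names `STRATEGIES` and `combine` are hypothetical, and resolving ties for the mode by first occurrence is an assumption.

```python
from collections import Counter
from statistics import mean, median

STRATEGIES = {
    "min": min,
    "avg": mean,
    "median": median,
    "mode": lambda xs: Counter(xs).most_common(1)[0][0],   # first most frequent value
    "max": max,
}

def combine(neighbor_curves, strategy):
    """Combine the neighbors' availability curves point by point (one value per p)."""
    f = STRATEGIES[strategy]
    return [f(list(values)) for values in zip(*neighbor_curves)]

# Three toy neighbors with three p values each
curves = [
    [0.0, 0.5, 1.0],
    [0.0, 0.4, 1.0],
    [0.1, 0.5, 0.9],
]
for name in STRATEGIES:
    print(name, combine(curves, name))
```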

Algorithm 4: Procedure FINDBESTESTIMATORIMPL.

Input: Gs = set of graphs to find the graph property-based estimator for
       Es = set of estimators
       Os = optimal mapping availability results for all the graphs in Gs for a given QP
       m = number of subsets to use for the cross-validation
       k = number of neighbors to find
Result: set of estimators with the smallest MSE availability prediction and its MSE

1 lowestMSE ← ∞
2 bestEstimator ← {}
3 gss ← split(Gs, m)
4 for e ∈ P(Es) do
5     tmpMSE ← 0
6     for i ∈ [0, m) do
7         for g ∈ gss(i) do
8             est ← kNN(k, g, e, gss, i)
9             om ← getOptimalMapping(g, Os)
10            tmpMSE ← tmpMSE + computeMSE(est, om)
11        end
12    end
13    if tmpMSE < lowestMSE then
14        lowestMSE ← tmpMSE
15        bestEstimator ← e
16    end
17 end
18 return (bestEstimator, lowestMSE)
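The search over estimator subsets with cross-validation can be sketched compactly in Python. This is an illustration under simplifying assumptions: each graph is a hypothetical dictionary with standardized properties in "props" and a single scalar "target" instead of 101 read and write availability values, the neighbors are combined by averaging, and the empty estimator set is skipped.

```python
from itertools import combinations

def powerset(items):
    """All non-empty subsets of `items` as tuples."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

def knn_mean(train, query, features, k):
    """Predict by averaging the targets of the k nearest training graphs."""
    def dist(g):
        return sum((g["props"][f] - query["props"][f]) ** 2 for f in features)
    nearest = sorted(train, key=dist)[:k]
    return sum(g["target"] for g in nearest) / len(nearest)

def find_best_estimator(graphs, estimators, m, k):
    """Return the estimator subset with the lowest summed squared error over m folds."""
    folds = [graphs[i::m] for i in range(m)]           # m roughly equal parts
    best, lowest = None, float("inf")
    for subset in powerset(estimators):
        total = 0.0
        for i, fold in enumerate(folds):
            # train on all folds except the one currently being tested
            train = [g for j, f in enumerate(folds) if j != i for g in f]
            for g in fold:
                est = knn_mean(train, g, subset, k)
                total += (est - g["target"]) ** 2
        if total < lowest:
            best, lowest = subset, total
    return best, lowest

# Toy data: the target depends on "bc" only; "deg" is noise
graphs = [{"props": {"bc": x, "deg": d}, "target": 2.0 * x}
          for x, d in [(0, 5), (1, 0), (2, 4), (3, 1), (4, 5), (5, 0), (6, 3), (7, 2)]]
best, lowest = find_best_estimator(graphs, ["bc", "deg"], m=4, k=1)
print(best, lowest)
```

In the toy data the informative property is recovered because adding the noise property can only push less similar graphs into the neighborhood.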

5.3 Evaluation

As mentioned in Section 2.2, the graph properties betweenness centrality (BC), diameter, degree, and connectivity, which we call base properties, are evaluated as estimators. Except for the connectivity property, all properties are vertex-based properties. But as we need properties for the whole graph, we compute the minimum, the average, the median, the mode, and the maximum of the vertex-based properties. As mentioned earlier, we test not only the individual properties, but also nearly all combinations of properties. Each combination of graph properties is only allowed to consist of different base properties. For example, a combination of 1) the maximum degree and 2) the median degree has not been tested. Table 1 shows a selection of properties and combinations of properties that predicted the read and write availability of the optimal mapping best in at least one


Table 1: The graph properties and graph property combinations used in the kNN predictions that led to the best predictions in at least one instance.

Property \ ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

BetweennessAverage x x x x x

BetweennessMax x x x x

BetweennessMedian x x x x x x x x

BetweennessMin x x

BetweennessMode x x x x

Connectivity x x x x x x x x x x x x

DegreeAverage x x x x x x x

DegreeMax x x

DegreeMedian x x x x x

DegreeMin x x x

DegreeMode x x

DiameterAverage x x x x x x x x x x x x x

DiameterMax x x x x

DiameterMedian x

Table 2: MSE of the prediction of the read operation availability.

QP         Min           Avg         Median      Mode         Max
           MSE (ID)      MSE (ID)    MSE (ID)    MSE (ID)     MSE (ID)
MCS 8      130.92 (6)    0.00 (4)    0.00 (4)    3.65 (21)    149.21 (20)
MCS 9      84.16 (5)     0.00 (4)    0.00 (4)    2.98 (8)     175.70 (14)
TLP 4×2    0.22 (23)     0.00 (4)    0.00 (4)    0.22 (23)    0.22 (9)
TLP 2×4    0.67 (1)      0.00 (4)    0.00 (4)    0.67 (1)     0.45 (24)
TLP 3×3    8.34 (16)     0.00 (4)    0.00 (4)    1.07 (10)    66.48 (19)

Table 3: MSE of the prediction of the write operation availability.

QP         Min           Avg         Median      Mode         Max
           MSE (ID)      MSE (ID)    MSE (ID)    MSE (ID)     MSE (ID)
MCS 8      41.11 (3)     0.00 (4)    0.00 (4)    1.99 (18)    69.21 (9)
MCS 9      84.16 (5)     0.00 (4)    0.00 (4)    2.98 (8)     175.70 (14)
TLP 4×2    42.93 (11)    0.00 (4)    0.00 (4)    0.66 (12)    103.20 (17)
TLP 2×4    69.96 (7)     0.00 (4)    0.00 (4)    1.04 (13)    209.22 (15)
TLP 3×3    8.65 (2)      0.00 (4)    0.00 (4)    0.85 (10)    138.31 (22)

Algorithm 5: Procedure FINDBESTESTIMATOR.

Input: Gs = The set of graphs to find the graph-property-based estimator for
       Es = The set of estimators
       (RQs, WQs) = The RQS and WQS of the QP to find the best estimator for

Os ← {}
for g ∈ Gs do
    Os ← Os ∪ OptimalMapping(RQs, WQs, g, r)
end
FindBestEstimatorImpl(Gs, Es, Os, 7, r)

instance. Note the ID used to label the property sets. Table 2 and Table 3 show the results of the
predictions: the MSE for the prediction of the read and the write operation availability, respectively,
for each of the five combination functions of the kNN approach. In each entry, for instance
“130.92 (6)” from Table 2, the first value is the MSE and the second value gives the ID of the
estimator in Table 1. The set of properties identified by ID (6) consists of BetweennessAverage
and DegreeMin. In Table 1, an “x” marks the properties that belong to the property set identified
by the ID at the top of the table. The MSE is calculated over 101 p values that are scaled up from
the range of 0 ≤ p ≤ 1.0 to 0 ≤ p ≤ 100.0. The scaling is necessary because the MSE squares each
difference: values smaller than one become even smaller when squared, which goes against the
intention of the MSE. The presentation of the MSE values is limited to two fractional digits,
with the exception of the 0.0 value.
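The MSE computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `scaled_mse`, the assumption that the scaling factor of 100 is applied to the compared availability values, and the two example curves are all hypothetical.

```python
def scaled_mse(predicted, actual, scale=100.0):
    """Mean squared error after multiplying every difference by `scale`.

    Assumption: scaling from [0, 1] to [0, 100] is applied to the values
    being compared, so that squaring does not shrink small differences.
    """
    if len(predicted) != len(actual):
        raise ValueError("curves must have the same number of samples")
    squared = [(scale * (a - b)) ** 2 for a, b in zip(predicted, actual)]
    return sum(squared) / len(squared)


# 101 evenly spaced values p = 0.00, 0.01, ..., 1.00
ps = [i / 100 for i in range(101)]

# Hypothetical availability curves (not taken from the paper):
# the prediction is off by 0.01 everywhere.
actual = [0.9 * p for p in ps]
predicted = [a + 0.01 for a in actual]

print(scaled_mse(predicted, actual, scale=1.0))  # unscaled error
print(scaled_mse(predicted, actual))             # error after scaling
```

With a constant deviation of 0.01, the unscaled error is on the order of 1e-4, whereas the scaled error is about 1, which keeps small deviations visible at two fractional digits.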

In both tables, the MSE is 0.0 for all QPs when the average or the median is used as the
combination function. In all of these cases, the estimator used is BetweennessMax.
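The search space from which such an estimator is selected, namely property sets that contain at most one summary statistic per base property, can be enumerated with a short sketch. The statistic names for all three vertex-based properties and the helper `property_sets` are assumptions made for illustration; the paper tests "nearly all" of these combinations, so the full enumeration below is an upper bound.

```python
from itertools import combinations, product

# Assumed naming scheme: one entry per summary statistic and base property.
# Connectivity is treated as a single graph-level value.
STATS = ["Min", "Average", "Median", "Mode", "Max"]
BASES = {
    "Betweenness": [f"Betweenness{s}" for s in STATS],
    "Degree": [f"Degree{s}" for s in STATS],
    "Diameter": [f"Diameter{s}" for s in STATS],
    "Connectivity": ["Connectivity"],
}


def property_sets(bases=BASES):
    """Yield every non-empty set with at most one property per base property."""
    names = sorted(bases)
    for size in range(1, len(names) + 1):
        for chosen in combinations(names, size):
            # Pick exactly one statistic from each chosen base property.
            for combo in product(*(bases[b] for b in chosen)):
                yield frozenset(combo)


candidates = list(property_sets())
print(len(candidates))
```

A set such as {BetweennessAverage, DegreeMin} is generated, whereas {DegreeMax, DegreeMedian} is not, because both members share the base property Degree.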

The dominance of BetweennessMax can be explained by looking into its meaning. Mappings are
shortest paths through a graph. The BC property states how often a particular vertex is part of
the shortest paths through a graph. BetweennessMax expresses how often the replica that is part
of the most shortest paths in a graph is part of a shortest path. Therefore, BetweennessMax
basically states the BC value of the most important replica in the graph with regard to mapping
quorums. As we are using graphs with the same number of vertices for each kNN iteration,
BetweennessMax turns out to be a very good estimator for the quality of the mapping that we can
expect from a graph. This is because graphs with the same BetweennessMax value have a
very similar structure.
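To make the meaning of BetweennessMax concrete, the following sketch computes the betweenness centrality of every vertex and then the five summary statistics. It uses Brandes' algorithm for undirected, unweighted graphs as a stand-in; whether the authors normalize the values, and how ties for the mode are broken, is not stated in the text and is assumed here. The names `betweenness` and `summarize` are hypothetical.

```python
from collections import deque
from statistics import mean, median, multimode


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph given as {vertex: set_of_neighbours}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}      # number of shortest s-v paths
        sigma[s] = 1
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        order = []                       # vertices in BFS visiting order
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def summarize(values):
    """Minimum, average, median, mode, and maximum of vertex-based values."""
    values = list(values)
    return {
        "Min": min(values),
        "Average": mean(values),
        "Median": median(values),
        "Mode": min(multimode(values)),  # assumption: smallest mode on ties
        "Max": max(values),
    }


# Star graph: vertex 0 lies on every shortest path between two leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(summarize(betweenness(star).values()))
```

In the star graph, the centre lies on all three shortest paths between pairs of leaves, so it alone determines BetweennessMax, while every leaf has a BC value of zero.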

Overall, the kNN method in combination with graph properties predicts the read and write
operation availability of the optimal mapping very well, especially if the average or the median is
used in the predictions.
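The role of the combination function in such a kNN prediction can be illustrated with a small sketch. It is a simplification under stated assumptions: each graph is reduced to a tuple of property values, a single availability value is predicted instead of a curve over all p values, and the Euclidean distance, the value of k, the tie-breaking rule for the mode, and the training data are all hypothetical.

```python
import math
from statistics import mean, median, multimode

# The five combination functions applied to the neighbours' availabilities.
COMBINE = {
    "Min": min,
    "Avg": mean,
    "Median": median,
    "Mode": lambda xs: min(multimode(xs)),  # assumption: smallest mode on ties
    "Max": max,
}


def knn_predict(query, training, k=3, combine="Median"):
    """Predict the availability for `query` (a tuple of graph property values)
    from the k training graphs that are closest in Euclidean distance.

    `training` is a list of (property_tuple, availability) pairs.
    """
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return COMBINE[combine]([availability for _, availability in neighbours])


# Hypothetical training data: (BetweennessMax,) -> availability of the best mapping.
training = [((1.0,), 0.90), ((2.0,), 0.80), ((3.0,), 0.70), ((10.0,), 0.10)]
print(knn_predict((2.1,), training, k=3, combine="Median"))
```

The minimum and the maximum pick a single neighbour and are therefore sensitive to outliers among the k nearest graphs, whereas the average and the median smooth over them, which is consistent with the lower MSE values reported for these two functions.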

Since an MSE of 0.0 cannot be improved, we refrained from testing other methods such as
support vector machines.

6 CONCLUSION AND FUTURE WORK

In this paper, we presented an approach to predict the read and write availability of mappings
based on graph properties. We have shown the high quality of these predictions based on five
examples with 255 graphs. Additionally, we have demonstrated that betweenness centrality is a
good property to use for predicting the read and write operation availability of mappings of
quorum protocols. With the approach presented in this paper, the reader can make an informed
decision on whether it is worth executing the computationally expensive algorithm to determine
the optimal mapping.

Going forward, we will test more graphs and graphs with more vertices. The next step would be
to test with 12 vertices, as this would allow testing the TLP on a 3×4 or 4×3 grid. Testing the
TLP on a 1×9, 1×10, or 1×11 grid is not useful, since such degenerate grids transform the
analyzed TLP into the Read-One/Write-All protocol. To achieve this, we first have to
significantly improve the approach for finding an optimal mapping. Currently, identifying an
optimal mapping with nine vertices takes about seven hours. A graph with 12 vertices is 1320
times more complex and would therefore take about a year of computation time. Depending on the
results obtained from these extended analyses, different prediction methods may be utilized.
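The runtime estimate for 12 vertices follows from the O(N!) complexity of the exhaustive search and can be checked with a few lines. The sketch assumes that the runtime scales exactly with N!, which is an idealization; the function name `extrapolate_hours` is hypothetical.

```python
from math import factorial


def extrapolate_hours(hours_known, n_known, n_target):
    """Extrapolate the runtime of an O(N!) search from n_known to n_target
    vertices, assuming the runtime is proportional to N!."""
    factor = factorial(n_target) // factorial(n_known)
    return hours_known * factor


factor = factorial(12) // factorial(9)   # 12 * 11 * 10
hours = extrapolate_hours(7, 9, 12)      # seven hours reported for nine vertices
days = hours / 24

print(factor, hours, days)
```

The factor is 12 · 11 · 10 = 1320, which turns seven hours into 9240 hours, or 385 days, in line with the estimate of about a year.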

Our second goal is to use the predictions for graphs with n vertices to make predictions for
graphs with n + m vertices, where m > 0. This approach could significantly reduce computation time.

