SCHEME FOR COMPARING RESULTS OF DIVERSE SOFTWARE VERSIONS
Viktor Mashkov and Jaroslav Pokorny
Department of Software Engineering, Faculty of Mathematics and Physics, Charles University
Malostranske nam. 25, 11800 Praha 1
Czech Republic
Keywords: Design diversity, software fault tolerance, adjudicator, system-level diagnosis.
Abstract: The paper presents a scheme for comparing the results produced by diversely designed SW versions in order to select and deliver a presumably correct result. The scheme also makes it possible to identify all faulty SW versions and all faulty comparators. Compared with the majority voting scheme, it requires fewer result comparisons and, in most situations, is able to deliver a presumably correct service even if the number of faulty SW versions is greater than the number of correct ones. The scheme is based on a system-level diagnosis technique, namely the comparison-based testing model. The proposed scheme can be used for designing fault-tolerant diverse servers and for improving the adjudicator in the N-version programming technique.
1 INTRODUCTION
The value of redundancy and diversity as a means of
tolerating faults in computing systems has long been
recognized. For SW faults, non-diverse replication
will fail to detect, or recover from, all those failures
that do not produce obvious symptoms like crashes,
or that occur in identical ways on all the copies of a
replicated system, and at each retry of the same
operations. For these kinds of failures, diverse
redundancy (often referred to as “design diversity”)
is required (Lyu, 1995).
One of the main mechanisms for building a fault-tolerant server from two or more diverse servers is the adjudicating mechanism, which is responsible for delivering a correct service by selecting one among the multiple services produced by the different servers. Adjudicating the results produced by different servers is closely connected with other design issues, such as synchronization between the servers to guarantee data consistency, possible nondeterminism of the servers, and translation of client queries so that they are "understood" by different servers (e.g., SQL servers). It must be recognized that the success of a fault-tolerance scheme depends to a great extent on its adjudicator, and unreliability in the adjudicator can have a dramatic impact on overall system reliability (Lee and Anderson, 1990). One original and effective solution for constructing a simple adjudicator is presented in (Xu, 1991). The author proposes a specific fault tolerance scheme, called t/(n-1)-variant programming, based on the theory of system-level fault diagnosis. Note that the adjudicator in this scheme aims to identify only a correct variant. We also exploit the theory of system-level fault diagnosis, in order to construct a simple (i.e., requiring a minimal number of result comparisons) and therefore reliable adjudicator. We extend the functionality of the adjudicator to identify not only a correct variant but also all the faulty ones, as well as all faulty comparators. In some cases, the adjudicator can select and deliver a presumably correct service even if the number of faulty SW variants is greater than the number of fault-free ones.
2 COMPARISON-BASED TESTING MODEL
A diagnosable system S consists of N units denoted by the set $U=\{u_1,u_2,\ldots,u_N\}$. Each unit $u_i\in U$ is assigned a particular subset of the remaining units in S and compares its own result with the result of each unit from this subset. The comparison of the results of two units $u_i$ and $u_j$ (also termed a test link) is carried out by a comparator, denoted $\omega_{ij}$. For each pair of units there can be at most one test link. The complete collection of comparators is called the comparison assignment $\Omega=\{\omega_{ij}\}$ and is represented by an undirected graph $G=(V,E)$, where each unit $u_i\in U$ is represented by a vertex $v_i\in V$, and an edge $e(v_i,v_j)$ is in $E$ if and only if $\omega_{ij}$ is a comparator in the comparison assignment $\Omega$. The outcome produced by comparator $\omega_{ij}$, denoted $r_{ij}$, takes the value 0 if the results of units $u_i$ and $u_j$ agree and 1 if they disagree. The outcomes $r_{ij}$ and $r_{ji}$ denote the same outcome produced by one comparator, so $r_{ij}=r_{ji}$ always holds. The collection of all outcomes is called the comparison syndrome, $R=\{r_{ij}\}$. Two classes of faults, independent and related, are taken into account. Hence two incorrect results produced by different units can be identical due to the manifestation of related faults.
Mashkov V. and Pokorny J. (2007).
SCHEME FOR COMPARING RESULTS OF DIVERSE SOFTWARE VERSIONS.
In Proceedings of the Second International Conference on Software and Data Technologies - SE, pages 341-344
Copyright © SciTePress
Unlike t/(n-1)-diagnosability (Friedman, 1975), we intend to identify all correct and all incorrect results produced by the units (i.e., to perform complete system diagnosis). Our approach to system diagnosis is based on examining the consistency of multiple sets of units.
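The comparison-based model can be made concrete with a small sketch. It assumes units indexed from 0 (the paper counts from 1), results that can be compared for equality, and fault-free comparators; the helper names `normalize` and `build_syndrome` are hypothetical. Each test link is stored once under the key (i, j) with i < j, so the symmetry $r_{ij}=r_{ji}$ holds by construction, and the "at most one test link per pair" rule is enforced.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # a test link (comparator) between units i and j


def normalize(i: int, j: int) -> Edge:
    """Store each undirected test link once, so r_ij and r_ji share one entry."""
    return (i, j) if i < j else (j, i)


def build_syndrome(results: List[object], assignment: List[Edge]) -> Dict[Edge, int]:
    """Return the syndrome R = {r_ij}: 0 if the two results agree, 1 otherwise.

    Assumes fault-free comparators that test exact equality of results.
    """
    syndrome: Dict[Edge, int] = {}
    for i, j in assignment:
        if i == j:
            raise ValueError("a unit is never compared with itself")
        key = normalize(i, j)
        if key in syndrome:
            raise ValueError(f"duplicate test link {key}")  # at most one per pair
        syndrome[key] = 0 if results[i] == results[j] else 1
    return syndrome


# Hypothetical run: five units; units 3 and 4 share a related fault (same wrong value).
results = [42, 42, 42, 7, 7]
assignment = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3), (2, 4)]
print(build_syndrome(results, assignment))
```

Because units 3 and 4 return the same wrong value, their comparator reports agreement (r = 0), which is exactly the related-fault situation the model has to cope with.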
3 CONSISTENT SETS AND SYSTEM DIAGNOSIS
For our purposes, we will now introduce several
definitions, assumptions and simple lemmas. In
order not to overload the paper with the details, the
proofs of lemmas are omitted.
Definition 1. For a system S and a comparison syndrome R, a subset $Y\subseteq U$ is a consistent set if and only if:
1) $u_i\in Y$ and $u_j\in Y$ if $r_{ij}=0$;
2) $u_i\in Y$ and $u_j\in U-Y$ if $r_{ij}=1$;
3) $u_i\in Y$ and $u_j\in Y$ if there exists a path $P(v_i,v_j)$, $P=\{e(v_i,v_l),e(v_l,v_p),\ldots,e(v_k,v_j)\}$, such that $u_i,u_l,u_p,\ldots,u_k,u_j\in Y$ and $r_{il}=r_{lp}=\cdots=r_{kj}=0$.
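A minimal sketch of Definition 1, under the reading that condition 1 makes a consistent set closed under agreeing (r = 0) test links and condition 2 forbids disagreeing links inside it; this interpretation is an assumption. Under it, condition 3 follows from applying condition 1 along a path, so the candidate consistent sets are the connected components of the "agreement" subgraph. The functions `consistent_sets` and `is_consistent` are hypothetical names, and units are indexed from 0.

```python
from collections import defaultdict, deque
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]


def consistent_sets(n: int, syndrome: Dict[Edge, int]) -> List[FrozenSet[int]]:
    """Split units 0..n-1 into groups joined by chains of agreeing (r = 0) links."""
    agree: Dict[int, Set[int]] = defaultdict(set)
    for (i, j), r in syndrome.items():
        if r == 0:
            agree[i].add(j)
            agree[j].add(i)
    seen: Set[int] = set()
    groups: List[FrozenSet[int]] = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        group = {start}
        queue = deque([start])
        while queue:                      # breadth-first walk along 0-links
            u = queue.popleft()
            for v in agree[u]:
                if v not in seen:
                    seen.add(v)
                    group.add(v)
                    queue.append(v)
        groups.append(frozenset(group))
    return groups


def is_consistent(y: FrozenSet[int], syndrome: Dict[Edge, int]) -> bool:
    """Check conditions 1 and 2 of Definition 1 for the unit set Y."""
    for (i, j), r in syndrome.items():
        members = (i in y) + (j in y)     # how many endpoints lie in Y
        if r == 0 and members == 1:
            return False                  # an agreeing neighbour lies outside Y
        if r == 1 and members == 2:
            return False                  # two members of Y disagree
    return True


syndrome = {(0, 1): 0, (1, 2): 0, (2, 3): 1, (3, 4): 0,
            (0, 4): 1, (0, 2): 0, (1, 3): 1, (2, 4): 1}
groups = consistent_sets(5, syndrome)
print(groups, [is_consistent(y, syndrome) for y in groups])
```

For this syndrome the sketch finds two consistent sets, {0, 1, 2} and {3, 4}; a unit whose links all report disagreement would form a singleton set.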
Following the widely accepted approach according to which the result produced by the majority of units is trusted (e.g., in NVP), we make the following assumptions.
Assumption 1. If the number of faulty units does not exceed t, then a consistent set Y with $|Y|\ge t+1$ is a set of fault-free units. We call such a consistent set a consistent fault-free set, $Y_{FF}$.
Definition 2. Two consistent sets, $Y_i$ and $Y_j$, are in contradiction with each other when there exist units $u_k\in Y_i$ and $u_l\in Y_j$, $k,l\in\{1,\ldots,N\}$, such that $r_{kl}=1$.
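Definition 2 reduces to a scan over the syndrome. The sketch below is a hypothetical helper (`in_contradiction`), with units indexed from 0 and the syndrome stored as a dictionary keyed by unit pairs. The first call illustrates the situation behind Lemma 1: two consistent sets that share no test link are not in contradiction, so both may consist of fault-free units.

```python
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[int, int]


def in_contradiction(y_i: FrozenSet[int], y_j: FrozenSet[int],
                     syndrome: Dict[Edge, int]) -> bool:
    """True if some test link with outcome 1 joins a unit of y_i to a unit of y_j."""
    for (k, l), r in syndrome.items():
        crosses = (k in y_i and l in y_j) or (k in y_j and l in y_i)
        if crosses and r == 1:
            return True
    return False


syndrome = {(0, 1): 0, (2, 3): 0, (1, 4): 1, (3, 4): 1}
print(in_contradiction(frozenset({0, 1}), frozenset({2, 3}), syndrome))  # -> False (no link between them)
print(in_contradiction(frozenset({0, 1}), frozenset({4}), syndrome))     # -> True (r_14 = 1)
```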
Lemma 1. For any syndrome, if the number of faulty units does not exceed t, then all fault-free units are either in $Y_{FF}$ or in consistent sets that satisfy the following two conditions:
1) they are not in contradiction with each other;
2) their total number of units is greater than t.
Assumption 2. The state of a unit can be correctly diagnosed if and only if there is a test link between this unit and at least one fault-free unit.
Taking the above assumptions into account, we now introduce the following lemma.
Lemma 2. A system S composed of N units is t-diagnosable if and only if, given any syndrome, each unit $u_i\in U$, $i=1,2,\ldots,N$, has $|z(u_i)|\ge t+1$ test links with other units, provided that the number of faulty units in S does not exceed t, where $z(u_i)=\{u_j:\omega_{ij}\in\Omega\}$.
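The condition of Lemma 2 is a degree check on the graph G. A minimal sketch, assuming each unordered pair of units appears at most once in the assignment; `satisfies_lemma2` is a hypothetical name and units are indexed from 0.

```python
from collections import Counter
from typing import Iterable, Tuple


def satisfies_lemma2(n: int, assignment: Iterable[Tuple[int, int]], t: int) -> bool:
    """Check that every unit u_i has |z(u_i)| >= t + 1 test links.

    Assumes the assignment lists each unordered pair of units at most once.
    """
    links = Counter()
    for i, j in assignment:
        links[i] += 1
        links[j] += 1
    return all(links[u] >= t + 1 for u in range(n))


basic = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (0, 2), (1, 3), (2, 4)]
print(satisfies_lemma2(5, basic, t=2))       # every unit has at least 3 links
print(satisfies_lemma2(5, basic[:-1], t=2))  # dropping (2, 4) leaves unit 4 with 2
```

The eight links in `basic` give N = 5 units at least t + 1 = 3 links each, which matches the Case 1 entry for N = 5 in Table 1.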
The credibility of the system diagnosis result is greater when all fault-free units are in $Y_{FF}$, since the probability of the hypothesis that t+1 or more faulty units produce the same incorrect result is clearly lower than the probability of the hypothesis that these faulty units produce arbitrary incorrect results. In view of this, we examine how many test links are needed for all fault-free units to be in $Y_{FF}$. This number depends, to a great extent, on the number of units in the system, N, and on the comparison assignment.
4 COMPARISON ASSIGNMENT
From the assumption that the majority of units are trusted, it follows that for correct system diagnosis the number of fault-free units, C, must be greater than the number of faulty units, F. Let t be the number of faulty units, F=t. Then $N-t>t$, i.e., $N>2t$ or $N\ge 2t+1$, which is the same result as that of the PMC model (Preparata et al., 1967). Since an even N does not increase the value of t compared with the odd number N-1, further consideration is restricted to odd N. We consider the worst situation, in which the number of faulty units equals t and all faulty units produce the same incorrect result. From Lemma 2, it follows that for a system S to be t-diagnosable it is sufficient that each unit of the system has t+1 test links with other units. Such a comparison assignment is called basic. Since each test link engages two units, the minimal number of test links providing system t-diagnosability is $T_{min}=\lceil N(t+1)/2\rceil$ (the ceiling matters when both N and t+1 are odd, e.g., $T_{min}=8$ for N=5).
ICSOFT 2007 - International Conference on Software and Data Technologies
At the first stage, we examine whether $T_{min}$ test links are sufficient for all fault-free units to be in $Y_{FF}$. For this purpose, we determine the minimal possible number of test links between fault-free units, R, given $T_{min}$. It can be determined as $R=(q-k)/2$, where k is the maximum number of test links between units from the sets F and C, and q is the total number of test links of all fault-free units. For F=t and C=t+1, $k=t(t+1)$ and $q=(t+1)^2$, hence $R=(t+1)/2$. Consider the corresponding subgraph $G(C_u)$ of the graph $G(V,E)$: its vertices correspond to the fault-free units, and its edges correspond to the test links among the units of C. Since $G(C_u)$ has t+1 vertices, it is connected if its number of edges is greater than $t(t-1)/2$. Thus, for all fault-free units to be in $Y_{FF}$, it is necessary that $R>t(t-1)/2$, i.e., $2t+1>t^2$. This inequality holds only for $t<3$ (i.e., for N<7). For $N\ge 7$, $T_{min}$ is not sufficient to guarantee that all fault-free units are in $Y_{FF}$.
At the second stage, we examine, for $N\ge 7$, how many additional test links are needed for all fault-free units to be in $Y_{FF}$. We consider the worst case, in which the faulty units have no mutual test links. In this case the subgraph $G(C_u)$ may have at most $\pi=\lfloor (N-t)/2\rfloor$ components: for $N=3+4a$, $a=0,1,2,\ldots$, each component consists of two vertices, and for $N=5+4a$, $a=0,1,2,\ldots$, all components consist of two vertices except one, which consists of three vertices. To connect these $\pi$ components, $\pi-1$ additional edges are necessary.
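The first-stage inequality can be checked numerically. The sketch below (hypothetical function name) recomputes q, k and R exactly as defined in the text for N = 2t + 1, using `Fraction` so that R = (t+1)/2 is not truncated when t+1 is odd, and compares R with the connectivity bound t(t-1)/2.

```python
from fractions import Fraction


def t_min_guarantees_y_ff(t: int) -> bool:
    """First-stage test: is R = (q - k)/2 above t(t-1)/2 for N = 2t + 1 units?"""
    q = (t + 1) * (t + 1)   # link ends of the C = t+1 fault-free units, t+1 each
    k = t * (t + 1)         # at most t+1 links for each of the F = t faulty units
    r = Fraction(q - k, 2)  # R = (t+1)/2 links left among fault-free units
    return r > Fraction(t * (t - 1), 2)


for t in range(1, 6):
    print(f"t={t}, N={2 * t + 1}: {t_min_guarantees_y_ff(t)}")
```

The check succeeds only for t = 1 and t = 2 (N = 3 and N = 5), in line with the conclusion that $T_{min}$ is insufficient for $N\ge 7$.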
Table 1 presents the numbers of test links needed for system t-diagnosability in three cases:
Case 1: basic comparison assignment, $T_{min}$;
Case 2: comparison assignment ensuring that all fault-free units are in $Y_{FF}$;
Case 3: comparison assignment based on the pairwise comparisons needed for majority voting.
Table 1: Number of test links needed for system diagnosis.
N Case 1 Case 2 Case 3
3 3 3 3
5 8 8 10
7 14 15 21
9 23 24 36
11 33 35 55
13 46 48 78
15 60 63 105
17 77 80 136
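All three columns of Table 1 follow from the formulas in the text: $t=(N-1)/2$, $T_{min}=\lceil N(t+1)/2\rceil$, $\pi-1$ extra links with $\pi=\lfloor (N-t)/2\rfloor$, and $N(N-1)/2$ links for full pairwise comparison. The sketch below (hypothetical helper `table1_row`) recomputes each row so the table can be checked or extended to larger N.

```python
def table1_row(n: int) -> tuple:
    """Recompute one row (N, Case 1, Case 2, Case 3) of Table 1 for odd N >= 3."""
    t = (n - 1) // 2                    # largest t with N >= 2t + 1
    case1 = (n * (t + 1) + 1) // 2      # T_min = ceil(N(t+1)/2)
    pi = (n - t) // 2                   # worst-case number of components of G(C_u)
    case2 = case1 + pi - 1              # pi = 1 for N < 7, so no extra links there
    case3 = n * (n - 1) // 2            # every pair compared, as in majority voting
    return (n, case1, case2, case3)


for n in range(3, 18, 2):
    print(*table1_row(n))
```

For N = 3 and N = 5 the formula gives $\pi=1$, so Case 2 coincides with Case 1, as the table shows.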
The total number of result comparisons needed for system diagnosis (Cases 1 and 2) is smaller than that needed for majority voting (Case 3), since system diagnosis does not require pairwise comparison of all units' results. The diagnosis algorithm presented below is designed for systems whose comparison assignment ensures that all fault-free units are in $Y_{FF}$.
Algorithm 1. (Given any syndrome $R=\{r_{ij}\}$)
Step 1: set $L_{u_i}=\emptyset$, $i=1,2,\ldots,N$;
Step 2: For i=1 to N
    For j=1 to N
        if $\omega_{ij}\in\Omega$ and $r_{ij}=0$ then $L_{u_i}=L_{u_i}\cup\{u_j\}$;
Step 3: s=1; i=1;
Step 4: if i=t+2 then STOP ("Number of faulty units > t");
Step 5: $L^{s}_{u_i}=L_{u_i}$;
Step 6: $L^{s+1}_{u_i}=L^{s}_{u_i}$;
    For j=1 to N
        if $u_j\in L^{s}_{u_i}$ then $L^{s+1}_{u_i}=L^{s+1}_{u_i}\cup L_{u_j}$;
Step 7: if $L^{s+1}_{u_i}-L^{s}_{u_i}=\emptyset$ then proceed with the next Step;
    otherwise $L^{s}_{u_i}=L^{s+1}_{u_i}$ and GOTO Step 6;
Step 8: if $|L^{s+1}_{u_i}|<t+1$ then i=i+1 and GOTO Step 4;
    otherwise STOP ("$Y_{FF}=L^{s+1}_{u_i}$");
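Algorithm 1 can be sketched in Python as follows. This is an illustrative translation, not the authors' code: units are indexed from 0, the syndrome is a dictionary keyed by unit pairs that contains only existing test links, and `algorithm1` is a hypothetical name. Steps 6 and 7 become a loop that keeps uniting the agreement lists of the units already collected until nothing new is added. Steps 3 and 4 stop after the first t + 1 starting units, because with at most t faulty units at least one of them is fault-free.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def algorithm1(n: int, t: int, syndrome: Dict[Edge, int]) -> Optional[FrozenSet[int]]:
    """Return Y_FF, or None when the number of faulty units appears to exceed t."""
    # Steps 1-2: L[u] holds the units whose comparison with u returned 0.
    agree: List[Set[int]] = [set() for _ in range(n)]
    for (i, j), r in syndrome.items():
        if r == 0:
            agree[i].add(j)
            agree[j].add(i)
    # Steps 3-4: try at most t+1 starting units.
    for i in range(min(t + 1, n)):
        current = set(agree[i])          # Step 5: L^s = L_{u_i}
        while True:                      # Steps 6-7: grow until no new units appear
            grown = set(current)
            for j in current:
                grown |= agree[j]
            if grown == current:
                break
            current = grown
        if len(current) >= t + 1:        # Step 8: large enough to be Y_FF
            return frozenset(current)
    return None                          # "Number of faulty units > t"


links = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (0, 2), (1, 3), (2, 4)]
results = [7, 7, 42, 42, 42]             # units 0 and 1 share a related fault
syndrome = {(i, j): int(results[i] != results[j]) for i, j in links}
print(algorithm1(5, 2, syndrome))
```

In this run the first two starting units are faulty and only reach a set of size 2 < t + 1, so the sketch moves on to unit 2 and returns {2, 3, 4}.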
The comparison-based model also makes it possible to detect faulty comparators (i.e., incorrect outcomes of result comparisons). A correct diagnosis is guaranteed when the total number of faulty units and faulty comparators does not exceed t. Faulty comparators are detected by checking whether the outcomes of result comparisons are consistent with the result of system diagnosis: the outcome of comparing the results of two fault-free units must be "0", and the outcome of comparing the results of a fault-free unit and a faulty unit must be "1". Any comparator whose outcome differs from this is faulty.
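The check described above can be sketched as a single pass over the syndrome once $Y_{FF}$ is known. In this hypothetical example (units indexed from 0, N = 5, t = 2), unit 4 is faulty and the comparator between units 0 and 2 wrongly reports a disagreement; $Y_{FF}$ is supplied directly, and for this syndrome growing the agreement set from unit 0, as in Algorithm 1, also yields {0, 1, 2, 3}. Links between two faulty units are deliberately not judged, since related faults may make their results agree or differ.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]


def suspect_comparators(y_ff: FrozenSet[int], syndrome: Dict[Edge, int]) -> List[Edge]:
    """Flag comparators whose outcome contradicts the diagnosis Y_FF."""
    suspects = []
    for (i, j), r in syndrome.items():
        fault_free = (i in y_ff) + (j in y_ff)   # endpoints diagnosed fault-free
        if fault_free == 2 and r != 0:
            suspects.append((i, j))              # two fault-free results must agree
        elif fault_free == 1 and r != 1:
            suspects.append((i, j))              # fault-free vs faulty must disagree
    return suspects


syndrome = {(0, 1): 0, (1, 2): 0, (2, 3): 0, (3, 4): 1,
            (0, 4): 1, (0, 2): 1, (1, 3): 0, (2, 4): 1}
print(suspect_comparators(frozenset({0, 1, 2, 3}), syndrome))
```

Here one faulty unit plus one faulty comparator stays within t = 2, and the sketch reports only the comparator on link (0, 2).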
The approach to system diagnosis based on the examination of consistent sets can also be used to deliver a presumably correct service in most situations in which the total number of faulty units and faulty comparators exceeds t. Obviously, the service in these cases has lower credibility than in the cases in which the total number of faulty units in the system does not exceed t. However, it is possible to set a lower bound on the credibility of the system diagnosis result that would be acceptable for some practical applications. When the credibility of the system diagnosis result is above this bound, the service can be delivered to the client. The credibility of the system diagnosis result can be determined by computing the probabilities of the hypotheses that different consistent sets are sets of fault-free units.
In situations in which there are more than t faulty units, we suggest trying to find a consistent set of size t; if there is only one such consistent set (i.e., this consistent set is the greatest one), it can be considered a set of presumably fault-free units. This statement is based on the fact that the a posteriori probability that the units of the greatest consistent set are fault-free satisfies the measure ε (used in majority voting techniques) when this hypothesis is compared with all the other ones. There is only one exception: when there exist t units that are not in this greatest consistent set and have no mutual test links. If these t units have no mutual test links, then the probability $P(R\mid H_t)$ of obtaining syndrome R under the hypothesis $H_t$ that these t units are fault-free is not negligible, and the measure ε may not be satisfied. Therefore, the diagnostic algorithm for the situation in which there are more than t faulty units in the system is designed so that it tries to find the greatest consistent set of size t (Steps 1-4) and checks that there are no t presumably faulty units without mutual test links (Steps 5-8).
Algorithm 2. (Given syndrome R and consistent sets $Y_1,Y_2,\ldots,Y_\tau$)
Step 1: i=1; α=0;
Step 2: if i=τ+1 then GOTO Step 4;
Step 3: if $|Y_i|=t$ then $D=U-Y_i$; $C=Y_i$; α=α+1; i=i+1 and GOTO Step 2;
    otherwise i=i+1 and GOTO Step 2;
Step 4: if α=1 then proceed with the next Step;
    otherwise STOP ("there is no greatest consistent set of size t");
Step 5: k=1;
Step 6: if k=N+1 then STOP ("C is the set of fault-free units");
Step 7: if $u_k\in D$ then $D'=D-\{u_k\}$ and proceed with the next Step;
    otherwise k=k+1 and GOTO Step 6;
Step 8: if there exist $u_i,u_j\in D'$, $i\ne j$, such that $\omega_{ij}\in\Omega$, then k=k+1 and GOTO Step 6;
    otherwise STOP ("there is no acceptable system diagnosis result");
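A sketch of Algorithm 2 in Python, under these assumptions: units are indexed from 0, N = 2t + 1, the consistent sets are passed in already computed, and the test links are given as unit pairs; `algorithm2` is a hypothetical name. As in the step listing, each unit $u_k$ is removed from a copy $D'$ of D rather than from D itself, so every t-unit subset of D is examined for at least one mutual test link.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Edge = Tuple[int, int]


def algorithm2(n: int, t: int, links: List[Edge],
               sets: List[FrozenSet[int]]) -> Optional[FrozenSet[int]]:
    """Return the presumably fault-free set C, or None if the result is unacceptable."""
    # Steps 1-4: there must be exactly one consistent set of size t.
    candidates = [y for y in sets if len(y) == t]
    if len(candidates) != 1:
        return None                        # no greatest consistent set of size t
    c = candidates[0]
    d = frozenset(range(n)) - c            # D = U - C (presumably faulty units)
    linked = {frozenset(e) for e in links}
    # Steps 5-8: for each u_k in D, the remaining units of D must share a test link.
    for u_k in d:
        rest = d - {u_k}                   # D' = D - {u_k}; D itself is not modified
        if not any(frozenset(p) in linked for p in combinations(rest, 2)):
            return None                    # t units without mutual test links
    return c


links = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (0, 2), (1, 3), (2, 4)]
# Hypothetically, units 0, 1, 2 return three different wrong values; 3 and 4 agree.
sets = [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({3, 4})]
print(algorithm2(5, 2, links, sets))
```

With three faulty units out of five (more than t = 2), the sketch still accepts {3, 4}, because every pair of the remaining units is joined by a test link; removing link (0, 2) would make it return None.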
5 CONCLUSIONS
Design diversity used for achieving fault tolerance requires comparing the results produced by different SW variants (servers). For many servers (e.g., SQL servers) the comparison procedure is non-trivial and usually requires complex operations. In view of this, the comparators cannot be considered absolutely reliable, and it is important to reduce the total number of comparators in the adjudication mechanism. One practical approach to simplifying the adjudicator relies on the system-level fault diagnosis technique. The novelty of this research is that system-level fault diagnosis is used for a broader purpose, namely, not only for identifying a fault-free unit but also for detecting all faulty units and comparators. In many situations, it also makes it possible to deliver a service that can be considered correct (with acceptable probability) when the total number of faulty units and comparators exceeds the measure of diagnosability, t. We consider the consistent sets of units derived from the obtained syndrome as the core element of system diagnosis. Based on the examination of possible consistent sets, we have designed a comparison assignment that is simpler than the one required for majority voting (Kuncheva et al., 2003). We have also developed a simple diagnosis algorithm for systems with the designed comparison assignment. The proposed comparison assignment providing system diagnosis can be used for designing a simple and reliable adjudicator for fault-tolerant diverse servers or for improving the adjudicator in the N-version programming scheme.
ACKNOWLEDGEMENTS
This research was supported in part by the National
program of research (Information society project
1ET100300419).
REFERENCES
Friedman, A., 1975. A new measure of digital system
diagnosis. Proc. Int. Symp. Fault-tolerant computing,
Paris, France, pp. 167-170.
Kuncheva, L., Whitaker, J., Shipp, C., 2003. Limits on the
majority vote accuracy in classifier fusion, Pattern
analysis and applications No. 6, pp. 22-31.
Lee, P., Anderson, T., 1990. Fault tolerance: Principles and practice, Prentice-Hall, 2nd edition.
Lyu, M., 1995. Software fault tolerance, trends in
software series. Wiley.
Preparata, F., Metze, G., Chien, R., 1967. On the
connection assignment problem of diagnosable
systems. IEEE Trans. Electron. Comput., Vol. EC-16,
pp. 848-854.
Xu, J., 1991. The t/(t-1)-diagnosability and its applications
to fault tolerance. Technical report series No. 340,
University of Newcastle upon Tyne.