Timed Transition Tour for Race Detection in Distributed Systems
Evgenii Vinarskii¹*, Natalia Kushik¹, Nina Yevtushenko²,³, Jorge López⁴ and Djamal Zeghlache¹
¹ SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, Palaiseau, France
² Ivanikov Institute for System Programming, Russian Academy of Sciences, Moscow, Russia
³ Higher School of Economics, Moscow, Russia
⁴ Airbus, Issy-Les-Moulineaux, France
jorge.lopez-c@airbus.com
Keywords:
Races, Model Based Testing, Timed Finite State Machines, Timed Transition Tour.
Abstract:
The paper is devoted to detecting output races in distributed systems. We perform such detection through testing their implementations. As an underlying model for our test generation strategy we consider a Timed Finite State Machine (TFSM, for short), where each input/output transition is augmented with a timed guard and an output delay. A potential output race can thus be simulated as an output delay mutant; this formalism is introduced in the paper. In order to build a test suite, we adapt a well-known test generation strategy, the transition tour method. The novelty of the proposed method lies in choosing appropriate timestamps for inputs, yielding a timed transition tour. We discuss its fault coverage for output race detection. As an application case study, we consider a Software Defined Networking (SDN) framework where the system under test is represented by the composition of a controller and a switch. Experimental results show that the timed transition tour can detect races in the behavior of the widely used ONOS controller.
1 INTRODUCTION
Software and hardware (distributed) systems are becoming more complex, and their thorough testing and verification is crucial. If a system implementation has races, its behavior can differ from the specification (expected behavior): the system can produce wrong responses to submitted requests, can have deadlocks and livelocks (Baier and Katoen, 2008), etc. Correspondingly, detecting races in a system under test is important. In order to guarantee the absence
of races, model based verification and testing can be
utilized (see Section 2 for the related work). In Model
Based Testing (MBT), the system specification and its implementation are described by the same model; finite transition systems are widely used in MBT (Benharrat et al., 2017). When talking about tests for detecting races, the model of a classical Finite State Machine (FSM) becomes useless as there are no output races in this model. Instead, Input/Output automata (Lynch and Tuttle, 1989) are more appropriate for this purpose. However, in an Input/Output automaton there are no restrictions regarding how long a tester can wait for an output, and thus, the testing process can become really hard or even chaotic. One possible solution is to use proper timed FSMs where inputs can be applied in a row without waiting for a produced output. However, in order to escape potential chaos, an output should be produced only after the appropriate number of time units.
* The work was partially supported by Erasmus program.
On leave from Higher School of Economics, Russia.
In this paper, we rely on the notion of a Timed
Finite State Machine as defined in (Vinarskii and Za-
kharov, 2020). In the aforementioned model, the be-
havior depends on its current state, the time instance
when an input is applied, and the time required to
process the input. Each input/output transition in the
TFSM is augmented with a timed guard and an out-
put delay; a transition is only executed if the corre-
sponding input is applied at a time instance which be-
longs to the interval guarding it. The output delay re-
flects the number of time units needed for the output
to be produced after the input has been applied. Note
that, in this case, a TFSM can accept an input while
the output to the previous one has not been produced
yet. In other words, a TFSM considered in the pa-
per implicitly contains concurrent procedures for han-
dling inputs (see Section 3 for more details), and thus
races between outputs are relevant. We assume that a
Vinarskii, E., Kushik, N., Yevtushenko, N., López, J. and Zeghlache, D.
Timed Transition Tour for Race Detection in Distributed Systems.
DOI: 10.5220/0011986700003464
In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2023), pages 613-620
ISBN: 978-989-758-647-7; ISSN: 2184-4895
Copyright © 2023 by SCITEPRESS - Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
TFSM is (output) race-free if for each input sequence
there are no two different outputs that can be produced
at the same time instance. When deriving test suites,
we assume that the specification of the system is out-
put race-free, however, its implementations can have
output races. Further, we showcase that the behavior
of each implementation can be adequately modeled
by output delay mutants of the specification that have
the same transition graph, but output delays differ. As
the classical transition tour for FSMs is known to de-
tect many implementation faults, in this paper, we fo-
cus on studying the properties of a timed transition
tour, and its capabilities for detecting output races in
TFSM implementations (see Section 4).
As an application scenario, we consider Software
Defined Networking systems and related components.
As a system under test, we study the composition of
a controller and a switch, in order to assure that the
implemented composition is output race-free. Experi-
mental results show that output races in the SDN com-
ponents can be detected by the timed transition tour.
However, the choice of the timestamps significantly
affects the fault coverage. This position paper there-
fore, raises a number of challenges that concern an
appropriate assignment of the timestamps to the test
inputs as well as the timed transition tour fault cover-
age against other types of faults, e.g., transition/output
faults. We discuss some of these challenges and pro-
vide some findings and observations (see Section 5).
The main contributions of this work are: i) an adaptation of the transition tour to timed FSMs and a study of its effectiveness for detecting output races in distributed systems, and ii) an experimental evaluation of the timed transition tour for detecting output races in a real distributed system (i.e., a programmable network framework).
2 RELATED WORK
The problem of race detection in distributed sys-
tems, as well as preventive design strategies for race-
free systems have been discussed in a number of
publications. One of the known approaches in this case relies on the definition of a specific (partial) order between possible events, i.e., the idea is to establish a happened-before relationship $\xrightarrow{hb}$ between events relevant to races (Pereira et al., 2020). This order is presented by so-called “logical clocks” $C$ which indicate the timestamps of the events’ execution, i.e., $e_1 \xrightarrow{hb} e_2$ if and only if $C(e_1) < C(e_2)$. Such a Happened-Before model ($\Phi_{hb}$) has been studied, for example, in (Wen et al., 2022) and (Pereira et al., 2020). Given a pair $(e_1, e_2)$ representing the competing actions, the system is race-sensitive if there exist such logical clocks that the following holds: $\varphi_{race} = \Phi_{hb} \wedge (C(e_1) = C(e_2))$. We note that there are race detection tools based on the Happened-Before (HB) model. For example, in (Liu et al., 2017), the authors present the tool DCatch, which uses the z3 SMT solver (de Moura and Bjørner, 2008) to check whether $\varphi_{race}$ holds for each pair $(e_1, e_2)$. Moreover,
in (Pereira et al., 2020), this approach is improved
by eliminating redundant events from the distributed
system without affecting the accuracy of race detec-
tion (corresponding statements are proven in (Pereira
et al., 2020)). A similar approach can be applied
for programmable networks (Lu et al., 2019), (El-
Hassany et al., 2016) and (Li et al., 2022). In particu-
lar, in (El-Hassany et al., 2016) and (Li et al., 2022)
the authors have used the HB model for SDN, and re-
lated races can be detected by the tools SDNRacer and
SPIDER. In (Vinarskii et al., 2019), the authors have
studied the advantages of model checking techniques
for proactive testing in SDN race detection.
Another possibility is to take a preventive path,
i.e., to derive the components of a distributed system
that are carefully synchronized, so that races cannot
show up (McClurg et al., 2017), (McClurg, 2021).
In (McClurg et al., 2017), an approach which inserts a
number of synchronization processes to the SDN con-
troller is proposed for avoiding races in SDN frame-
works. This approach has been improved in (Mc-
Clurg, 2021). Another prevention strategy is pro-
posed in (Rouzaud-Cornabas et al., 2010) and (Ra-
ducu et al., 2022). Namely, the authors focus on pre-
venting a so-called Time Of Check To Time Of Use
(TOCTTOU) bug. The latter means that during the
time between checking the possibility of execution
of a function f and its actual execution, no requests
changing the internal state of a program can be re-
ceived. In (Rouzaud-Cornabas et al., 2010), a race-
free formula for preventing the TOCTTOU bug is pre-
sented together with the tool that implements the pro-
posed solution.
To the best of our knowledge, there are no exist-
ing works in the area of test derivation strategies for
detecting races in Timed FSMs, and in this paper, we
study the effectiveness of the timed transition tour to
provoke races between observable actions.
3 BACKGROUND
As mentioned in the introduction, classical finite state
machines are widely used when deriving test suites
with guaranteed fault coverage for discrete event and
hybrid systems. In this section, we introduce the no-
tion of a TFSM model whose behavior depends on its
current state, the time instance when an input is applied, and the time required to process the input. A timestamp is represented by a real number $t \in \mathbb{R}^+_0$, which indicates a time instance when the system receives an input or generates an output. A timed guard is an interval $(u, v)$ where $u, v \in \mathbb{N}^+$ and $u < v$, which indicates the period of time when a transition is enabled for processing an input. An output delay (or simply a delay) $d \in \mathbb{N}^+$ indicates the time needed for producing an output after receiving an input.
Let $I$ ($O$) be a finite input (output) alphabet, $i \in I$ ($o \in O$) be an input (output) and $t$ be a timestamp. As usual, a timed input is a pair $(i, t)$ and a timed output is a pair $(o, t)$. A timed input sequence is a finite sequence $\alpha = (i_1, t_1)(i_2, t_2) \ldots (i_n, t_n)$ where the sequence $t_1 t_2 \ldots t_n$ is monotonically increasing. At the same time, a timed output sequence is defined as a finite sequence $\beta = (o_1, t_1)(o_2, t_2) \ldots (o_n, t_n)$ where the sequence $t_1 t_2 \ldots t_n$ is monotonically non-decreasing. An untimed projection of $\beta$ ($\alpha$), i.e., $\beta_{\downarrow O} = o_1 o_2 \ldots o_n$ ($\alpha_{\downarrow I} = i_1 i_2 \ldots i_n$), denotes the sequence obtained after deleting the timestamps (Bresolin et al., 2021).
A TFSM $A$ is a tuple $(S, I, O, G, h_S, D, s_0)$, where $I$ and $O$ are finite input and output alphabets, $S$ is a finite non-empty set of states, $G$ is a finite non-empty set of timed guards, $h_S \subseteq (S \times I \times G \times O \times D \times S)$ is a set of transitions, $D$ is a finite non-empty set of non-zero integer delays, and $s_0$ is the initial state. A transition $(s, i, g, o, d, s') \in h_S$ is denoted as $s \xrightarrow{i,g/(o,d)} s'$. If the machine receives an input $i$ after $t$ time units being at state $s$, where $t \in g$, then the machine moves to state $s'$ and produces output $o$ after $d$ time units.
Given $t_0 = 0$, a run of the TFSM $A$ for a timed input sequence $\alpha = (i_1, t_1)(i_2, t_2) \ldots (i_n, t_n)$ is a finite sequence $run(A, \alpha) = s_0 \xrightarrow{(i_1,t_1)/(o_1,\tau_1)} s_1 \xrightarrow{(i_2,t_2)/(o_2,\tau_2)} \ldots \xrightarrow{(i_n,t_n)/(o_n,\tau_n)} s_n$ such that for each $j \in \{1, \ldots, n\}$ there exists a transition $s_{j-1} \xrightarrow{i_j,g_j/(o_j,d_j)} s_j$ of $A$ for which $t_j - t_{j-1} \in g_j$, and $\tau_j = t_j + d_j$. If $t_j - t_{j-1} \in g_j$ for each $j$, we say that the timed input sequence $\alpha$ is enabled for $A$. Notation $s_{j-1} \xrightarrow{(i_j,t_j)/(o_j,\tau_j)} s_j$ means that if the machine being at state $s_{j-1}$ receives input $i_j$ at time instance $t_j$, then the machine immediately moves to state $s_j$ and produces $o_j$ at time instance $\tau_j = t_j + d_j$. TFSM $A$ is deterministic if for every two transitions $s \xrightarrow{i,g/(o,d)} s'$ and $s \xrightarrow{i,g'/(o',d')} s''$ it holds that $g \cap g' = \emptyset$.
In order to get an output reaction for $\alpha = (i_1, t_1)(i_2, t_2) \ldots (i_n, t_n)$, the timed outputs of the sequence $(o_1, \tau_1)(o_2, \tau_2) \ldots (o_n, \tau_n)$ where $\tau_1 = t_1 + d_1$, $\tau_2 = t_2 + d_2$, ..., $\tau_n = t_n + d_n$ are ordered in such a way that the time instances are not decreasing. Note that even a deterministic TFSM can produce several output reactions for a timed input sequence. Consider TFSM $M^{4}_{e_1}$ shown in Fig. 2a$^1$ and timed input sequence $\alpha_1 = (i_1, 1.5)(i_1, 3.0)(i_1, 4.5)$ which is enabled for $M^{4}_{e_1}$. TFSM $M^{4}_{e_1}$ produces both timed output sequences $\beta_1 = (o_2, 4.0)(o_1, 5.5)(o_3, 5.5)$ and $\beta_2 = (o_2, 4.0)(o_3, 5.5)(o_1, 5.5)$, i.e., $\beta_{1\downarrow O} = o_2 o_1 o_3$ and $\beta_{2\downarrow O} = o_2 o_3 o_1$, due to the competition of two outputs that can be produced at the same time instance. In this case, we say that a TFSM is output/output (or simply, output) race-sensitive$^2$. Note that there is a known criterion (Vinarskii and Zakharov, 2018) for checking if a TFSM is output race-free.
Thus, given a deterministic TFSM $A$ and $\alpha = (i_1, t_1)(i_2, t_2) \ldots (i_n, t_n)$ with a run $run(A, \alpha) = s_0 \xrightarrow{(i_1,t_1)/(o_1,\tau_1)} s_1 \xrightarrow{(i_2,t_2)/(o_2,\tau_2)} \ldots \xrightarrow{(i_n,t_n)/(o_n,\tau_n)} s_n$, an output reaction of $A$ for $\alpha$ is a set of all such permutations of $(o_1, \tau_1)(o_2, \tau_2) \ldots (o_n, \tau_n)$ that the time instances are not decreasing, written $out(A, \alpha)$. Let $out(A, \alpha)_{\downarrow O}$ be a set of all output projections for timed output sequences of $out(A, \alpha)$. Coming back to the TFSM $M^{4}_{e_1}$ and $\alpha_1 = (i_1, 1.5)(i_1, 3.0)(i_1, 4.5)$, the output reaction is $out(M^{4}_{e_1}, \alpha_1) = \{(o_2, 4.0)(o_1, 5.5)(o_3, 5.5), (o_2, 4.0)(o_3, 5.5)(o_1, 5.5)\}$.
Differently from a classical TFSM (Bresolin et al., 2021), for the TFSM considered in this paper, the next timed input can be applied before the machine has produced an output to the previous one. Consider a TFSM $S$ in Fig. 1. If the machine is at state $s_0$ then input $i_1$ can be processed if and only if it is applied at time instance $t_1 \in (1, 2)$. Therefore, the transition $s_0 \xrightarrow{i_1,(1,2)/(o_1,5)} s_1$ is enabled for timed input $(i_1, 1.3)$ and $S$ moves to state $s_1$. Output $o_1$ will be produced 5 time units later, i.e., there will be a timed output $(o_1, 6.3)$. Transition $s_1 \xrightarrow{i_2,(1,2)/(o_2,2)} s_0$ is enabled for the timed input $(i_2, 2.6)$ due to the fact that $2.6 - 1.3 \in (1, 2)$, and $S$ moves to state $s_0$. Output $o_2$ is produced 2 time units after applying the input $i_2$, i.e., there is a timed output $(o_2, 4.6)$. The run of the TFSM $S$ on $\alpha = (i_1, 1.3)(i_2, 2.6)$ is $r = run(S, \alpha) = s_0 \xrightarrow{(i_1,1.3)/(o_1,6.3)} s_1 \xrightarrow{(i_2,2.6)/(o_2,4.6)} s_0$, and the timed output sequence is $\beta = (o_2, 4.6)(o_1, 6.3)$, i.e., $\beta_{\downarrow O} = o_2 o_1$.
With this example, we illustrate the key difference between the TFSM of interest and timed machines considered in (Bresolin et al., 2021). Being at state $s$ and receiving a timed input $(i, t)$, the TFSM immediately moves to state $s'$ simultaneously running the procedure $f$ in order to handle the input $i$ and compute the output $o$; the execution of the procedure $f$ takes $d$ time units (an output delay). The next input can be applied when the TFSM has not produced the output yet. In this case, the TFSM moves immediately to the next state $s''$ and simultaneously runs the procedure $f'$ in order to compute the output $o'$ in a parallel way with the procedure $f$ which is not finished yet. Therefore, the TFSM considered in the paper implicitly contains concurrent procedures and thus, represents potential output races in distributed systems. We say that the TFSM is feasible if the number of procedures running in parallel is always finite. We refer to (Vinarskii and Zakharov, 2020), for checking if a TFSM is feasible. We hereafter consider only feasible TFSMs.
1 We will further explain the notation $M^{4}_{e_1}$.
2 A formal definition is introduced in Section 4.1.
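The run of $S$ on $\alpha = (i_1, 1.3)(i_2, 2.6)$ discussed in this section can be replayed with a few lines of Python. This is only a minimal sketch and not the authors' tooling: the list-of-tuples encoding and the function name `run` are our own assumptions, the endpoints of $e_2$ and $e_3$ are read off the mutant runs given in Section 4.1 (the mutants share the transition graph of $S$), and timestamps are stored as exact fractions to avoid floating point rounding when comparing production times.

```python
from fractions import Fraction as Fr

# Each transition: (source, input, (u, v), output, delay, target).
S = [
    ("s0", "i1", (1, 2), "o1", 5, "s1"),  # e1
    ("s1", "i1", (1, 2), "o2", 1, "s2"),  # e2 (endpoints assumed from Section 4.1)
    ("s2", "i1", (1, 2), "o3", 1, "s1"),  # e3 (endpoints assumed from Section 4.1)
    ("s1", "i2", (1, 2), "o2", 2, "s0"),  # e4
]

def run(tfsm, alpha, init="s0"):
    """Return [(o_j, tau_j)] in input order, or None if alpha is not enabled."""
    state, prev_t, outputs = init, Fr(0), []
    for inp, t in alpha:
        for src, i, (u, v), out, delay, dst in tfsm:
            # The guard is checked against the time elapsed since the previous input.
            if src == state and i == inp and u < t - prev_t < v:
                outputs.append((out, t + delay))  # tau_j = t_j + d_j
                state, prev_t = dst, t            # the state changes immediately
                break
        else:
            return None  # no transition is enabled for this timed input
    return outputs

alpha = [("i1", Fr("1.3")), ("i2", Fr("2.6"))]
timed_outputs = run(S, alpha)
# Ordering by production time yields the timed output sequence beta.
beta = sorted(timed_outputs, key=lambda pair: pair[1])
print([(o, float(t)) for o, t in beta])  # [('o2', 4.6), ('o1', 6.3)]
```

Note that the second input is accepted at time 2.6 although $o_1$ is only produced at 6.3, which is exactly the overlap of input handling procedures described above.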
4 TRANSITION TOUR FOR
TFSMs AND OUTPUT RACES
In MBT, the behavior of the specification and an Im-
plementation Under Test (IUT) is described by the
same model; in our case a TFSM. We would like
to check whether there exists an IUT for which two
outputs could be produced at the same time instance,
i.e., two outputs could compete. In this section, we
formally introduce the notion of an output/output (or
simply output) race for a TFSM and consider so-
called output delay mutants of the specification TFSM
which can have such races. As a transition tour of the
specification that is a classical FSM is known to detect
many transition and output faults, we define a timed
transition tour for a TFSM and analyze its effective-
ness with respect to output delay mutants. As an ap-
plication scenario, we utilize a distributed networking
system represented by an SDN framework.
4.1 Output Races in TFSMs
Given a TFSM $A$ and a timed input sequence $\alpha = (i_1, t_1)(i_2, t_2) \ldots (i_n, t_n)$ such that $\alpha$ is enabled for $A$ with a run $run(A, \alpha) = s_0 \xrightarrow{(i_1,t_1)/(o_1,\tau_1)} s_1 \xrightarrow{(i_2,t_2)/(o_2,\tau_2)} \ldots \xrightarrow{(i_n,t_n)/(o_n,\tau_n)} s_n$, we say that $\alpha$ provokes an output race if there exist $k, m \in \{1, \ldots, n\}$ such that $o_k \neq o_m$ and $\tau_k = \tau_m$. This means that $o_k$ and $o_m$ compete to be produced, and can be produced in any order: $\ldots o_k o_m \ldots$ or $\ldots o_m o_k \ldots$. TFSM $A$ is output race-sensitive if there exists an input sequence $\alpha$ which provokes an output race in $A$. Otherwise, $A$ is output race-free.
As an example of an output race-sensitive TFSM, consider a machine $M^{4}_{e_1}$ shown in Fig. 2a. The sequence $\alpha_1 = (i_1, 1.5)(i_1, 3.0)(i_1, 4.5)$ is enabled for $M^{4}_{e_1}$ with the corresponding run $r = q_0 \xrightarrow{(i_1,1.5)/(o_1,5.5)} q_1 \xrightarrow{(i_1,3.0)/(o_2,4.0)} q_2 \xrightarrow{(i_1,4.5)/(o_3,5.5)} q_1$. Therefore, $\tau_1 = \tau_3 = 5.5$, and $o_1$ and $o_3$ compete to be produced at time instance 5.5, i.e., $\alpha_1$ provokes an output race in TFSM $M^{4}_{e_1}$. Similarly, $\alpha_2 = (i_1, 1.3)(i_1, 2.6)(i_1, 3.9)(i_2, 5.3)$ provokes an output race between $o_1$ and $o_2$ at time instance $\tau_1 = \tau_4 = 6.3$ in TFSM $M^{1}_{e_4}$ shown in Fig. 2b, as $r' = q_0 \xrightarrow{(i_1,1.3)/(o_1,6.3)} q_1 \xrightarrow{(i_1,2.6)/(o_2,3.6)} q_2 \xrightarrow{(i_1,3.9)/(o_3,4.9)} q_1 \xrightarrow{(i_2,5.3)/(o_2,6.3)} q_0$.
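The two race examples above can be checked mechanically. The following Python sketch, under our own encoding assumptions (transitions as tuples, hypothetical helper names `timed_outputs`, `provokes_race` and `untimed_reactions`), tests the condition $o_k \neq o_m$, $\tau_k = \tau_m$ and enumerates the untimed projections of all orderings whose time instances are not decreasing. Exact fractions are used because equality of production times is the whole point of the check.

```python
from fractions import Fraction as Fr
from itertools import groupby, permutations, product

# Mutants of Fig. 2: (source, input, (u, v), output, delay, target).
M4_e1 = [
    ("q0", "i1", (1, 2), "o1", 4, "q1"),  # mutated e1: delay 4 instead of 5
    ("q1", "i1", (1, 2), "o2", 1, "q2"),
    ("q2", "i1", (1, 2), "o3", 1, "q1"),
    ("q1", "i2", (1, 2), "o2", 2, "q0"),
]
M1_e4 = [
    ("q0", "i1", (1, 2), "o1", 5, "q1"),
    ("q1", "i1", (1, 2), "o2", 1, "q2"),
    ("q2", "i1", (1, 2), "o3", 1, "q1"),
    ("q1", "i2", (1, 2), "o2", 1, "q0"),  # mutated e4: delay 1 instead of 2
]

def timed_outputs(tfsm, alpha, init="q0"):
    """Timed outputs (o_j, tau_j) in input order, or None if alpha is not enabled."""
    state, prev_t, outs = init, Fr(0), []
    for inp, t in alpha:
        for src, i, (u, v), out, delay, dst in tfsm:
            if src == state and i == inp and u < t - prev_t < v:
                outs.append((out, t + delay))
                state, prev_t = dst, t
                break
        else:
            return None
    return outs

def provokes_race(tfsm, alpha):
    """True if two different outputs are produced at the same time instance."""
    outs = timed_outputs(tfsm, alpha)
    if outs is None:
        return False
    return any(o_k != o_m and t_k == t_m
               for k, (o_k, t_k) in enumerate(outs)
               for o_m, t_m in outs[k + 1:])

def untimed_reactions(tfsm, alpha):
    """Untimed projections of all orderings with non-decreasing time instances."""
    outs = sorted(timed_outputs(tfsm, alpha), key=lambda p: p[1])
    # Outputs sharing a time instance may appear in any relative order.
    ties = [[o for o, _ in grp] for _, grp in groupby(outs, key=lambda p: p[1])]
    return {sum(choice, ()) for choice in product(*[set(permutations(g)) for g in ties])}

alpha1 = [("i1", Fr("1.5")), ("i1", Fr("3.0")), ("i1", Fr("4.5"))]
alpha2 = [("i1", Fr("1.3")), ("i1", Fr("2.6")), ("i1", Fr("3.9")), ("i2", Fr("5.3"))]
print(provokes_race(M4_e1, alpha1), provokes_race(M1_e4, alpha2))  # True True
```

A race is reported exactly when the set returned by `untimed_reactions` contains more than one sequence, provided the tied outputs are different symbols.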
Note that races and in particular, output races can
appear in both, software specifications and implemen-
tations. In this section, we assume that the specifica-
tion is always race-free and an output race can hap-
pen only in an implementation. In other words, we
do not focus on the specification validation, but rather
study test generation strategies where the test purpose
is to detect as many output races as possible. Fol-
lowing classical model based testing approaches, we
assume that the specification and its implementation
can be modeled by the same formalism, which in our
case, are TFSMs, and we further discuss output race-
sensitive implementations which can be obtained as
output delay mutants of the specification.
4.2 Output Delay Mutants of TFSMs
Let $Spec = (S, I, O, G, h_S, D, s_0)$ be an initially connected race-free specification TFSM and $e = s \xrightarrow{i,(u,v)/(o,d)} s' \in h_S$ be one of its transitions. Consider also such a delay $d' \in \mathbb{N}^+$ that $d \neq d'$ and a TFSM $M^{d'}_{e}(Spec) = (S, I, O, G, h'_S, D', s_0)$ where $h'_S$ differs from $h_S$ in one transition only, i.e., $h'_S \setminus h_S = \{e'\}$ where $e' = s \xrightarrow{i,(u,v)/(o,d')} s' \in h'_S$. In other words, $M^{d'}_{e}(Spec)$ has a transition $e'$ instead of $e$. We refer to $M^{d'}_{e}(Spec)$ as a first order output delay mutant of the TFSM $Spec$.
[Figure 1 depicts a TFSM with states $s_0$, $s_1$, $s_2$ and transitions labeled $e_1 = i_1,(1,2)/(o_1,5)$, $e_2 = i_1,(1,2)/(o_2,1)$, $e_3 = i_1,(1,2)/(o_3,1)$ and $e_4 = i_2,(1,2)/(o_2,2)$.]
Figure 1: Running example, TFSM S.
By definition, the following statement holds.
Proposition 1. Given $Spec$, $M^{d'}_{e}(Spec)$ and a timed input sequence $\alpha$, it holds that 1) $\alpha$ is enabled for $Spec$ if and only if $\alpha$ is enabled for $M^{d'}_{e}(Spec)$ and 2) $Spec$ is feasible if and only if $M^{d'}_{e}(Spec)$ is feasible.
Proposition 1 assures that any sequence which can be applied to $Spec$ can also be applied to its mutant $M^{d'}_{e}(Spec)$. Moreover, the number of parallel procedures for computing the output reaction of $M^{d'}_{e}(Spec)$ is finite, which is guaranteed by the same property of $Spec$.
[Figure 2 depicts two TFSMs with states $q_0$, $q_1$, $q_2$. (a) $M^{4}_{e_1} = M^{4}_{e_1}(S)$, with transitions labeled $e'_1 = i_1,(1,2)/(o_1,4)$, $e_2 = i_1,(1,2)/(o_2,1)$, $e_3 = i_1,(1,2)/(o_3,1)$ and $e_4 = i_2,(1,2)/(o_2,2)$. (b) $M^{1}_{e_4} = M^{1}_{e_4}(S)$, with transitions labeled $e_1 = i_1,(1,2)/(o_1,5)$, $e_2 = i_1,(1,2)/(o_2,1)$, $e_3 = i_1,(1,2)/(o_3,1)$ and $e'_4 = i_2,(1,2)/(o_2,1)$.]
Figure 2: Two mutants of TFSM Spec = S in Fig. 1.
Fig. 2 contains two first order output delay mutants for the specification TFSM $S$ in Fig. 1, for transitions $e_1$ and $e_4$ respectively. The output delay in the mutated transition of $M^{1}_{e_4}$ is 1, while the corresponding output delay in $S$ is 2. TFSM $M^{4}_{e_1}$ is another mutant of $S$ where the delay $d$ in $e_1$ is set to 4 instead of 5.
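Deriving a first order output delay mutant is a purely syntactic operation, which the following Python sketch illustrates. The tuple encoding and the helper names `delay_mutant` and `is_enabled` are assumptions made for illustration, and the endpoints of $e_2$ and $e_3$ are taken from the runs in Section 4.1. Since guards are left untouched, a sequence is enabled for the mutant exactly when it is enabled for the specification, in line with the first item of Proposition 1.

```python
from fractions import Fraction as Fr

# Specification S of Fig. 1: (source, input, (u, v), output, delay, target).
S = [
    ("s0", "i1", (1, 2), "o1", 5, "s1"),  # e1
    ("s1", "i1", (1, 2), "o2", 1, "s2"),  # e2
    ("s2", "i1", (1, 2), "o3", 1, "s1"),  # e3
    ("s1", "i2", (1, 2), "o2", 2, "s0"),  # e4
]

def delay_mutant(spec, index, new_delay):
    """Copy of spec in which only the delay of spec[index] is changed."""
    src, inp, guard, out, delay, dst = spec[index]
    if new_delay < 1 or new_delay == delay:
        raise ValueError("new delay must be a positive integer different from d")
    mutant = list(spec)
    mutant[index] = (src, inp, guard, out, new_delay, dst)
    return mutant

def is_enabled(tfsm, alpha, init="s0"):
    """True if every timed input of alpha satisfies the guard of some transition."""
    state, prev_t = init, Fr(0)
    for inp, t in alpha:
        for src, i, (u, v), _out, _delay, dst in tfsm:
            if src == state and i == inp and u < t - prev_t < v:
                state, prev_t = dst, t
                break
        else:
            return False
    return True

M4_e1 = delay_mutant(S, 0, 4)  # delay of e1 changed from 5 to 4
M1_e4 = delay_mutant(S, 3, 1)  # delay of e4 changed from 2 to 1
```

The specification list is copied rather than modified in place, so several mutants can be derived from the same `S`.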
Given $\alpha = (i_1, t_1)(i_2, t_2) \ldots (i_n, t_n)$ which is enabled for $Spec$ and $M^{d'}_{e} = M^{d'}_{e}(Spec)$ which is an output race-sensitive mutant, according to Proposition 1, $\alpha$ is enabled for $M^{d'}_{e}$. We say that $\alpha$ detects a race-sensitive mutant $M^{d'}_{e}$ if $\alpha$ provokes a race in $M^{d'}_{e}$, i.e., $out(M^{d'}_{e}, \alpha)_{\downarrow O}$ is not a singleton. To check that $\alpha$ detects an output race in $M^{d'}_{e}$, during testing, we need to observe the set $out(M^{d'}_{e}, \alpha)_{\downarrow O}$ which can contain several output sequences. Therefore, when applying $\alpha$, to observe all possible outputs of $out(M^{d'}_{e}, \alpha)_{\downarrow O}$, we need to rely on the all weather conditions assumption (Milner, 1980). This means that a tester possesses enough resources to apply the sequence $\alpha$ as many times as needed for the implementation to show all possible output reactions$^3$.
In our example, the race-sensitive mutants $M^{4}_{e_1}$ and $M^{1}_{e_4}$ of $S$ (Fig. 2) are detected by $\alpha_1 = (i_1, 1.5)(i_1, 3.0)(i_1, 4.5)$ and $\alpha_2 = (i_1, 1.3)(i_1, 2.6)(i_1, 3.9)(i_2, 5.3)$, respectively. A timed test suite (TTS) for TFSM $Spec$ is a set of timed input sequences which are enabled for $Spec$. A TTS detects a race-sensitive implementation whose behavior is described by a mutant $M^{d'}_{e}$ if there exists such $\alpha$ in the TTS that detects $M^{d'}_{e}$.
3 In our particular case, it is enough that two output reactions are shown to conclude there exists a race.
4.3 TTT for Detecting Output Races
Given the specification TFSM $Spec$ and $\alpha = (i_1, t_1) \ldots (i_n, t_n)$ which is enabled for $Spec$ with a run $s_0 \xrightarrow{(i_1,t_1)/(o_1,\tau_1)} s_1 \xrightarrow{(i_2,t_2)/(o_2,\tau_2)} \ldots \xrightarrow{(i_n,t_n)/(o_n,\tau_n)} s_n$, we say that $\alpha$ covers a transition $e = s \xrightarrow{i,(u,v)/(o,d)} s' \in h_S$ if there exists such $j \in \{1, \ldots, n\}$ that $s_{j-1} = s$, $s_j = s'$, $i_j = i$ and $t_j - t_{j-1} \in (u, v)$. A timed transition tour (TTT) for $Spec$ is a finite set of timed input sequences enabled for $Spec$ such that for each transition $e$ of $Spec$, the TTT contains a timed input sequence which covers $e$. When deriving input sequences of the TTT it is not only crucial to carefully choose each input, but also each time instance when the corresponding input should be applied. The TTT definition does not introduce any optimality constraints on the derivation of the test sequences. Consider $S$ in Fig. 1 and two TTTs $ts_1 = \{(i_1, 1.3)(i_1, 2.6)(i_1, 3.9)(i_2, 5.3)\}$ and $ts_2 = \{(i_1, 1.3)(i_1, 2.6)(i_1, 3.9), (i_1, 1.3)(i_2, 2.6)\}$. The second test suite is longer and requires a tester to submit one RESET between two test sequences. Unlike a classical transition tour, in a TTT we have freedom in choosing timestamps. For example, $ts'_1 = \{(i_1, 1.4)(i_1, 2.7)(i_1, 4.0)(i_2, 5.4)\}$ is a TTT for $S$ which is different from $ts_1$ only in timestamps.
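Whether a given set of timed input sequences is a TTT can be verified by replaying each sequence on the specification and collecting the traversed transitions. The Python sketch below does this for $S$; the encoding and the names `covered`, `is_ttt` and `seq` are hypothetical, and the endpoints of $e_2$ and $e_3$ are assumed as in the runs of Section 4.1.

```python
from fractions import Fraction as Fr

# Specification S of Fig. 1: (source, input, (u, v), output, delay, target).
S = [
    ("s0", "i1", (1, 2), "o1", 5, "s1"),  # e1
    ("s1", "i1", (1, 2), "o2", 1, "s2"),  # e2
    ("s2", "i1", (1, 2), "o3", 1, "s1"),  # e3
    ("s1", "i2", (1, 2), "o2", 2, "s0"),  # e4
]

def seq(*pairs):
    """Build a timed input sequence from (input, decimal string) pairs."""
    return [(i, Fr(t)) for i, t in pairs]

def covered(spec, alpha, init="s0"):
    """Indices of the transitions traversed by alpha, or None if not enabled."""
    state, prev_t, seen = init, Fr(0), set()
    for inp, t in alpha:
        for idx, (src, i, (u, v), _out, _delay, dst) in enumerate(spec):
            if src == state and i == inp and u < t - prev_t < v:
                seen.add(idx)
                state, prev_t = dst, t
                break
        else:
            return None
    return seen

def is_ttt(spec, suite):
    """True if all sequences are enabled and together cover every transition."""
    cov = [covered(spec, alpha) for alpha in suite]
    return all(c is not None for c in cov) and set().union(*cov) == set(range(len(spec)))

ts1 = [seq(("i1", "1.3"), ("i1", "2.6"), ("i1", "3.9"), ("i2", "5.3"))]
ts2 = [seq(("i1", "1.3"), ("i1", "2.6"), ("i1", "3.9")),
       seq(("i1", "1.3"), ("i2", "2.6"))]
ts1_prime = [seq(("i1", "1.4"), ("i1", "2.7"), ("i1", "4.0"), ("i2", "5.4"))]
print(is_ttt(S, ts1), is_ttt(S, ts2), is_ttt(S, ts1_prime))  # True True True
```

Each sequence is replayed from the initial state, which corresponds to the RESET between the two sequences of $ts_2$.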
4.4 Experimenting with SDN and
Related Output Races
In this section, we study the timed transition tour ef-
fectiveness for a real use-case. We consider an SDN
framework as a system under test and check which
output races can be detected and also discuss some
hardly detectable races and related TFSM mutants.
4.4.1 SDN Application Scenario
We performed a preliminary experimental study
where the IUT was a composition of two network en-
tities. Namely, we studied SDN that allows imple-
menting a user request on a data plane, when the SDN
controller pushes the rules of interest to the switches
(forwarding devices) (McKeown et al., 2008). In our
case, the IUT is the composition of the ONOS con-
troller (McKeown et al., 2008) and an Open vSwitch.
The main test objective was to assure the absence of
output races in such composition, that can provoke
potential misconfigurations on the data plane.
In order to derive the TFSM modeling the system behavior, we chose a very simple external application which can only request to push two flow rules. Those are: the flow rule 1 (2) with the priority 40001 (40002) and the hard$^4$ timeout of 5 (2) seconds. The SDN controller receives these requests and pushes the rules to the switch. Therefore, our IUT is assumed to have only two inputs, PostFlow1 ($pf_1$) and PostFlow2 ($pf_2$), and two outputs, FlowExpire1 ($fe_1$) and FlowExpire2 ($fe_2$). The input $pf_1$ ($pf_2$) indicates that the composition receives a request for adding the flow rule with the priority 40001 (40002) and the hard timeout 5 (2). The output $fe_1$ ($fe_2$) indicates that the composition produces a message of expiring a flow rule with the priority 40001 (40002). We derived an output race-free specification TFSM with four states (Fig. 3): empty, the initial state; $F_1$ ($F_2$), the state that indicates that the flow rule 1 (2) has been received (to be pushed to the flow table); and $F_1F_2$, the state that indicates that the switch has received the request to push both flow rules to its flow table.
4 The number of time units (after installation) for the flow rule to expire in the switch.
Figure 3: TFSM Spec.
We focus on output race detection related to the flow rule expiration, which is why we are interested in expressing the states related to changes in the switch flow table. The output delays reflect the hard timeouts of the rules, accordingly. As an example, consider a transition $F_1 \xrightarrow{pf_2,(1,2)/(fe_2,2)} F_1F_2$, which means that if the SDN controller and switch composition resides at state $F_1$ while receiving $pf_2$ after $t \in (1, 2)$ seconds, then it moves to state $F_1F_2$, i.e., the second rule is added and this rule will expire in 2 seconds.
We derived a timed transition tour $ts = \{\alpha_1, \alpha_2\}$, where $\alpha_1 = (pf_1, 2)(pf_2, 4)(pf_1, 5.5)(pf_2, 7.5)(pf_1, 10)(pf_1, 12)$ and $\alpha_2 = (pf_2, 2)(pf_2, 8)(pf_2, 11)(pf_1, 13)(pf_1, 14.5)(pf_2, 20)$. The set of expected output responses is $\{\beta_{1\downarrow O}, \beta_{2\downarrow O}\}$ where $\beta_{1\downarrow O} = fe_2\, fe_1\, fe_2\, fe_1\, fe_1\, fe_1$ and $\beta_{2\downarrow O} = fe_2\, fe_2\, fe_2\, fe_1\, fe_1\, fe_2$. The test cases were executed against the ONOS controller composed with an Open vSwitch by pushing the flow rules via the REST interface. For that matter, we ran the Mininet (de Oliveira et al., 2014) simulator. Experiments were performed on a virtual machine running Ubuntu 20.04 LTS with 16GB of RAM. All scripts utilized in the experimental setup are accessible via (Vinarskii, 2023).
The timed transition tour $ts$ was executed two times, and the output reactions of the IUT to $ts$ differed between the two executions. On the first execution we observed that the flow rule with the priority 40001 expired before the flow rule with the priority 40002. On the second execution we noticed that the flow rule with the priority 40002 expired before the flow rule with the priority 40001$^5$. Therefore, we can conclude that the SDN composition is output race-sensitive.
4.4.2 Hardly Detectable Output Delay Mutants
and TTT Fault Coverage
As shown above, the TTT seems to provide good practical results when it comes to output delay faults that provoke races. However, it is interesting to check how often these faults are detected by the TTT$^6$. In order to evaluate such fault coverage, for the specification of the SDN framework considered above (Fig. 3), we generated a set of first order output delay mutants and verified if they can be detected by various TTTs. Note that output delays in the specification TFSM represent the expiration time for flow rules in the SDN framework, therefore it is interesting to see what happens if this hard timeout for a rule is implemented wrongly but “not far away” from the original value. In other words, we propose a slight change of these values, i.e., $d' = d \pm 1$ for a transition $e = s \xrightarrow{i,g/(o,d)} s'$ in $M^{d'}_{e}(Spec)$.
Table 1: TTT fault coverage.
| Transition in Spec | Mutation | Does $ts$ detect races? | Does $ts'$ detect races? |
| $empty \xrightarrow{pf_1,(1,9)/(fe_1,5)} F_1$ | $d' = 4$ | Yes | No |
| $F_1 \xrightarrow{pf_1,(1,9)/(fe_1,5)} F_1$ | $d' = 4$ | No | No |
| $F_1 \xrightarrow{pf_2,(1,3)/(fe_2,2)} F_2$ | $d' = 3$ | Yes | No |
| $F_2 \xrightarrow{pf_1,(1,9)/(fe_1,5)} F_1$ | $d' = 4$ | No | No |
| $F_2 \xrightarrow{pf_1,(1,2)/(fe_1,5)} F_1F_2$ | $d' = 4$ | No | No |
| $F_1F_2 \xrightarrow{pf_1,(2,9)/(fe_1,5)} F_1$ | $d' = 4$ | No | No |
| $F_1F_2 \xrightarrow{pf_1,(1,2)/(fe_1,5)} F_1F_2$ | $d' = 4$ | Yes | Yes |
| $F_1F_2 \xrightarrow{pf_2,(1,3)/(fe_2,2)} F_1F_2$ | $d' = 3$ | Yes | Yes |
In this way, we mutated several transitions of $Spec$ and obtained output race-sensitive mutants. In total, 8 mutated output race-sensitive TFSMs were obtained, which are shown in Table 1. We considered various TTTs and observed some interesting results. For the test suite $ts = \{\alpha_1, \alpha_2\}$ that detected output races in the ONOS framework, with $\alpha_1 = (pf_1, 2)(pf_2, 4)(pf_1, 5.5)(pf_2, 7.5)(pf_1, 10)(pf_1, 12)$ and $\alpha_2 = (pf_2, 2)(pf_2, 8)(pf_2, 11)(pf_1, 13)(pf_1, 14.5)(pf_2, 20)$, only 4 mutants were detected. Another test suite is $ts' = \{\alpha'_1, \alpha'_2\}$ where $\alpha'_1 = (pf_1, 2)(pf_2, 4.5)(pf_1, 6)(pf_2, 8)(pf_1, 10)(pf_1, 12)$ and $\alpha'_2 = (pf_2, 2)(pf_2, 7)(pf_2, 11)(pf_1, 13)(pf_1, 14.5)(pf_2, 20)$, which differs from $ts$ only in timestamps. However, $ts'$ detected only 2 mutants. These experiments showcase the importance of properly choosing timestamps to detect output races (Section 5).
5 See (Vinarskii, 2023) for a detailed description.
6 It is possible to show that the TTT does not deliver a complete test suite against output delay faults.
The latter confirms our assumption that a slight
change in the output delay value can significantly af-
fect the fault coverage of the timed transition tour. As
not many mutants of this kind were detected, we refer
to them as hardly detectable output delay mutants.
5 DISCUSSIONS
On Choosing Timestamps for a TTT. According to the definition of the timed transition tour, there can be different sets of finite timed input sequences that contain exactly the same untimed input sequences. Indeed, different timestamps for each input $i$ to be applied next provide different test cases which have different abilities for detecting output races. The question therefore arises: to traverse a transition $e = s \xrightarrow{i,(u,v)/(o,d)} s' \in h_S$, how should we choose the timestamp $t$ for the input $i$? Shall we get closer to the lower or the upper bound of the interval $(u, v)$ or would it be better to consider its mean value? This choice can have an impact on the fault coverage.
Consider again the specification TFSM $S$ in Fig. 1 and two output race-sensitive implementations in Fig. 2. Let $ts = \{\alpha_1, \alpha_2\}$ ($ts' = \{\alpha'_1, \alpha'_2\}$) be the timed transition tour where all timestamps are close to the lower bound (mean value) of the interval. That is, $\alpha_1 = (i_1, 1.3)(i_1, 2.6)(i_1, 3.9)$, $\alpha_2 = (i_1, 1.3)(i_1, 2.6)(i_1, 3.9)(i_2, 5.3)$, $\alpha'_1 = (i_1, 1.5)(i_1, 3.0)(i_1, 4.5)$ and $\alpha'_2 = (i_1, 1.5)(i_1, 3.0)(i_1, 4.5)(i_2, 6.0)$, i.e., $ts$ and $ts'$ contain the same set of untimed input sequences. Due to the fact that $out(M^{1}_{e_4}, \alpha_2)_{\downarrow O} = \{o_2 o_3 o_1 o_2, o_2 o_3 o_2 o_1\}$, $ts$ detects an output race in $M^{1}_{e_4}$ ($o_1$ and $o_2$ compete to be produced), however, $ts$ cannot detect an output race in $M^{4}_{e_1}$. Similarly, as $out(M^{4}_{e_1}, \alpha'_1)_{\downarrow O} = \{o_2 o_3 o_1, o_2 o_1 o_3\}$, $ts'$ detects an output race in $M^{4}_{e_1}$ ($o_1$ and $o_3$ compete to be produced), however, $ts'$ cannot detect an output race in $M^{1}_{e_4}$. Finally, note that $ts'' = \{\alpha'_1, \alpha_2\}$ detects output races in both mutants. Thus, even a slight change in one or several timestamps can affect the overall race detection capability of the TTT.
It seems that the timestamps should be chosen based on the types of mutants that we want to detect. If we want to provoke a race and push some outputs to be permuted in the implementation, we need to “play” over the output delays $d$ and $d'$ for two of them to be produced at the same time. This knowledge can probably allow backtracking of some feasible timestamps, but this is out of the scope of this position paper. Instead, this is a challenge for future work.
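To make the remark about the delays concrete, consider a rough sketch under simplifying assumptions that are introduced here only for illustration: two consecutive inputs are applied at times $t_1 < t_2$, they trigger transitions with output delays $d$ and $d'$, the guard $(u,v)$ of the second transition constrains $t_2 - t_1$, and the two outputs can be swapped only when they are produced at the same time.

```latex
\begin{align*}
t_1 + d &= t_2 + d' \\
t_2 - t_1 &= d - d' \\
u &< d - d' < v
\end{align*}
```

For instance, with $d = 3$, $d' = 1$ and the guard $(1,3)$, as for the transitions $e_1$ and $e_2$ of the TFSM $S_1$ in Fig. 4, the required gap is $t_2 - t_1 = 2$, which lies inside the guard.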
On Detecting Output Faults by a TTT. A transition tour of a classical untimed FSM is capable of detecting all output faults, i.e., whenever a transition $(s, i, o, s')$ is changed in an implementation to the transition $(s, i, o', s')$, $o \neq o'$, such a fault will always be detected by a transition tour (guaranteed fault coverage). Naturally, the question arises: does a TTT detect all output faults as well? Interestingly, this is not necessarily the case, which can be illustrated by the counterexample shown in Fig. 4.
Figure 4: Specification TFSM and its output fault mutant. (a) TFSM $S_1$ with states $s_0$, $s_1$ and transitions $e_1 = i_1,(1,3)/(o_1,3)$ and $e_2 = i_2,(1,3)/(o_2,1)$. (b) Mutant $M^{o_2,o_1}_{e_1,e_2}(S_1)$ with states $q_0$, $q_1$ and transitions $e_1 = i_1,(1,3)/(o_2,3)$ and $e_2 = i_2,(1,3)/(o_1,1)$.
A timed input sequence $\alpha = (i_1, 2.0)(i_2, 4.0)$ covers all transitions in $S_1$, i.e., $\{\alpha\}$ is a timed transition tour for $S_1$. Nevertheless, $\alpha$ does not detect the wrong implementation described by the second order output fault mutant shown in Fig. 4, because $out(S_1, \alpha){\downarrow_O} = out(M^{o_2,o_1}_{e_1,e_2}, \alpha){\downarrow_O} = \{o_1 o_2, o_2 o_1\}$. It is crucial that in the counterexample above the mutant contains not a single fault but multiple faults, i.e., the outputs of two transitions are mutated (second order mutant). Therefore, unlike a transition tour of a classical FSM, a timed transition tour is not guaranteed to detect multiple output faults. In contrast, it can be shown that a single output fault is always detected by any timed transition tour; a corresponding proposition can be proven by contradiction. It is thus interesting to study other capabilities of the timed transition tour. Our paper focuses on specific output delay mutants that represent output races, but does not consider, for example, transition faults or faults in time intervals. This is another open challenge left for future work.
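The Fig. 4 counterexample can be replayed with a small sketch. It rests on assumptions that are not part of the formal model: each machine is reduced to one transition per input, the guard is read as an open interval on the time elapsed since the previous input, and outputs produced at exactly the same time may be observed in any order. The function `output_orders` and the sequence `alpha_shifted` are hypothetical.

```python
from itertools import groupby, permutations


def output_orders(machine, timed_inputs):
    """Return the set of output sequences the machine may produce.

    machine      -- dict: input -> ((u, v), output, delay); one transition
                    per input, enough for the two-transition machines of Fig. 4
    timed_inputs -- list of (input, absolute timestamp)
    """
    events = []
    previous = 0.0  # time of the previously applied input
    for symbol, t in timed_inputs:
        (u, v), output, delay = machine[symbol]
        if not u < t - previous < v:
            raise ValueError(f"timestamp {t} violates guard ({u}, {v})")
        events.append((t + delay, output))  # output appears after its delay
        previous = t
    events.sort(key=lambda event: event[0])

    # Outputs with equal production time may appear in any order (assumption).
    orders = {()}
    for _, group in groupby(events, key=lambda event: event[0]):
        simultaneous = [output for _, output in group]
        orders = {
            prefix + order
            for prefix in orders
            for order in permutations(simultaneous)
        }
    return orders


spec = {"i1": ((1, 3), "o1", 3), "i2": ((1, 3), "o2", 1)}    # Fig. 4(a)
mutant = {"i1": ((1, 3), "o2", 3), "i2": ((1, 3), "o1", 1)}  # Fig. 4(b)

alpha = [("i1", 2.0), ("i2", 4.0)]          # the tour from the text
alpha_shifted = [("i1", 2.0), ("i2", 4.5)]  # hypothetical alternative

undetected = output_orders(spec, alpha) == output_orders(mutant, alpha)
detected = output_orders(spec, alpha_shifted) != output_orders(mutant, alpha_shifted)
```

Under these assumptions both machines yield the same two output orders for `alpha`, since both outputs are produced at time 5.0, whereas the shifted timestamps separate the production times and the two output sets differ.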
6 CONCLUSION
This paper adapted the transition tour test generation strategy for timed FSMs, with the test purpose of detecting output races in implementations. Such faulty implementations can be represented by first order output delay mutants of the specification TFSM. To estimate the fault coverage of the timed transition tour against output race detection in distributed systems, we performed a preliminary experimental study. In particular, an SDN framework was considered as a case study. As a result, we observed races between the flow rules in the ONOS controller. The order of flow rule expiration in the ONOS implementation can differ from the specified order, and the timed transition tour detects this difference. This work-in-progress raises a number of research challenges. For future work, we plan to further study how to properly choose the timestamps in the timed transition tour. The completeness of the TTT test suite should also be thoroughly studied, against races and other types of faults, for various types of distributed systems. Finally, the TTT should be compared with other test generation strategies with respect to performance and fault coverage, and we plan to make such a comparison as well.
REFERENCES
Baier, C. and Katoen, J. (2008). Principles of model checking. MIT Press.
Benharrat, N., Gaston, C., Hierons, R. M., Lapitre, A., and Gall, P. L. (2017). Constraint-based oracles for timed distributed systems. In Testing Software and Systems - 29th IFIP WG 6.1 International Conference, pages 276–292.
Bresolin, D., El-Fakih, K., Villa, T., and Yevtushenko, N. (2021). Equivalence checking and intersection of deterministic timed finite state machines. Formal Methods Syst. Des., 59(1):77–102.
de Moura, L. M. and Bjørner, N. S. (2008). Z3: an efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, pages 337–340.
de Oliveira, R. L. S., Schweitzer, C. M., Shinoda, A. A., and Prete, L. R. (2014). Using mininet for emulation and prototyping software-defined networks. In 2014 IEEE Colombian Conference on Communications and Computing, pages 1–6.
El-Hassany, A., Miserez, J., Bielik, P., Vanbever, L., and Vechev, M. T. (2016). Sdnracer: concurrency analysis for software-defined networks. In Proceedings of the 37th ACM SIGPLAN, pages 402–415.
Li, A., Padhye, R., and Sekar, V. (2022). Spider: A practical fuzzing framework to uncover stateful performance issues in sdn controllers. https://doi.org/10.48550/arXiv.2209.04026.
Liu, H., Li, G., Lukman, J. F., Li, J., Lu, S., Gunawi, H. S., and Tian, C. (2017). Dcatch: Automatically detecting distributed concurrency bugs in cloud systems. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, pages 677–691.
Lu, G., Xu, L., Yang, Y., and Xu, B. (2019). Predictive analysis for race detection in software-defined networks. Sci. China Inf. Sci., 62(6):62101:1–62101:20.
Lynch, N. A. and Tuttle, M. R. (1989). An introduction to input/output automata. CWI quarterly, 2:219–246.
McClurg, J. (2021). Correct-by-construction network programming for stateful data-planes. In SOSR’21: The ACM SIGCOMM Symposium on SDN Research, Virtual Event, pages 66–79.
McClurg, J., Hojjat, H., and Černý, P. (2017). Synchronization synthesis for network programs. In Computer Aided Verification - 29th International Conference, pages 301–321.
McKeown, N., Anderson, T. E., Balakrishnan, H., Parulkar, G. M., Peterson, L. L., Rexford, J., Shenker, S., and Turner, J. S. (2008). Openflow: enabling innovation in campus networks. Comput. Commun. Rev., 38(2):69–74.
Milner, R. (1980). A Calculus of Communicating Systems. Springer.
Pereira, J. C., Machado, N., and Pinto, J. S. (2020). Testing for race conditions in distributed systems via SMT solving. In Tests and Proofs - 14th International Conference, TAP@STAF 2020, pages 122–140.
Raducu, R., Rodríguez, R. J., and Álvarez, P. (2022). Defense and attack techniques against file-based TOCTOU vulnerabilities: A systematic review. IEEE Access, 10:21742–21758.
Rouzaud-Cornabas, J., Clemente, P., and Toinard, C. (2010). An information flow approach for preventing race conditions: Dynamic protection of the linux OS. In Fourth International Conference on Emerging Security Information Systems and Technologies, pages 11–16.
Vinarskii, E. (2023). Timed transition tour for race detection in distributed systems. https://github.com/vinevg1996/Timed-Transition-Tour.
Vinarskii, E. and Zakharov, V. (2020). On some properties of timed finite state machines. System Informatics, pages 11–20.
Vinarskii, E. M., López, J., Kushik, N., Yevtushenko, N., and Zeghlache, D. (2019). A model checking based approach for detecting SDN races. In Testing Software and Systems - 31st IFIP WG 6.1 International Conference, pages 194–211.
Vinarskii, E. M. and Zakharov, V. A. (2018). On the verification of strictly deterministic behavior of timed finite state machines. In Proceedings of ISP RAS, pages 325–340.
Wen, C., He, M., Wu, B., Xu, Z., and Qin, S. (2022). Controlled concurrency testing via periodical scheduling. In 44th IEEE/ACM 44th International Conference on Software Engineering, pages 474–486.
ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering