failures of the nodes. To do so we replace the de-
terministic times and values of the profiles with stan-
dard probability distributions. The STOCHASTIC key-
word must be added at the beginning of the profile,
then each line contains, separated by spaces, the time
distribution (either DET, UNIF, NORMAL or EXP), then
the parameters of the time distribution, then the value
distribution, and finally the parameters of the value
distribution. The following profile describes a parameter,
e.g. a latency, that is set to its default value for the
first two seconds. At time 2, a new value for the latency
is drawn uniformly between 10ms and 20ms. Then a time
instant t is drawn according to an exponential law of
mean 1/0.05, and at time 2 + t the latency is drawn
according to the normal law of mean 45ms and standard
deviation 5ms. Finally, at time 2 + t + 10 (see footnote 2),
the latency is drawn according to an exponential law
of mean 1/20.
STOCHASTIC
DET 2 UNIF 0.010 0.020
EXP 0.05 NORMAL 0.045 0.005
DET 10 EXP 20
As for non-stochastic profiles, it is possible to loop
a stochastic profile by adding the LOOP keyword after
STOCHASTIC. In that case, the last drawn time is
used as the base for the loop. In our example
of a stochastic profile, at time 2 + t + 10 + 2, the latency
is drawn again according to the uniform law, due to
the looping of the profile.
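The sampling scheme of the example profile can be sketched in Python as follows. This is only an illustration and not SimGrid's parser: the names `parse_profile`, `draw` and `sample_events` are hypothetical, and the parameter conventions (UNIF takes two bounds, NORMAL takes a mean and a standard deviation, EXP takes a rate so that its mean is 1/rate) are inferred from the example. Sampled times are treated as delays since the previous change, as stated in footnote 2.

```python
import random

# Number of parameters expected after each keyword (assumed from the example).
ARITY = {"DET": 1, "UNIF": 2, "NORMAL": 2, "EXP": 1}


def parse_profile(text):
    """Return (loop, events); each event is ((time_law, params), (value_law, params))."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    if header[0] != "STOCHASTIC":
        raise ValueError("not a stochastic profile")
    loop = header[1:] == ["LOOP"]
    events = []
    for tokens in body:
        laws, i = [], 0
        for _ in range(2):  # time distribution first, then value distribution
            name = tokens[i]
            n = ARITY[name]
            laws.append((name, [float(x) for x in tokens[i + 1:i + 1 + n]]))
            i += 1 + n
        events.append(tuple(laws))
    return loop, events


def draw(rng, law):
    """Draw one sample from a (name, params) pair."""
    name, p = law
    if name == "DET":
        return p[0]
    if name == "UNIF":
        return rng.uniform(p[0], p[1])
    if name == "NORMAL":
        return rng.gauss(p[0], p[1])
    return rng.expovariate(p[0])  # EXP: p[0] is the rate, the mean is 1 / rate


def sample_events(text, rng, horizon):
    """Return the (date, value) changes drawn up to the given horizon."""
    loop, events = parse_profile(text)
    now, out = 0.0, []
    while True:
        cycle_start = now
        for time_law, value_law in events:
            now += draw(rng, time_law)  # delay since the previous change
            if now > horizon:
                return out
            out.append((now, draw(rng, value_law)))
        # Stop after one pass without LOOP, or if a pass made no progress in time.
        if not loop or now == cycle_start:
            return out


PROFILE = """STOCHASTIC LOOP
DET 2 UNIF 0.010 0.020
EXP 0.05 NORMAL 0.045 0.005
DET 10 EXP 20
"""

events = sample_events(PROFILE, random.Random(1), horizon=100.0)
for date, value in events:
    print(f"t={date:8.3f}  latency={value:.4f}")
```

With LOOP, the date of the last change is the base of the next pass, so the uniform law is sampled again 2 seconds after the third change.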
Observed Variables and Protocol for Simulation
Observation. Before building tools to perform sta-
tistical model-checking, we introduce a protocol for
the observation of the simulation. The tool will com-
municate with the simulator, listening for a number
of observed variables that are defined for the study by
the SimGrid user, and controlling whether the simula-
tion should continue or not. These observed variables
must be initialized before the start of the simulation,
and their value may be modified by the actors during
the simulation. The communication with the simula-
tor is done by hooks on SimGrid signals; these signals
are sent at key moments of the simulation (start, end,
completion of a step). At each step of the simulation,
a line composed of the current time and the value of
each observed variable is sent to our tool; then the
simulator waits for the reply of our tool, i.e. whether
or not it should continue the simulation.
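The tool side of this exchange can be sketched as follows. The text only states that each line carries the current time followed by the observed values and that the tool answers whether to continue. The whitespace-separated layout, the reply tokens `CONTINUE` and `STOP`, and the names `parse_observation` and `run_monitor` are therefore assumptions made for illustration.

```python
def parse_observation(line):
    """Split one protocol line into (time, [observed values])."""
    fields = [float(x) for x in line.split()]
    return fields[0], fields[1:]


def run_monitor(lines, keep_going):
    """Consume simulator lines and yield one reply per simulation step.

    `keep_going(time, values)` is the user-supplied decision function.
    The monitor stops reading as soon as it has answered STOP.
    """
    for line in lines:
        time, values = parse_observation(line)
        if keep_going(time, values):
            yield "CONTINUE"
        else:
            yield "STOP"
            return


# Hypothetical trace: time, number of completed tasks, current latency.
trace = ["0.0 0 1.5", "1.0 1 1.7", "2.5 3 0.2", "3.0 4 0.1"]
replies = list(run_monitor(trace, lambda t, v: v[0] < 3))
print(replies)
```

In a real setting, `lines` would be a stream fed by the simulator (for instance a pipe or a socket), and each reply would be written back before the simulator performs its next step.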
Footnote 2: Note that we do not handle time as in the
original profiles, which specify the time instants of
the changes since the start of the simulation (or of the loop).
Here, to avoid overlapping time intervals, the sampled timing
values denote the delay between two consecutive changes.
Randomness in SimGrid. SimGrid is meant to per-
form reproducible simulations of a distributed pro-
gram, yet we need different executions in order to
perform a statistical analysis. We also want to preserve,
as far as possible, the reproducibility of the statistical
analysis. In the SimGrid framework, simulations use
the standard library’s Mersenne-Twister random number
generator. Random draws made by the actors (in the case
of a stochastic distributed program) and by the generation
of events from the profiles are all redirected to this
single Mersenne-Twister generator.
When performing multiple simulations in a row,
at the end of each simulation the current state of the
generator is saved to a file, to be read at the start of
the next simulation. Moreover, in the case of paral-
lel simulations, the first batch of executions is per-
formed by seeding the generator with consecutive in-
tegers. These two practices should ensure that the ran-
dom number generation avoids biases in the statistical
evaluation.
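The two practices can be illustrated with Python's `random` module, whose generator is also a Mersenne Twister. This is a sketch of the idea and not SimGrid's code: `run_simulation` is a stand-in that only consumes random numbers, and the file name and the helper `run_with_saved_state` are hypothetical.

```python
import os
import pickle
import random
import tempfile


def run_simulation(rng):
    """Stand-in for one simulation: it only consumes a few random numbers."""
    return [rng.random() for _ in range(3)]


def run_with_saved_state(state_path, default_seed):
    """Restore the generator state if a file exists, run, then save the new state."""
    rng = random.Random(default_seed)
    if os.path.exists(state_path):
        with open(state_path, "rb") as f:
            rng.setstate(pickle.load(f))
    result = run_simulation(rng)
    with open(state_path, "wb") as f:
        pickle.dump(rng.getstate(), f)
    return result


# Sequential runs: the second run resumes where the first one stopped.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rng.state")
    chained = run_with_saved_state(path, 0) + run_with_saved_state(path, 0)

# The chained runs consume the same stream as one generator drawing six numbers.
reference_rng = random.Random(0)
reference = [reference_rng.random() for _ in range(6)]
print(chained == reference)

# Parallel runs: the first batch is seeded with consecutive integers.
first_batch = [run_simulation(random.Random(seed)) for seed in range(4)]
```

Saving the state keeps successive simulations on a single random stream, so they do not replay the same numbers, while the whole series stays reproducible from the initial seed.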
HASL. We now introduce the formalism that we
use for the statistical model-checking toolset that will
be introduced in the next paragraph. It comes from
the statistical model-checker Cosmos (Ballarini et al.,
2015) and is called the Hybrid Automata Stochas-
tic Language. A HASL formula consists of two el-
ements:
• First, a hybrid automaton that synchronizes with
the execution of the observed program (or more
generally of a Discrete Event Stochastic Process).
It serves both to select relevant paths and to
maintain indicators, using data variables evolving
along the path and the observed variables of the
distributed program;
• Second, an expression based on the data variables,
that describes the quantity to be evaluated. These
expressions include path operators, such as the
minimum and maximum values reached during an
execution, the last value, the integral over time or
the time average.
Note that the performance indices corresponding to
these expressions are conditional expectations over
the successful paths of the hybrid automaton. More
precisely, the results of the simulation count in the
computation of the value of the expression only if the
automaton reaches a final state during the execution.
Since our tool cannot synchronize with the simulator
as precisely as it would with Cosmos models, we have
added rejecting states. If such a state is reached, the
simulation is ignored. This is equivalent to a failed
synchronization in Cosmos.
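The path operators and the conditioning on successful paths can be illustrated with a small Python sketch. It is a simplification and not the Cosmos implementation: a data variable is assumed to be piecewise constant and recorded as a list of (time, value) changes, the automaton is reduced to an `accepted` flag per run, and the names `path_operators` and `conditional_mean` are hypothetical.

```python
def path_operators(trace, end_time):
    """Evaluate path operators on a piecewise-constant variable.

    `trace` is a list of (time, value) pairs with increasing times;
    each value holds until the next change, the last one until `end_time`.
    """
    values = [v for _, v in trace]
    integral = 0.0
    for (t0, v0), (t1, _) in zip(trace, trace[1:] + [(end_time, None)]):
        integral += v0 * (t1 - t0)
    duration = end_time - trace[0][0]
    return {
        "min": min(values),
        "max": max(values),
        "last": values[-1],
        "integral": integral,
        "mean": integral / duration,  # time average over the path
    }


def conditional_mean(runs, operator):
    """Average an operator over accepted runs only; other runs are ignored."""
    kept = [path_operators(trace, end)[operator]
            for trace, end, accepted in runs if accepted]
    return sum(kept) / len(kept) if kept else None


# Three hypothetical runs: (trace, end time, reached a final state?).
runs = [
    ([(0.0, 1.0), (2.0, 3.0)], 4.0, True),
    ([(0.0, 10.0)], 1.0, False),  # rejected: does not count
    ([(0.0, 2.0), (1.0, 4.0)], 2.0, True),
]
print(conditional_mean(runs, "mean"))  # 2.5
print(conditional_mean(runs, "max"))   # 3.5
```

The rejected run has a large value but does not influence the estimate, which is what conditioning on the successful paths means here.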
Statistical Model Checking of Distributed Programs within SimGrid