(Amadio, 1997) and DπLoc (Francalanza, 2006).
But these process algebras have no means of
expressing quantitative aspects of reliability and of
resource requirements in time and space. One
extension of π-calculus with quantitative properties
is the Stochastic π-calculus, Sπ (Priami, 1995).
Through its stochastic reaction rates, Sπ can
quantify system performance, but it has no notion
of failure.
We have considered whether we should extend
e.g. πl or DπLoc with quantitative properties, but
have abandoned doing so for the following reasons.
πl and DπLoc have fault detection logic, FD, a ping
"are you alive" construct, as part of their syntax. FD
in πl and DπLoc cannot fail, unlike FD in real
systems (see Section 2), so we cannot use πl and
DπLoc to study fault detection. Another issue is that
πl and DπLoc have syntactic fault injection
constructs. For reasons of clarity, we prefer that a
model defines a system's functional specification,
which does not include fault injection logic.
Therefore we introduce a new π-calculus-based
process algebra which is able to express location
failure and quantitative properties of reliability and
resource requirements in time and space. Our
process algebra, named Gπ-calculus, adds to π-
calculus a stochastic failable process group
construct for location failure. We adapt the reaction
rates of (Priami, 1995) in the form of transition
time labels (see Section 3) for quantitative estimation
of performance. Component-based job distribution is
expressed via a distribution rule. The semantics of
Gπ is given as a structural operational
semantics (Plotkin, 1981). The emphasis will,
however, be on practical applications.
The paper has the following outline. In Section 2,
we present a motivating example and give a flavour
of how the method can quantitatively estimate
reliability and descriptive statistics of the resource
requirements in time and space of a simple CPU
scavenging grid. Technical details are deferred to
Section 6. In Section 3, we give an informal
introduction to Gπ-calculus. In Section 4, we
demonstrate the expressiveness of Gπ by presenting
design patterns for implementing advanced
behaviour. In Section 5, we present the simulator
tool, which can estimate descriptive statistics of
quantitative properties of models. In Section 6, we
show how to model fault-tolerance techniques by
elaborating on the example from Section 2. The
paper ends with a conclusion.
2 A MOTIVATING EXAMPLE
Dependability of a system requires that the system
has predictable reliability and predictable
quantitative resource requirements in time
(performance) and space, e.g. network size (here
defined as the number of computers that
simultaneously have a job assigned), number of job
distributions (the number of assignments of a
computational problem to a new computational
resource) and workload (here defined as the number
of computations/reductions). By predictable we
mean that the standard deviation is relatively low
with respect to the mean and that the minimum and
maximum are relatively close to the mean. We define
the reliability of a system as the probability that a
system service will answer an arbitrary request in
accordance with its system specification.
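The predictability criterion above can be illustrated with ordinary descriptive statistics. The sketch below (with hypothetical sample data; the coefficient of variation stands in for "standard deviation relatively low with respect to the mean") is an illustration, not the paper's simulator:

```python
import statistics

def describe(samples):
    """Descriptive statistics used to judge predictability:
    a low standard deviation relative to the mean, and
    min/max close to the mean."""
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,  # coefficient of variation (std relative to mean)
        "min": min(samples),
        "max": max(samples),
    }

# Hypothetical completion times (seconds) of repeated grid runs
times = [101, 98, 103, 99, 100, 102]
stats = describe(times)
# Low cv and min/max near the mean -> resource usage is "predictable"
print(stats["cv"] < 0.1)  # prints True for this sample
```

A run with a high disconnect rate among volunteers would typically show a much larger spread, i.e. unpredictable resource requirements.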
We shall consider an example, a volunteering
CPU grid, CPU-GRID, from High Performance
Computing. High Performance Computing, HPC, is
today applied to solving complex, computation-
intensive problems. One way to achieve HPC is via
a CPU-GRID. Examples of CPU-GRIDs are
folding@home, seti@home and World Community
Grid, which have the following URLs, respectively:
http://folding.stanford.edu/
http://setiathome.berkeley.edu/
http://www.worldcommunitygrid.org
A CPU-GRID is in its simplest form based on a
central computer, a grid master, which has a set of
volunteering computers, CPUs, to which it can
delegate/schedule computational problems.
Volunteering computers can usually join the grid by
downloading and installing a screensaver which will
connect the volunteering computer to the grid. When
the grid master receives a job request from a grid
user, it will usually break the request into sub-
problems, which it delegates to volunteering CPUs.
When a sub-problem is solved, the volunteering
CPU returns the sub-result to the grid master. The
grid master assembles all sub-results into one final
result and returns it to the user. The owner of a
volunteering CPU can at any time choose to
(temporarily or indefinitely) disconnect his CPU
from the grid. From the point of view of the grid
master, this disconnection can be considered a
failure of the sub-problem delegated to that CPU.
To ensure that the CPU-GRID can achieve reliable
computing with such unreliable computational
resources, it needs a fault-tolerance strategy for
handling the failure of volunteering CPUs.
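The master/volunteer protocol and its failure mode can be sketched as a small simulation. Everything here is hypothetical (the disconnect probability, the retry-based fault-tolerance strategy, and squaring as a stand-in sub-computation); it is not the Gπ model itself, only an illustration of why such a strategy is needed:

```python
import random

def run_grid(subproblems, p_disconnect, max_retries=10, rng=None):
    """Grid master loop: delegate each sub-problem to a volunteering
    CPU; the volunteer may disconnect before returning its sub-result,
    in which case the master re-delegates (a simple retry-based
    fault-tolerance strategy)."""
    rng = rng or random.Random(0)
    distributions = 0  # number of job distributions (re-delegations included)
    results = []
    for sub in subproblems:
        for attempt in range(max_retries):
            distributions += 1
            if rng.random() >= p_disconnect:  # volunteer stayed connected
                results.append(sub * sub)     # stand-in for the sub-result
                break
        else:
            return None, distributions        # job failed after all retries
    # master assembles the sub-results into one final result
    return results, distributions

results, dists = run_grid(range(5), p_disconnect=0.3)
```

Without the retry loop (max_retries=1), a single volunteer disconnect would make the whole job fail; with retries, each sub-problem fails only if every attempt is lost, so reliability improves at the cost of extra job distributions.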
ICSOFT 2008 - International Conference on Software and Data Technologies