the requesting agent as well as the dependent agents build a temporary antecedence graph of the events that occur during the checkpointing operation. This temporary logging overlaps with the actual execution of the transaction and of the checkpointing, so it adds no extra load to the system and is therefore non-blocking. The distinctive feature of our scheme is that the checkpoint request is distributed to all the agents in parallel. After final checkpointing, the previous message logs and antecedence graphs are deleted. This considerably reduces the size of the graph piggybacked on each message and so helps maintain the efficiency of the algorithm when a large number of agents participate in a transaction. After checkpointing completes successfully, the involved agents may continue constructing new antecedence graphs from the temporarily saved antecedence graphs.
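The temporary logging and pruning described above can be sketched in Python as a minimal illustration under stated assumptions. An antecedence graph is modelled as a hypothetical `AntecedenceGraph` that maps each event to its direct antecedents. The hypothetical `Agent.begin_checkpoint` and `Agent.commit_checkpoint` methods stand in for the start and the successful end of checkpointing. Events recorded while a checkpoint is in progress go into a temporary graph. After the commit, the checkpointed nodes are dropped, so the graph piggybacked on later messages stays small.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class AntecedenceGraph:
    # Hypothetical layout: each event maps to the events that directly precede it.
    preds: Dict[str, Set[str]] = field(default_factory=dict)

    def add_event(self, event: str, depends_on: Iterable[str] = ()) -> None:
        deps = set(depends_on)
        self.preds.setdefault(event, set()).update(deps)
        for d in deps:  # antecedents become nodes too
            self.preds.setdefault(d, set())

    def merge(self, other: "AntecedenceGraph") -> None:
        for event, deps in other.preds.items():
            self.preds.setdefault(event, set()).update(deps)

    def discard(self, checkpointed: Set[str]) -> None:
        # Remove checkpointed nodes and every edge that points at them.
        for event in checkpointed:
            self.preds.pop(event, None)
        for deps in self.preds.values():
            deps -= checkpointed

    def size(self) -> int:
        # Nodes plus edges: a rough proxy for the piggybacked payload.
        return len(self.preds) + sum(len(d) for d in self.preds.values())


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.ag = AntecedenceGraph()
        self.temp_ag: Optional[AntecedenceGraph] = None  # only set while checkpointing

    def record(self, event: str, depends_on: Iterable[str] = ()) -> None:
        # While a checkpoint is in progress, log into the temporary AG so the
        # transaction keeps executing instead of blocking.
        target = self.temp_ag if self.temp_ag is not None else self.ag
        target.add_event(event, depends_on)

    def begin_checkpoint(self) -> None:
        self.temp_ag = AntecedenceGraph()

    def commit_checkpoint(self) -> Dict[str, Set[str]]:
        checkpointed = set(self.ag.preds)
        saved = {e: set(d) for e, d in self.ag.preds.items()}  # the checkpoint
        if self.temp_ag is not None:
            self.ag.merge(self.temp_ag)  # continue from the temporary AG
        self.ag.discard(checkpointed)    # drop nodes the checkpoint now covers
        self.temp_ag = None
        return saved


agent = Agent("MA1")
agent.record("e1")
agent.record("e2", ["e1"])
agent.begin_checkpoint()
agent.record("e3", ["e2"])  # logged in the temporary AG, no blocking
saved = agent.commit_checkpoint()
print(saved)           # {'e1': set(), 'e2': {'e1'}}
print(agent.ag.preds)  # {'e3': set()}
```

After the commit, only `e3` remains, and its edge to the checkpointed `e2` is gone. This is the size reduction that keeps the piggybacked graph small.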
In case of failure, the recovering agents request the BA to send the maximum-length antecedence graph. Each recovering agent then reconstructs its own graph from this last checkpointed antecedence graph. If MA_j decides, based on its own state, to take a checkpoint, it calls the following algorithm:
Requesting Agent MA_j:
    Request GD_j from the Dependent Agents (DA)
    For each agent ∈ Antecedence Graph (AG):
        Create CheckAgent (CA)
        MA_j sends a CA with a temp-checkpoint request and value 1/|GD_j| to all MA_i (where i < j)
    W = 0
    For each agent ∈ AG:
        MA_j receives the reply to the temp-checkpoint request
        For each reply compute: W = W + 1/|GD_j|
    If W ≠ 1 then
        Cancel checkpointing and wait for a threshold event
    Else if W = 1 then
        At MA_j and all DAs:
            Save the antecedence graph as a checkpoint
            Send the final checkpointed AG to the BA
            Discard successfully checkpointed nodes from the AG
            Continue again from the temporary AG
        At BA:
            Construct the maximum-length AG from the received AGs
            Write it to stable storage
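The commit rule of the algorithm can be sketched in Python. Under this rule, MA_j accumulates W from the replies and checkpoints only when W = 1. The sketch assumes that GD_j is given as a list of agent names and that each reply identifies its sender. `send_check_agents` and `checkpoint_round` are hypothetical names. Message transport, the CheckAgent itself and the wait for a threshold event are omitted.

```python
from fractions import Fraction
from typing import Dict, Iterable, List


def send_check_agents(gd_j: List[str]) -> Dict[str, Fraction]:
    # One CheckAgent per dependent agent, each carrying the share 1/|GD_j|.
    if not gd_j:
        raise ValueError("GD_j must contain at least one dependent agent")
    share = Fraction(1, len(gd_j))
    return {agent: share for agent in gd_j}


def checkpoint_round(gd_j: List[str], replies: Iterable[str]) -> str:
    sent = send_check_agents(gd_j)
    w = Fraction(0)
    for agent in set(replies):  # a duplicate reply must not be counted twice
        w += sent.get(agent, Fraction(0))  # replies from non-members add nothing
    # W == 1 only when every agent in GD_j has taken its temporary checkpoint.
    return "commit" if w == 1 else "cancel"


group = [f"MA{i}" for i in range(1, 11)]
print(checkpoint_round(group, group))       # commit
print(checkpoint_round(group, group[:-1]))  # cancel: one reply is missing
print(sum([0.1] * 10) == 1)                 # False: the reason for using Fraction
```

The sketch uses exact `Fraction` arithmetic because adding 1/|GD_j| in floating point can miss 1 through rounding. For example, ten shares of 0.1 sum to 0.9999999999999999, which would wrongly cancel a complete round.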
The checkpointed state at BA is used to provide
fault tolerance and recovery in case of agent failure.
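The BA side and the recovery path can be sketched in Python under several assumptions:

- Checkpointed AGs arrive as dictionaries that map each event to its antecedents.
- A local JSON file stands in for stable storage.
- Event identifiers have the hypothetical form `<agent>:<seq>`.

The text does not spell out how the maximum-length AG is constructed, so the sketch simply takes the union of all received AGs. `build_max_ag`, `write_stable` and `recover_own_graph` are illustrative names only.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # event -> direct antecedents (hypothetical format)


def build_max_ag(received: Iterable[Graph]) -> Graph:
    # Assumption: the "maximum length" AG is the union of all received AGs.
    merged: Graph = {}
    for graph in received:
        for event, deps in graph.items():
            merged.setdefault(event, set()).update(deps)
    return merged


def write_stable(graph: Graph, path: str) -> None:
    # Write to a temporary file, then atomically rename it over the target,
    # so a crash never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({e: sorted(d) for e, d in graph.items()}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def recover_own_graph(path: str, agent: str) -> Graph:
    # The recovering agent reads the stored AG and keeps the events it owns,
    # assuming (hypothetically) event ids of the form "<agent>:<seq>".
    with open(path) as f:
        stored = json.load(f)
    return {e: set(d) for e, d in stored.items() if e.split(":", 1)[0] == agent}


if __name__ == "__main__":
    ag1 = {"MA1:1": set(), "MA2:1": {"MA1:1"}}
    ag2 = {"MA2:1": {"MA1:1"}, "MA2:2": {"MA2:1"}}
    with tempfile.TemporaryDirectory() as d:
        store = os.path.join(d, "ba_ag.json")
        write_stable(build_max_ag([ag1, ag2]), store)
        print(recover_own_graph(store, "MA2"))
```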
4 PERFORMANCE ANALYSIS AND COMPARATIVE STUDY
In the proposed system, multiple agents operate as a group. Suppose that MA_k is related to MA_{k+1} in the antecedence graph. In the scheme of (Khokhar et al, 2006), the checkpoint is not optimized, so the requesting agent sends the checkpointing request to all the other agents. If MA_k starts the checkpointing request, the request propagates from MA_k to MA_1 through MA_{k-1}, MA_{k-2}, …, MA_2. The connections between the agents thus form a message request path. For a group of n agents (taking k = n), this path has length n-1, denoted L_kt(n). In the proposed scheme, the best case is a single dependent agent for the agent that requests the checkpoint. The worst case is that all the other agents depend on it, which gives the same n-1 as in the former scheme. Averaging the best case (1) and the worst case (n-1) gives the average path length L_c(n) and its ratio to L_kt(n):
$$L_c(n) = \frac{n}{2}$$
$$\lim_{n \to \infty} \frac{L_c(n)}{L_{kt}(n)} = \lim_{n \to \infty} \frac{n/2}{n-1} = \frac{1}{2}$$
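The path-length model can be checked numerically with the short Python sketch below. The function names are hypothetical. `l_c` averages the best case (one dependent agent) and the worst case (n-1). `l_c_uniform` adds an assumption that every dependent count from 1 to n-1 is equally likely and shows that this gives the same n/2. With n = 170, the group size used in the experiments, the model ratio is already 85/169, about 0.503. These are model values, not measurements.

```python
from fractions import Fraction


def l_kt(n: int) -> int:
    # Former scheme: the request travels MA_n -> ... -> MA_1, i.e. n-1 hops.
    return n - 1


def l_c(n: int) -> Fraction:
    # Proposed scheme: mean of the best case (1) and the worst case (n-1).
    return Fraction(1 + (n - 1), 2)


def l_c_uniform(n: int) -> Fraction:
    # Same mean if every dependent count 1..n-1 is assumed equally likely.
    if n < 2:
        raise ValueError("need at least two agents")
    return Fraction(sum(range(1, n)), n - 1)


for n in (10, 170, 10_000):
    ratio = l_c(n) / l_kt(n)
    print(n, l_c(n), ratio, round(float(ratio), 4))
```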
Due to space limitations, we omit the detailed theoretical analysis.
For the implementation we used AGLETS (Lange 1998), which provides a graphical interface for developing distributed multi-agent systems. To implement the proposed scheme, the tasks and the behaviour of every agent were written as classes. First, for better verification and more reliable results, 170 agents were defined and created on the mobile host. Then the agents that manage these agents were activated to wait for checkpointing messages. In each run, a subset of these 170 agents was designated as dependent agents, and we measured the checkpointing time with the counter provided in the graphical interface. We also tested this environment with the scheme in (Khokhar et al, 2006). In that scheme, a checkpoint message is sent to all the agents regardless of their dependency on the initiating agent. Figure 3 shows the results obtained by running the checkpointing part of the proposed scheme with different sets of dependent agents drawn from these 170 agents. As can be seen, as the number of dependent agents increases relative to the total number of agents in the group, the checkpointing time increases and approaches that of the scheme in (Khokhar et al, 2006).
ANTECEDENCE GRAPH APPROACH TO CHECKPOINTING FOR FAULT TOLERANCE IN MULTI AGENT
SYSTEM