update of the reward value is done according to the following formula:

Q(s, a) ← (1 − α_t(s, a)) Q(s, a) + α_t(s, a) (r + γ max_{a'} Q(s', a'))
The goal of the Q-learning algorithm is to build the reward function Q for each couple (s, a) according to the result obtained after the use of the handler a. α_t is the learning rate and takes a value in [0, 1]. γ is the discount factor; it allows the learning agent to weigh future rewards when determining the best handler. If the agent needs immediate rewards, the discount factor must be near 0.
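For illustration, the following minimal Python sketch applies this update rule; the names q_update, q_table, alpha and gamma are our own assumptions and the data structure is illustrative, not part of the original model.

from collections import defaultdict

def q_update(q_table, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    # Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a'))
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    q_table[(state, action)] = ((1 - alpha) * q_table[(state, action)]
                                + alpha * (reward + gamma * best_next))
    return q_table[(state, action)]

# Usage sketch: states are exceptions, actions are the handlers applied to them.
q_table = defaultdict(float)
q_update(q_table, "exception_E1", "handler_H1", reward=1.0,
         next_state="resolved", next_actions=["handler_H1", "handler_H2"])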
4.3 The Learning Agent’s Memory
The learning agent has to remember the historical reward values of each handler in order to choose the best handler when an exception appears a second time.
As shown in the previous paragraphs, the Q-learning algorithm bases its decisions on its perception of the system. In our approach, however, we need decisions that also depend on the agent's historical experience. As a solution, we propose to use the learning agent's memory, which includes the past decisions about handlers. In this way the agent is able to recognize an exception that has appeared in the past and choose its best handler.
In case an exception keeps a reward of 0 with all the available handlers, we give the learning agent the authority to ask the external designer for new solutions. The knowledge base is then extended, and the newly added solution is treated like the initial ones.
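The following Python sketch illustrates this memory mechanism under our own assumptions (the class name LearningAgentMemory, the dictionary layout and the placeholder ask_external_designer are illustrative, not the paper's implementation): past rewards are kept per (exception, handler) couple, the best known handler is reused, and the external designer is consulted when every known handler keeps a reward of 0.

class LearningAgentMemory:
    """Keeps the historical reward of each (exception, handler) couple."""

    def __init__(self):
        self.rewards = {}  # (exception, handler) -> last learned reward value

    def record(self, exception, handler, value):
        self.rewards[(exception, handler)] = value

    def best_handler(self, exception):
        # Rewards learned so far for this exception.
        known = {h: v for (e, h), v in self.rewards.items() if e == exception}
        if not known:
            return None  # exception never seen before
        if all(v == 0 for v in known.values()):
            # Every available handler failed: ask the external designer.
            return self.ask_external_designer(exception)
        return max(known, key=known.get)  # handler with the best past reward

    def ask_external_designer(self, exception):
        # Placeholder: the new handler extends the knowledge base and is
        # treated like the initial handlers.
        new_handler = "new_handler_for_" + exception
        self.record(exception, new_handler, 0.0)
        return new_handler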
5 CONCLUSIONS AND FUTURE WORK
Throughout this paper, we have proposed an effective approach for fault tolerance in multi-agent systems based on a learning agent. We use a formal model for the representation of agent activities, called hierarchical plans, which allows the detection of errors in a simple and automatic way. We choose the Q-learning algorithm to handle exceptions: the learning agent can handle exceptions according to its experience, choose the most effective handler when many handlers exist, and learn from the external designer when new exceptions occur.
Finally, we are interested in validating this work through a simulation that can provide concrete results on the effectiveness of this approach and allow a comparison with other approaches.