Authors:
Mathieu Lelerre and Abdel-Illah Mouaddib
Affiliation:
Université de Caen Normandie, France
Keyword(s):
Behavior, Recognition, MDP, Reinforcement Learning.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Evolution Strategies; Evolutionary Computing; Evolutionary Robotics and Intelligent Agents; Soft Computing
Abstract:
The coordination of cooperative autonomous agents relies mainly on knowing or estimating each other's behavior policies. Most approaches assume that agents estimate the policies of the others by taking them to be optimal. Unfortunately, this assumption does not hold for the coordination problem between semi-autonomous agents, where an external entity can act to change an agent's behavior in a non-optimal way. We face such problems when the external entity is an operator guiding or tele-operating a system, since many factors, such as stress, hesitation, or preferences, can affect the operator's behavior. In such situations, recognizing the other agents' policies becomes harder than usual, since considering all possible cases of hesitation or stress is not feasible.
In this paper, we propose an approach that recognizes and predicts the future actions and behavior of such agents when they may follow any policy, including non-optimal ones and various cases of hesitation and preference, by using online learning techniques. The main idea of our approach is to initialize the estimated policy with the optimal one and then update it according to the observed behavior, deriving a new estimated policy. We present three learning methods for updating policies, show their stability and efficiency, and compare them with existing approaches.
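The update scheme described above can be sketched in a minimal form. This is a hypothetical illustration, not one of the paper's three methods: the estimated policy is represented as Dirichlet-style action counts per state, initialized with extra mass on the optimal action, and shifted toward the observed behavior with each observation. The class name, parameters, and prior strength are all assumptions for the sketch.

```python
# Hypothetical sketch: online estimation of another agent's policy,
# starting from the optimal policy as a prior and updating toward
# observed (state, action) pairs. Not the paper's exact methods.
from collections import defaultdict


class PolicyEstimator:
    def __init__(self, optimal_policy, n_actions, prior_strength=5.0):
        # optimal_policy: dict mapping state -> optimal action index.
        self.n_actions = n_actions
        # Uniform base count of 1 per action keeps all estimates non-zero.
        self.counts = defaultdict(lambda: [1.0] * n_actions)
        for state, action in optimal_policy.items():
            # Bias the initial estimate toward the optimal action.
            self.counts[state][action] += prior_strength

    def observe(self, state, action):
        # Shift the estimate toward what the agent actually did.
        self.counts[state][action] += 1.0

    def predict(self, state):
        # Estimated probability of each action in `state`.
        c = self.counts[state]
        total = sum(c)
        return [x / total for x in c]
```

With this sketch, the estimator first predicts the optimal action; if the observed agent repeatedly deviates (e.g. out of hesitation), the predicted distribution drifts toward the action it actually takes, which is the qualitative behavior the abstract describes.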