2 RELATED WORK
The pioneering study on the multi-robot patrolling problem was performed by (Machado et al., 2003). In that work, the authors define an evaluation criterion based on idleness: the number of cycles that the nodes of a graph have remained unvisited, measured over $n$ simulation cycles.
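As a rough illustration of this metric, the sketch below computes per-node idleness as the number of cycles elapsed since each node's last visit, averaged over the graph; the function and variable names are hypothetical, not taken from Machado et al.

```python
# Illustrative sketch of the idleness criterion, assuming a discrete-cycle
# simulation; names are hypothetical, not from Machado et al. (2003).
def node_idleness(last_visit, current_cycle):
    """Cycles elapsed since each node was last visited."""
    return {node: current_cycle - cycle for node, cycle in last_visit.items()}

def average_idleness(last_visit, current_cycle):
    """Average idleness over all nodes of the graph at one cycle."""
    idleness = node_idleness(last_visit, current_cycle)
    return sum(idleness.values()) / len(idleness)

# Example: node 'a' last visited at cycle 7, node 'b' at cycle 2; at cycle 10
# the average idleness is (3 + 8) / 2 = 5.5.
print(average_idleness({'a': 7, 'b': 2}, 10))
```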
Moreover, the problem of generating patrol paths in a target area is tackled in (Elmaliach et al., 2007). The solution presented in that work uses a Spanning Tree Coverage method to find a cyclic patrol path of minimal cost that visits all points in an environment. Once this path is obtained, a group of robots is uniformly distributed along it and each robot follows the same patrol route repeatedly. Finally, an approach that divides an environment into regions using a balanced graph partitioning method is presented in (Portugal and Rocha, 2010). Each of these regions is assigned to a robot that follows a local patrolling route. The procedure to obtain this route can perform up to four stages to find Hamiltonian, Eulerian, longest, or non-Hamiltonian paths.
The good performance of the approaches presented in those works can be explained by their centralized and explicit coordination schemes (Almeida et al., 2004). However, centralized, predefined, and fixed schemes are not suitable for security applications in certain situations, such as dynamic environments, huge graphs, and environments whose regions have different properties. By implementing a learning model from game theory, this work differs from others in the manner in which the multi-robot patrolling problem is solved. The learning model selected is called Stochastic Fictitious Play (SFP) (Fudenberg, 1995).
3 CONCEPTS FROM GAME THEORY
This section presents some concepts and definitions from game theory (Fudenberg, 1998). In this work, an environment is represented as an undirected weighted graph $G$, which is an ordered pair consisting of a set of nodes and a set of edges. Each edge depicts a path and is weighted with a cost proportional to its length. To minimize the time between two visits to the same node, the robots must interact and select an action in order to choose the next node to visit. Taking this interaction into account, a fixed number of normal-form games were defined at each node of the graph.
Formally, a finite $n$-robot normal-form game $\Gamma$ consists of:
• A finite set $R$ of robots $i = 1, \ldots, n$.
• A finite set $A = A_1 \times \cdots \times A_n$, where $A_i$ is the finite set of actions available for robot $i$.
• A finite set $S = S_1 \times \cdots \times S_n$, where $S_i$ is the finite set of strategies available for robot $i$.
• A real-valued payoff function $\pi = (\pi_1, \ldots, \pi_n)$, where $\pi_i : S_i \mapsto \Re$ is the payoff for robot $i$.
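For concreteness, these four components could be gathered in a simple container as sketched below; this is only an illustrative representation under assumed names, not the data structure used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Illustrative container for a finite n-robot normal-form game; field names
# are hypothetical and only mirror the four components listed above.
@dataclass
class NormalFormGame:
    robots: Sequence[int]                # R: the robots i = 1, ..., n
    actions: Sequence[Sequence[str]]     # A_i: actions available to robot i
    strategies: Sequence[Sequence[str]]  # S_i: strategies available to robot i
    payoffs: Sequence[Callable[..., float]]  # pi_i: payoff function of robot i
```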
Thus, at every time step, the robots that interact in these types of games reach a node and play its corresponding normal-form game. As a result, they choose a strategy to select an action that maximizes their expected payoff considering the actions selected by all other robots, denoted by $-i = [1, \ldots, i-1, i+1, \ldots, n]$. This process is called best response, and it leads to the central concept of game theory, the Nash equilibrium.
Each action selected is related to an edge that leads the robot to the next node. Depending on the strategy chosen, each robot can select an action with probability one or randomize over the set of available actions according to some probability distribution. Such strategies are called pure and mixed, respectively.
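The difference between the two strategy types can be sketched in a few lines; the action labels and probabilities below are made up for illustration.

```python
import random

# Pure strategy: a single action is selected with probability one.
def pure_strategy(action):
    return action

# Mixed strategy: randomize over the available actions according to
# some probability distribution (here an assumed, arbitrary one).
def mixed_strategy(actions, probabilities):
    return random.choices(actions, weights=probabilities, k=1)[0]

edges = ['edge_1', 'edge_2', 'edge_3']  # hypothetical outgoing edges of a node
print(pure_strategy('edge_2'))                 # always 'edge_2'
print(mixed_strategy(edges, [0.5, 0.3, 0.2]))  # 'edge_1' half of the time
```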
In this work, the expected payoffs were defined as follows: let $\tau_{-i}(j)$ be the number of times that the other robots select the strategy $j = 1, \ldots, k$. Therefore, the payoff for robot $i$ playing such a strategy is defined as $\pi_i(j) = |R| - \tau_{-i}(j)$, where $|R|$ represents the cardinality of the set $R$. In the games defined in this implementation, the robots have no conflicting interests, and their sole challenge is to coordinate on actions that are maximally beneficial to all.
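As a minimal sketch of this payoff, assuming the counts $\tau_{-i}(j)$ have already been observed, the following function reproduces $\pi_i(j) = |R| - \tau_{-i}(j)$; the names are illustrative only.

```python
# pi_i(j) = |R| - tau_{-i}(j): the fewer other robots that pick strategy j,
# the higher the payoff, which rewards spreading the team over the graph.
def payoff(num_robots, times_others_chose_j):
    return num_robots - times_others_chose_j

# Example with |R| = 4 robots: a strategy no other robot selected pays 4,
# while one also selected by two other robots pays only 2.
print(payoff(4, 0))  # 4
print(payoff(4, 2))  # 2
```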
4 STOCHASTIC FICTITIOUS PLAY (SFP) MODEL
Stochastic Fictitious Play (SFP) (Fudenberg, 1995) is a belief-based learning model. In SFP, robots form beliefs about what other robots will play in the future based on past observations. Thus, they attempt to define processes that lead to a Nash equilibrium by choosing a best-response strategy that maximizes their expected payoff with respect to their beliefs.
In the prediction of SFP, each robot $i$ has an initial weight function $k_i^0(s_{-i}^j) : S_{-i} \longrightarrow \Re^+$ which assigns a real value defined by $k_i^0(s_{-i}^j) = \frac{|S_{-i}| - j}{|S_{-i}|}, \; \forall j = 1, \ldots, k$, where $|S_{-i}|$ represents the cardinality of the set $S_{-i}$, which is the finite set of strategies available for robots $-i$.
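A direct transcription of this initial weight function, assuming $k = |S_{-i}|$ indexed strategies $j = 1, \ldots, k$, could look as follows; the function name is hypothetical.

```python
# k_i^0(s_{-i}^j) = (|S_{-i}| - j) / |S_{-i}| for j = 1, ..., k, assuming
# k equals the number of strategies available to the other robots.
def initial_weights(num_strategies):
    return [(num_strategies - j) / num_strategies
            for j in range(1, num_strategies + 1)]

# Example with |S_{-i}| = 4: weights decrease linearly from 0.75 to 0.0.
print(initial_weights(4))  # [0.75, 0.5, 0.25, 0.0]
```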