A Multi-robot System for Patrolling Task via Stochastic Fictitious Play
Erik Hernández, Antonio Barrientos, Jaime del Cerro and Claudio Rossi
Center for Robotics and Automation, Technical University of Madrid, C/ José Gutierrez Abascal 2, Madrid, Spain
Keywords:
Multi-robot System, Security Tasks, Patrolling problem, Stochastic Fictitious Play, Game Theory.
Abstract:
A great deal of work has been done in recent years on the multi-robot patrolling problem, in which a team of robots is engaged to supervise an infrastructure. Commonly, patrolling tasks are performed with the objective of visiting a set of points of interest. This problem has been solved in the literature by deterministic and centralized solutions, which perform better than decentralized and non-deterministic approaches in almost all cases. However, deterministic methods are not suitable for security purposes because of their predictability. This work provides a new decentralized and non-deterministic approach, based on the Game Theory model called Stochastic Fictitious Play (SFP), to perform security tasks at critical facilities. Moreover, a detailed study is presented that provides additional insight into this learning model in the multi-robot patrolling context. Finally, the approach developed in this work is analyzed and compared with other methods proposed in the literature by means of a patrolling simulator.
1 INTRODUCTION
This work addresses the activity in which a set of robots scans an area by visiting a set of points for security purposes. This activity is called area patrolling, and it is suitable for domains where distributed surveillance or inspection is required (Machado et al., 2003).
Currently, most security systems consist of security devices and the human operators who handle them. Under these conditions, the performance of the system can be affected by human limitations. Furthermore, in some environments people have to do their job under dangerous conditions or in hazardous scenarios. Since mobile robots do not share these human limitations, they can be applied to enhance security systems, and security systems that use mobile robots in these types of applications have many advantages. Moreover, multi-robot systems can be used when some tasks are too complex to be performed by a single robot. A multi-robot system is defined as a set of robots that operate in the same environment (Farinelli et al., 2004).
Additionally, in recent years several works in the literature have tackled a problem known as multi-robot patrolling, in which a team of mobile robots performs patrolling tasks. However, almost all the methods proposed in those works are centralized and deterministic, and such solutions suffer from vulnerability, scalability and fault-tolerance problems.
By contrast, this work presents a decentralized and non-deterministic approach based on SFP. To this end, the multi-robot patrolling problem has been formulated using concepts of Graph Theory: the environment is represented as a graph whose nodes depict specific points of interest and whose edges represent paths. Since the multi-robot patrolling problem aims at maximizing the number of visits to each node, a good patrolling strategy must reduce the time between two consecutive visits to the same node (Chevaleyre, 2004).
The main contributions of this work can be summarized as follows: an analysis of the SFP model in the multi-robot patrolling context; a detailed study of how the parameters of the implemented model affect its performance; and a comparison with the best-suited methods in the literature. The remainder of this paper is organized as follows. Section 2 briefly describes related work. Section 3 gives definitions from game theory and introduces the multi-robot patrolling problem. Section 4 presents the implemented model. Section 5 presents the evaluation and experimental results. Finally, Section 6 concludes this work.
Hernández E., Barrientos A., del Cerro J. and Rossi C.
A Multi-robot System for Patrolling Task via Stochastic Fictitious Play.
DOI: 10.5220/0004259504070410
In Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART-2013), pages 407-410
ISBN: 978-989-8565-38-9
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
2 RELATED WORK
The pioneering study on the multi-robot patrolling problem was performed by Machado et al. (2003). In that work, the authors define an evaluation criterion based on idleness: the number of cycles that the nodes of a graph have remained unvisited, measured over n simulation cycles.
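The idleness criterion can be made concrete with a short sketch. It assumes one common reading, following the average idleness of Machado et al. (2003): the idleness of a node at cycle t is the number of cycles since its last visit, the graph idleness is the mean over all nodes, and the reported value averages this over n cycles. The function name `average_idleness` and the visit-log format are hypothetical.

```python
from typing import Dict, Set


def average_idleness(num_nodes: int, visits: Dict[int, Set[int]], num_cycles: int) -> float:
    """Average graph idleness over cycles 1..num_cycles (hypothetical helper).

    visits[t] is the set of node ids visited at cycle t (hypothetical log format).
    Assumption: every node counts as visited at cycle 0.
    """
    last_visit = [0] * num_nodes
    total = 0.0
    for t in range(1, num_cycles + 1):
        for node in visits.get(t, set()):
            last_visit[node] = t
        # Instantaneous node idleness: cycles elapsed since the node's last visit.
        graph_idleness = sum(t - lv for lv in last_visit) / num_nodes
        total += graph_idleness
    return total / num_cycles


# Two nodes visited alternately: one is always fresh, the other one cycle old.
print(average_idleness(2, {1: {0}, 2: {1}, 3: {0}, 4: {1}}, 4))  # 0.5
```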
Moreover, the problem of generating patrol paths in a target area is tackled by Elmaliach et al. (2007). The solution presented in that work uses a Spanning Tree Coverage method to find a cyclic patrol path of minimal cost that visits all points in an environment. Once this path is obtained, a group of robots is uniformly distributed along it, and each robot follows the same patrol route over and over. Finally, an approach that divides an environment into regions using balanced graph partitioning is presented by Portugal and Rocha (2010). Each of these regions is assigned to a robot that follows a local patrolling route. The procedure that obtains this route may go through up to four stages to find Hamiltonian, Eulerian, longest or non-Hamiltonian paths.
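The last step of Single Cycle, spreading the team uniformly along the patrol cycle, can be sketched as follows. This is not the implementation of Elmaliach et al.: the cycle is taken as given (no Spanning Tree Coverage is computed here), each robot is simply snapped to the node closest to its equal-length offset along the tour, and the function name `uniform_start_nodes` is hypothetical.

```python
from itertools import accumulate
from typing import List


def uniform_start_nodes(cycle: List[int], costs: List[float], num_robots: int) -> List[int]:
    """Place robots at roughly equal distances along a closed patrol cycle (hypothetical helper).

    cycle[i] -> cycle[(i + 1) % len(cycle)] is an edge with cost costs[i].
    """
    total = sum(costs)
    # Distance travelled from cycle[0] to reach each node along the tour.
    positions = [0.0] + list(accumulate(costs[:-1]))

    def circular_gap(p: float, target: float) -> float:
        d = abs(p - target)
        return min(d, total - d)  # the tour is closed, so distances wrap around

    starts = []
    for r in range(num_robots):
        target = r * total / num_robots  # equal spacing along the tour
        best = min(range(len(cycle)), key=lambda i: circular_gap(positions[i], target))
        starts.append(cycle[best])
    return starts


print(uniform_start_nodes([0, 1, 2, 3], [1.0, 1.0, 1.0, 1.0], 2))  # [0, 2]
```

With uneven edge costs, robots end up on nodes that are far apart in travel distance rather than in node count, which is the point of distributing them along the path length.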
The good performance of the approaches presented in those works could be explained by their centralized scheme with an explicit coordinator (Almeida et al., 2004). However, centralized, predefined and fixed schemes are not suitable for security applications in some situations, such as dynamic environments, huge graphs and environments whose regions have different properties. This work differs from the others in that it solves the multi-robot patrolling problem by implementing a learning model from Game Theory, called Stochastic Fictitious Play (SFP) (Fudenberg, 1995).
3 CONCEPTS FROM GAME THEORY
This section presents some concepts and definitions from game theory (Fudenberg, 1998). In this work, an environment was represented as an undirected weighted graph G, an ordered pair consisting of a set of nodes and a set of edges. Each edge represents a path and is weighted with a cost proportional to its length. To minimize the time between two visits to the same node, robots must interact and select an action that determines the next node to visit. To model this interaction, a fixed set of normal-form games was defined, one at each node of the graph.
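As a sketch of this formulation, the weighted undirected graph can be stored as an adjacency map, and the actions available to a robot standing at a node are simply the incident edges. The `Graph` alias and the helper names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Adjacency map: node -> {neighbour: edge cost}; costs are proportional to path length.
Graph = Dict[int, Dict[int, float]]


def add_edge(graph: Graph, u: int, v: int, cost: float) -> None:
    """Insert an undirected weighted edge, stored in both directions (hypothetical helper)."""
    graph.setdefault(u, {})[v] = cost
    graph.setdefault(v, {})[u] = cost


def actions_at(graph: Graph, node: int) -> List[Tuple[int, float]]:
    """Actions of the game played at `node`: one per incident edge, as (next node, cost)."""
    return sorted(graph.get(node, {}).items())


g: Graph = {}
add_edge(g, 0, 1, 2.0)
add_edge(g, 0, 2, 5.0)
add_edge(g, 1, 2, 1.5)
print(actions_at(g, 0))  # [(1, 2.0), (2, 5.0)]
```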
Formally, a finite n-robot normal-form game $\Gamma$ consists of:
- A finite set $R$ of robots $i = 1, \dots, n$.
- A finite set $A = A_1 \times \cdots \times A_n$, where $A_i$ is the finite set of actions available to robot $i$.
- A finite set $S = S_1 \times \cdots \times S_n$, where $S_i$ is the finite set of strategies available to robot $i$.
- A real-valued payoff function $\pi = (\pi_1, \dots, \pi_n)$, where $\pi_i : S \to \mathbb{R}$ is the payoff for robot $i$.
Thus, at every time step, each robot that reaches a node plays the normal-form game associated with that node. It chooses a strategy, and hence an action, that maximizes its expected payoff given the actions selected by all other robots, denoted by $-i = \{1, \dots, i-1, i+1, \dots, n\}$. This process is called best response, and it leads to the central concept of game theory, the Nash equilibrium: a strategy profile in which every robot's strategy is a best response to the strategies of the others.
Each selected action corresponds to an edge that leads the robot to the next node. Depending on the strategy chosen, each robot either selects an action with probability one or randomizes over the set of available actions according to some probability distribution. Such strategies are called pure and mixed, respectively.
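The difference between pure and mixed strategies amounts to the distribution from which the next edge is drawn. A minimal sketch using Python's standard `random` module; the helper names are hypothetical.

```python
import random
from typing import List, Sequence


def pure_strategy(num_actions: int, chosen: int) -> List[float]:
    """Pure strategy: probability one on a single action, zero elsewhere."""
    return [1.0 if a == chosen else 0.0 for a in range(num_actions)]


def uniform_mixed_strategy(num_actions: int) -> List[float]:
    """One possible mixed strategy: equal probability on every action."""
    return [1.0 / num_actions] * num_actions


def sample_action(distribution: Sequence[float], rng: random.Random) -> int:
    """Draw an action index (i.e. an incident edge) from a strategy."""
    return rng.choices(range(len(distribution)), weights=distribution, k=1)[0]


rng = random.Random(0)
print(sample_action(pure_strategy(3, 2), rng))  # always 2
print(sample_action(uniform_mixed_strategy(3), rng))  # 0, 1 or 2
```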
In this work, the payoffs were defined as follows: let $\tau_i(j)$ be the number of times that the other robots select strategy $j = 1, \dots, k$. The payoff for robot $i$ playing this strategy is then $\pi_i(j) = |R| - \tau_i(j)$, where $|R|$ is the cardinality of the set $R$. In the games defined in this implementation, the robots have no conflicting interests, and their sole challenge is to coordinate on actions that are maximally beneficial to all.
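Assuming the payoff reads $\pi_i(j) = |R| - \tau_i(j)$ (that is, with a minus sign between the two terms), a strategy becomes less attractive the more often the other robots choose it, so a pure best response steers robot i toward the least contested edge. A small sketch; strategies are indexed from 0 rather than 1, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def payoffs(num_robots: int, others_choices: Sequence[int], num_strategies: int) -> List[int]:
    """Assumed payoff pi_i(j) = |R| - tau_i(j); tau_i(j) counts other robots choosing j."""
    tau = Counter(others_choices)
    return [num_robots - tau[j] for j in range(num_strategies)]


def best_response(values: Sequence[float]) -> int:
    """Index of the highest payoff (ties broken by the lowest index)."""
    return max(range(len(values)), key=lambda j: values[j])


# Three robots; the two others both chose strategy 0 out of three strategies.
pi = payoffs(3, [0, 0], 3)
print(pi)  # [1, 3, 3]
print(best_response(pi))  # 1
```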
4 STOCHASTIC FICTITIOUS PLAY (SFP) MODEL
Stochastic Fictitious Play (SFP) (Fudenberg, 1995) is a belief-based learning model. In SFP, robots form beliefs about what the other robots will play in the future based on past observations. SFP thus defines a process intended to lead to a Nash equilibrium, in which each robot chooses a best response strategy that maximizes its expected payoff given its beliefs.
In SFP, each robot $i$ has an initial weight function $k_i^0 : S_i \to \mathbb{R}^{+}$, which assigns the real value

$$k_i^0(s_i^j) = \frac{|S_i| - j}{|S_i|}, \qquad j = 1, \dots, k,$$

where $|S_i|$ is the cardinality of $S_i$, the finite set of strategies available to robot $i$.
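Taking k = |S_i|, these initial weights decrease linearly with the strategy index, so the first strategy receives the largest prior weight and the last receives zero. A direct transcription, with a hypothetical function name:

```python
from typing import List


def initial_weights(num_strategies: int) -> List[float]:
    """k_i^0(s_i^j) = (|S_i| - j) / |S_i| for j = 1, ..., |S_i| (assumes k = |S_i|)."""
    n = num_strategies
    return [(n - j) / n for j in range(1, n + 1)]


print(initial_weights(4))  # [0.75, 0.5, 0.25, 0.0]
```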
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
408
Thus, the belief that robot $i$ assigns to the other robots playing strategy $s_i^j$ in period $t$ is given by

$$B_i^t(s_i^j) = \frac{\alpha \cdot k_i^{t-1}(s_i^j) + \xi(s_i^j(t))}{\sum_{\gamma \in S_i} \left[ \alpha \cdot k_i^{t-1}(s_i^\gamma) + \xi(s_i^\gamma(t)) \right]} \qquad (1)$$

where the indicator function $\xi(s_i^j(t))$ assigns one to the strategy selected in period $t$ and zero to the other strategies.
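Equation (1) can be sketched as below, with the indicator ξ set to one for the strategy observed in period t. The paper does not state how $k_i^t$ is obtained from $k_i^{t-1}$; purely as an assumption, the sketch carries the normalized beliefs forward as the next period's weights, which makes α act like an inertia term (after the first period, with α = 4 a new observation receives weight 1/5). The function name is hypothetical.

```python
from typing import List, Sequence


def update_beliefs(prev_weights: Sequence[float], observed: int, alpha: float) -> List[float]:
    """Eq. (1): (alpha*k^{t-1}(s^j) + xi_j) / sum_g (alpha*k^{t-1}(s^g) + xi_g)."""
    # Numerators: alpha-weighted previous weight plus the indicator of the observed strategy.
    raw = [alpha * w + (1.0 if j == observed else 0.0) for j, w in enumerate(prev_weights)]
    total = sum(raw)  # denominator of Eq. (1)
    return [r / total for r in raw]


alpha = 4.0
weights = [0.75, 0.5, 0.25, 0.0]  # k_i^0 for four strategies
beliefs = update_beliefs(weights, observed=2, alpha=alpha)
print([round(b, 3) for b in beliefs])  # [0.429, 0.286, 0.286, 0.0]
# Assumption (not stated in the paper): reuse the normalized beliefs as k_i^t next period.
beliefs = update_beliefs(beliefs, observed=2, alpha=alpha)
```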
Once the beliefs are updated, the expected payoff in period $t$ is defined as

$$E_i^t(s_i^j) = B_i^t(s_i^j) \cdot \pi_i(s_i^j, s_{-i}^j) \qquad (2)$$

Therefore, the best response of robot $i$ is given by

$$BR_i^t = \arg\max_j E_i^t(s_i^j) \qquad (3)$$
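Equations (2) and (3) multiply the updated beliefs by the payoffs element-wise and then take the arg max. A sketch, assuming the belief and payoff vectors are aligned by strategy index (function names hypothetical):

```python
from typing import List, Sequence


def expected_payoffs(beliefs: Sequence[float], payoffs: Sequence[float]) -> List[float]:
    """Eq. (2): E_i^t(s^j) = B_i^t(s^j) * pi_i(s^j, s_{-i}^j)."""
    return [b * p for b, p in zip(beliefs, payoffs)]


def best_response(expected: Sequence[float]) -> int:
    """Eq. (3): index j maximizing E_i^t(s^j); ties go to the lowest index."""
    return max(range(len(expected)), key=lambda j: expected[j])


E = expected_payoffs([0.5, 0.3, 0.2], [1.0, 3.0, 2.0])
print(best_response(E))  # 1
```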
Finally, the formulation of SFP produces a probability distribution over the set of actions of robot $i$ by following the smooth best response $\overline{BR}_i$, defined by

$$\overline{BR}_i(\sigma_{-i})[s_i^j] = \frac{\exp\left(\pi_i(s_i^j, \sigma_{-i})/\lambda\right)}{\sum_{\gamma} \exp\left(\pi_i(s_i^\gamma, \sigma_{-i})/\lambda\right)} \qquad (4)$$

where $\lambda$ is termed the randomization parameter. Values of $\lambda$ close to zero make the robot play (almost) pure best-response strategies, whereas large values lead to (nearly) complete randomization.
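Equation (4) is a logit (softmax) response with temperature λ. The sketch below subtracts the maximum payoff before exponentiating, which leaves the probabilities of Eq. (4) unchanged but avoids overflow for small λ; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def smooth_best_response(payoffs: Sequence[float], lam: float) -> List[float]:
    """Eq. (4): probability of strategy j proportional to exp(payoff_j / lambda)."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    top = max(payoffs)
    # Shifting every payoff by the same constant cancels out in the ratio of Eq. (4).
    weights = [math.exp((p - top) / lam) for p in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


payoffs = [1.0, 3.0, 2.0]
print([round(p, 3) for p in smooth_best_response(payoffs, 0.01)])  # close to [0, 1, 0]
print([round(p, 3) for p in smooth_best_response(payoffs, 1000.0)])  # close to uniform
```

The two calls illustrate the two regimes described above: a small λ concentrates almost all probability on the best strategy, while a large λ spreads it nearly evenly.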
5 EXPERIMENTS AND RESULTS
This section presents the experimental results obtained by executing the SFP model in a patrolling simulator on the maps depicted in Figure 1. Moreover, comparison results with Single Cycle (Elmaliach et al., 2007) and MSP (Portugal and Rocha, 2010) are shown.
The first experiments aim at studying the performance of SFP in the multi-robot patrolling context for different values of its parameters, namely λ and α. A total of 1900 simulations showed that SFP performed best with λ = 5 and α = 4.
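To show how the pieces fit together with the selected parameters, the sketch below performs one decision of a single robot at a node: it updates beliefs with Eq. (1), weights the payoffs |R| − τ_i(j) by those beliefs as in Eq. (2), and draws the next edge from the logit response of Eq. (4). Several choices are assumptions rather than details given in the paper: the expected payoffs of Eq. (2) are fed into Eq. (4), strategies are the incident edges in a fixed order, the observed strategy and the other robots' choices are passed in by the caller, and the normalized beliefs are kept as the next weights. All names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

LAMBDA = 5.0  # randomization parameter (best value reported in this section)
ALPHA = 4.0   # belief weight (best value reported in this section)


def sfp_step(weights: Sequence[float], observed: int, others_choices: Sequence[int],
             num_robots: int, rng: random.Random,
             alpha: float = ALPHA, lam: float = LAMBDA) -> Tuple[int, List[float], List[float]]:
    """One hypothetical SFP decision for a robot at a node with len(weights) incident edges.

    observed: index of the strategy selected in period t (sets the indicator xi).
    others_choices: strategy indices chosen by the other robots (used for tau_i).
    Returns (chosen edge index, action probabilities, beliefs reused as next weights).
    """
    n = len(weights)
    # Eq. (1): alpha-weighted previous weights plus the indicator, then normalize.
    raw = [alpha * w + (1.0 if j == observed else 0.0) for j, w in enumerate(weights)]
    total = sum(raw)
    beliefs = [r / total for r in raw]
    # Assumed payoff pi_i(j) = |R| - tau_i(j).
    tau = Counter(others_choices)
    payoffs = [num_robots - tau[j] for j in range(n)]
    # Eq. (2): expected payoff = belief * payoff.
    expected = [b * p for b, p in zip(beliefs, payoffs)]
    # Eq. (4) applied to the expected payoffs (assumption); shifting by the max avoids overflow.
    top = max(expected)
    exps = [math.exp((e - top) / lam) for e in expected]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Sample the next edge from the smooth best response.
    choice = rng.choices(range(n), weights=probs, k=1)[0]
    return choice, probs, beliefs


rng = random.Random(42)
n = 3  # incident edges at the current node
weights = [(n - j) / n for j in range(1, n + 1)]  # initial weights k_i^0
edge, probs, weights = sfp_step(weights, observed=0, others_choices=[0, 1], num_robots=3, rng=rng)
print(edge, [round(p, 3) for p in probs])
```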
Figure 1: Four maps used to evaluate and compare the performance of SFP with other approaches, as shown in the patrolling simulator: (a) Cumberland 2nd Floor; (b) Strongly Connected; (c) Maze; (d) Grid.

Once the parameters of SFP were fixed, the next experiments compare this algorithm with MSP and Single Cycle on all the maps of Figure 1; Figures 2(a), 2(b), 2(c) and 2(d) show the performance of these algorithms. In each experiment, the starting positions of all the robots were chosen randomly, and a new experiment was started once every node of the map had been visited 256 times. Since this procedure was executed ten times, each point of the graphs is the average value of these ten simulations.
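Each plotted point is an average over ten runs, and the conclusions later compare the other methods against the spread of the SFP results. A minimal sketch of that bookkeeping with Python's `statistics` module, assuming the sample standard deviation is used (the paper does not say which); the helper names are hypothetical and the numbers in the usage lines are toy values, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def summarize(runs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the idleness obtained in repeated runs."""
    return statistics.mean(runs), statistics.stdev(runs)


def within_one_std(value: float, runs: Sequence[float]) -> bool:
    """True if `value` lies within one standard deviation of the mean of `runs`."""
    mean, std = summarize(runs)
    return abs(value - mean) <= std


toy_runs = [1.0, 2.0, 3.0]  # toy values only, not data from the paper
print(summarize(toy_runs))  # (2.0, 1.0)
print(within_one_std(2.5, toy_runs))  # True
```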
The results presented in this section show that Single Cycle performs slightly better than SFP (lower idleness is better), especially on maps 1(a) and 1(c), as shown in Figures 2(a) and 2(c). However, Figure 2(a) shows that SFP performs better than MSP on map 1(a). Moreover, Figures 2(b) and 2(d) show that, in some cases on maps 1(b) and 1(d), SFP performs better than Single Cycle.
6 CONCLUSIONS
The multi-robot patrolling problem has received much attention in recent years due to its applicability. However, almost all the work presented in the literature is concerned with centralized and deterministic methods, and these types of solutions present vulnerability, scalability and fault-tolerance problems. By contrast, this work presents a decentralized and non-deterministic approach based on the Game Theory model called Stochastic Fictitious Play (SFP).
The results presented in Section 5 show that, in some case studies, either Single Cycle or MSP performs better than SFP. However, overall, the results of Single Cycle and MSP fall within the standard deviation of SFP, and therefore such an improvement does not represent a meaningful difference. Indeed, in some cases SFP improves on the results of Single Cycle and MSP. This is important because SFP has notable characteristics such as distribution and decentralization. These results provide evidence of the suitability of the SFP model and suggest that game-theoretic principles are appropriate for performing the multi-robot patrolling task better than centralized methods.

Figure 2: Performance of MSP, Single Cycle and SFP with different team sizes on the maps of Figure 1, plotted as idleness versus team size (6, 8, 10, 15, 20, 25 and 30 robots): (a) map of Figure 1(a); (b) map of Figure 1(b); (c) map of Figure 1(c), SFP and Single Cycle only; (d) map of Figure 1(d).
In spite of the good performance of SFP, some limitations are worth noting. For example, SFP includes no mechanism to avoid interference between robots, nor does it guarantee that the environment will be completely explored in the single-robot case. Thus, future work must consider mechanisms to avoid interference and to guarantee convergence in the single-robot case. Moreover, more case studies should be analyzed, considering scenarios with different characteristics such as long corridors, huge graphs, and so forth.
ACKNOWLEDGEMENTS
This work has been supported by the Robotics and Cybernetics Research Group at the Technical University of Madrid (Spain), and funded under the projects “ROTOS: Multi-Robot system for outdoor infrastructures protection”, sponsored by the Spanish Ministry of Education and Science (DPI2010-17998), and ROBOCITY 2030, sponsored by the Community of Madrid (S-0505/DPI/000235).
REFERENCES
Almeida, A., Ramalho, G., Santana, H., Tedesco, P.,
Menezes, T., Corruble, V., and Chevaleyre, Y. (2004).
Recent advances on multi-agent patrolling. In
Advances in Artificial Intelligence, SBIA. Springer.
Chevaleyre, Y. (2004). Theoretical analysis of the
multi-agent patrolling problem. In International
Conference on Intelligent Agent Technology. IEEE
Computer Society.
Elmaliach, Y., Agmon, N., and Kaminka, G. (2007).
Multi-robot area patrol under frequency constraints.
Annals of Mathematics and Artificial Intelligence.
Farinelli, A., Iocchi, L., and Nardi, D. (2004). Multirobot
systems: a classification focused on coordination.
Systems, Man, and Cybernetics, Part B: Cybernetics,
IEEE Transactions on.
Fudenberg, D. (1995). Consistency and cautious fictitious
play. Journal of Economic Dynamics and Control.
Fudenberg, D. (1998). The theory of learning in games. The
MIT Press.
Machado, A., Ramalho, G., Zucker, J., and Drogoul, A.
(2003). Multi-agent patrolling: An empirical analysis
of alternative architectures. Multi-Agent-Based
Simulation II.
Portugal, D. and Rocha, R. (2010). MSP algorithm:
multi-robot patrolling based on territory allocation
using balanced graph partitioning. In Proceedings
of the 2010 ACM Symposium on Applied Computing.
ACM.
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
410