and Kautz, 2001). A large body of past research in
general optimization techniques such as gradient de-
scent and simulated annealing (Glover and Kochen-
berger, 2003) has also shown that the inclusion of ran-
dom processes can be important to achieving global
maxima/minima over local maxima/minima. Ro-
botic control which is purely reactive (like gradi-
ent following) can lead to the same type of local
maxima/minima issues. The addition of randomness
or noise to robotic motion has been shown to help
avoid this (Balch and Arkin, 1994), but the strategy
used to determine the amount of random motion is
generally fixed (i.e. not locally adaptive) and is
therefore not well suited to time-dependent problems
requiring adaptation to changing or uncertain
environmental conditions.
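The local maximum problem described above can be illustrated with a minimal sketch. It is not the model developed in this paper: the one-dimensional landscape, the function name climb and the fixed noise parameter eps are all assumptions made for illustration. A purely reactive walker always steps uphill, while a noisy walker replaces a fixed fraction of its steps with random ones.

```python
import random

def climb(landscape, start, steps, eps, rng):
    """Hill-climb on a 1-D landscape (illustrative sketch only).

    With probability eps (a fixed, non-adaptive noise level) the walker
    takes a random step; otherwise it moves to the higher neighbour,
    or stays put if no neighbour is higher.
    Returns the final position and the best value visited.
    """
    pos = start
    best = landscape[pos]
    for _ in range(steps):
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(landscape)]
        if rng.random() < eps:
            # Random motion: ignore the gradient for this step.
            pos = rng.choice(neighbours)
        else:
            # Reactive motion: follow the local gradient uphill.
            uphill = max(neighbours, key=lambda i: landscape[i])
            if landscape[uphill] > landscape[pos]:
                pos = uphill
        best = max(best, landscape[pos])
    return pos, best

# Hypothetical landscape: local maximum 3 at index 3, global maximum 6 at index 12.
LANDSCAPE = [0, 1, 2, 3, 2, 1, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6, 5]

# Purely reactive walker (eps = 0): stalls at the local maximum at index 3.
reactive = climb(LANDSCAPE, start=0, steps=5000, eps=0.0, rng=random.Random(0))

# Noisy walker (eps = 0.5): random steps let it cross the valley.
noisy = climb(LANDSCAPE, start=0, steps=5000, eps=0.5, rng=random.Random(0))
```

Because eps is constant here, the walker keeps wandering at the same rate even after it has found the global maximum, which is one way to see the limitation of a fixed noise strategy.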
The main contribution of this paper is the devel-
opment of a simulation model based on a novel ap-
proach for representing local self-tuning or adaption
strategies within the context of optimization of multi-
robot search in complex and dynamic domains. A key
aim of the simulation model is to be able to study
in detail the use and effect of local strategies which
vary the degree to which reactive behaviour to marker
trails and reinforcement of marker trails should dom-
inate over random motions. The random walk model
used in this paper can be compared to a Markov
decision process and is influenced by reinforcement
learning theory (Kaelbling et al., 1994). This paper is
structured as follows: First a brief summary is given
of related work, then the numerical model is defined
and a set of initial results is presented. A summary
and future work section then describes some of the
aims and proposed uses of the simulation model.
2 RELATED WORK
A large body of research in AI life applied to
optimization and guidance problems has made use of
indirect communication techniques, such as trail
laying, that are based on social insects. This has led
to the term synthetic pheromone, which describes data
structures inspired by the chemical markers
(pheromones) found in biological systems. Research in collective robotics
(Holland and Melhuish, 1999) has made substantial
use of stigmergy and other concepts from AI life
research. Approaches based on such techniques can
provide a robust and adaptive indirect coordination
mechanism for collaborating entities such as robots
(Wagner et al., 1999). Multiple robots using such
techniques are particularly efficient for tasks such
as mapping unknown terrain which are well suited
to being performed collectively. Each robot needs
only relatively simple functionality to achieve com-
plex group behaviour. This reduces the complexity
and cost of each robot. Different approaches to navi-
gation strategy for indoor searching have been exam-
ined by (Gonzales-Banos and Latombe, 2002). One
approach to aid search behaviour has been the use of
coverage maps (Stachniss and Burgard, 2003), which,
in a similar way to marker trails, can be viewed as a
form of indirect communication stored in the
environment. Lately there has been increased use of
multiple robots to perform specialized tasks (e.g. search
and rescue (Baltes and Anderson, 2003)). Some common
problems with multiple robot guidance are:
(1) Pure reactive navigation often suffers from local
minima issues due to limitations such as sensor range
and/or accuracy. It has been found that a combination
of goal-directed behaviour and reactive behaviour
can be an effective strategy (Balch and Arkin, 1994),
but this often requires more complex robot behaviour
such as the use of path planning algorithms.
(2) Centralized versus decentralized control, and
whether to use localized and indirect communication.
Decentralized control and indirect communication
(see (Holland and Melhuish, 1999), (Wagner
et al., 1999)) can be very useful in complex and
dynamic environments, where the environment is
spatially/geographically complex, where the positions
of objects in the environment change, or where the
environment itself is subject to uncertainty or change.
Robots with limited ability to transmit and
communicate over longer distances can also benefit from an
approach based on local communication.
Random walk models have been widely used to
model movement patterns such as dispersion. It is
possible under certain conditions to look at local con-
trol decisions in a biased random walk as a form of
Markov decision process. (Azar et al., 1992) examine
optimal strategy applicable to the time-independent,
long-term behaviour of a random walk on a finite graph,
where local movement decisions can be viewed in
terms of a controller selecting from a set of available
actions to bias the behaviour of a Markov chain. This
type of approach has relevance for this paper but is not
able to address time-dependent local strategy formation.
A reinforced random walk model was first proposed
by Coppersmith and Diaconis (Coppersmith
and Diaconis, 1987) as a way of modeling a person
exploring a new city (see also (Davis, 1990)). Random
walk models can be used as an important part
of more specific models for spatial exploration and
cooperative interaction. For example, a biased random
walk model which uses feedback with the environment
to influence a walker's movement is the active
walker model originally formulated by (Lam and
Pochy, 1993).
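The idea of a reinforced random walk can be made concrete with a short sketch. It assumes the simplest linear form of edge reinforcement: every edge starts with weight 1, the walker picks its next edge with probability proportional to the current weights, and each traversal adds a constant delta to the weight of the edge used. The function name reinforced_walk, the parameter delta and the four-node ring graph are illustrative assumptions and are not taken from the cited works or from the model in this paper.

```python
import random

def reinforced_walk(adjacency, start, steps, delta, rng):
    """Edge-reinforced random walk on an undirected graph (illustrative sketch).

    adjacency maps each node to a list of its neighbours.
    Returns the visited path and the final edge weights.
    """
    # Every undirected edge starts with the same weight.
    weights = {frozenset((u, v)): 1.0 for u in adjacency for v in adjacency[u]}
    pos = start
    path = [pos]
    for _ in range(steps):
        neighbours = adjacency[pos]
        edge_weights = [weights[frozenset((pos, v))] for v in neighbours]
        # Choose the next node with probability proportional to edge weight.
        nxt = rng.choices(neighbours, weights=edge_weights, k=1)[0]
        # Reinforce the edge just traversed, making it more likely to be reused.
        weights[frozenset((pos, nxt))] += delta
        pos = nxt
        path.append(pos)
    return path, weights

# Hypothetical example graph: a ring of four nodes.
RING = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

path, weights = reinforced_walk(RING, start=0, steps=100, delta=1.0, rng=random.Random(0))
```

Setting delta to 0 recovers an ordinary unbiased random walk, so delta plays a role loosely analogous to the strength with which a marker trail is reinforced.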
ICINCO 2005 - ROBOTICS AND AUTOMATION