4.4 Simulating Team Behaviors
The spreading activation networks are then initialized. Each network consists of three
layers. The preconditions layer is initialized with propositions as observed in the cur-
rent state of the environment. The effect (or postconditional) layer consists of reward
values for each condition being present in the goal state, which then determine action
utility as in (3). Table 2 shows the reward values assigned to each condition. A negative
reward implies that the condition is undesirable, which is the case with a wall in front
of the robot. With negative rewards, the robot will plan against adverse environmental
conditions: for instance, if it is likely that a wall will be observable in front of the robot
after performing an action, that action’s utility is lessened. Also, assigning positive rewards to having a wall on the left or right of the robot (as is the case in Table 2) results in a wall-following behavior.
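As an illustration only (the exact form of (3) and the values in Table 2 are not reproduced here), the following minimal sketch assumes that an action's utility is the reward-weighted sum of the postconditions it is predicted to produce; the condition names and reward magnitudes are hypothetical.

```python
# Hypothetical sketch: action utility as a reward-weighted sum of predicted
# postconditions, in the spirit of the utility computation referenced as (3).
# Condition names and reward values are illustrative, not copied from Table 2.

REWARDS = {
    "wall_in_front": -1.0,   # undesirable: plan away from head-on walls
    "wall_on_left":   0.5,   # desirable: encourages wall following
    "wall_on_right":  0.5,
}

def action_utility(predicted_postconditions):
    """predicted_postconditions maps a condition name to the probability
    that the condition holds after the action is performed."""
    return sum(REWARDS.get(cond, 0.0) * prob
               for cond, prob in predicted_postconditions.items())

# An action likely to leave a wall directly ahead scores lower than one
# that keeps a wall on the robot's side.
print(action_utility({"wall_in_front": 0.9}))                       # -0.9
print(action_utility({"wall_on_left": 0.8, "wall_in_front": 0.1}))  # 0.3
```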
Table 2 does not include rewards given for each specific orientation. Assigning a
higher reward to facing north and east, for instance, would result in robots that tend
to explore towards the northeast. However, since these rewards are common to all ro-
bots, it is undesirable to hardcode behaviors in this way. Instead, the link strengths
between these orientation conditions and each action, which are unique for each robot,
will provide these behaviors. These orientation reward values must be balanced against the rewards for other conditions, such as walls; otherwise the robot may prefer to face a wall as
long as it is oriented in the preferred direction. Also, these orientation rewards should
not be hardcoded to zero, or robots will be unable to develop behaviors based on these
conditions.
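The following sketch, offered only as an assumption about the combination rule, illustrates why a zero orientation reward would suppress any behavior learned through the orientation link strengths, while a small non-zero reward keeps the wall rewards dominant.

```python
# Illustrative only: why the shared orientation reward should be small but
# non-zero. Assuming a multiplicative combination, an evolved, robot-specific
# link strength can influence utility only if the shared reward it feeds into
# is non-zero.

def orientation_contribution(link_strength, orientation_reward):
    """Contribution of one orientation postcondition to an action's utility,
    assuming contribution = (learned link strength) * (shared reward)."""
    return link_strength * orientation_reward

print(orientation_contribution(0.9, 0.0))   # 0.0: no behavior can develop
print(orientation_contribution(0.9, 0.2))   # 0.18: small, so wall rewards dominate
```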
In selecting actions there is often the possibility that no action seems desirable, or
all actions seem equally desirable. In these situations the robot may continue planning
indefinitely. To keep this from occurring, the forward propagation lookahead is limited to
a value assigned at design time. The value chosen for the experiments was 4 lookahead
states. In the case that no conditions are present in the environment to make any of the
possible actions desirable (i.e. all action utilities are zero), the default action is to move
forward. This default is rarely needed, since the orientation conditions can themselves cause an action to be desirable, as discussed further in the next section.
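A sketch of this bounded lookahead with a default action is given below, assuming a simple depth-limited expansion of predicted states; predict_state and action_utility are hypothetical placeholders for the spreading activation machinery.

```python
# Sketch of bounded lookahead with a default action, assuming a simple
# depth-limited search over predicted states. predict_state and action_utility
# are hypothetical stand-ins for the spreading activation machinery.

MAX_LOOKAHEAD = 4          # lookahead depth fixed at design time
DEFAULT_ACTION = "forward" # fallback when no action appears desirable

def select_action(state, actions, predict_state, action_utility,
                  depth=MAX_LOOKAHEAD):
    def value(s, d):
        # Best accumulated utility over action sequences of length d from s.
        if d == 0:
            return 0.0
        return max(action_utility(s, a) + value(predict_state(s, a), d - 1)
                   for a in actions)

    utilities = {a: action_utility(state, a)
                    + value(predict_state(state, a), depth - 1)
                 for a in actions}
    # If no condition makes any action desirable (all utilities are zero),
    # fall back to moving forward.
    if all(u == 0.0 for u in utilities.values()):
        return DEFAULT_ACTION
    return max(utilities, key=utilities.get)
```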
4.5 Applying Evolved Team Actions to Spreading Activation
Next, the action sequences and corresponding condition vectors of the best team simu-
lation run from the GA evolution are used to initialize the link strengths of the spreading
activation networks. A portion of these link strengths is shown in Figure 9. Because these link strengths derive solely from the experience of that particular simulation run, they are heavily biased toward it and are not guaranteed to represent the environment accurately. However, this biasing results in each robot exhibiting a different behavior based on how it perceives the environment.
It is through this biasing that behaviors are developed.
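The precise initialization rule is not restated here, but one plausible reading, given purely as an assumption, is that each condition-to-action link strength reflects how often that condition was observed just before that action in the best run, which also makes the bias toward a single run explicit.

```python
# Assumption-laden sketch: derive condition-to-action link strengths from the
# best GA run by counting how often each condition was observed just before
# each action, then normalizing per condition. The actual initialization rule
# may differ; this only illustrates why the result is biased toward one run.
from collections import defaultdict

def link_strengths(action_sequence, condition_vectors, condition_names):
    """action_sequence[t] is the action taken at step t; condition_vectors[t]
    is the corresponding binary condition vector from the best run."""
    counts = defaultdict(float)   # (condition, action) co-occurrence counts
    totals = defaultdict(float)   # how often each condition was observed
    for action, conditions in zip(action_sequence, condition_vectors):
        for name, present in zip(condition_names, conditions):
            if present:
                counts[(name, action)] += 1.0
                totals[name] += 1.0
    return {pair: count / totals[pair[0]] for pair, count in counts.items()}
```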
Figure 10 shows a resulting simulation of 2000 time steps, or, equivalently, twice the length of the action sequences used in the GA simulations. Heterogeneous robot behaviors are apparent from the figure. The path of robot #2, indicated in blue, is the least productive but exhibits wall following around the perimeter of the map. Robot #1, with a path