near agent in the following manner. We measure the
average distance of the evolved remote agent to the
nearest object at t = T in the images of the training
set. Then, we evolve the near agent from starting positions that are normally distributed, with an object position as mean and the measured average distance at t = T as standard deviation.
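To make this sampling concrete, here is a minimal sketch (the helper name and the use of numpy are ours, not the paper's):

```python
import numpy as np

def sample_near_start(object_positions, sigma, rng):
    """Pick a labelled object position and perturb it with Gaussian noise
    whose standard deviation sigma is the remote agent's measured average
    distance to the nearest object at t = T."""
    centre = object_positions[rng.integers(len(object_positions))]
    return centre + rng.normal(0.0, sigma, size=2)

# Example: two labelled objects, sigma measured beforehand.
rng = np.random.default_rng(0)
start = sample_near_start(np.array([[100.0, 200.0], [400.0, 300.0]]), 25.0, rng)
```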
We split evolution into two halves: in the first half we evolve both the features and the neural network weights, and in the second half only the neural network weights. Evolution starts with a population of λ
different agents. An agent is represented by a vector
of real values (doubles), referred to as the genome.
In this genome, each feature is represented by five values: one for the type and four for the two (x, y) coordinates inside the scanning window. Each neural network weight is represented by one value.
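A sketch of this encoding; that the feature values precede the weights in the genome is our assumption:

```python
VALUES_PER_FEATURE = 5  # 1 value for the feature type, 4 for the two coordinates

def decode_genome(genome, n_features, n_weights):
    """Split a flat real-valued genome into n_features feature descriptors
    (5 values each) and n_weights neural-network weights."""
    cut = n_features * VALUES_PER_FEATURE
    features = [genome[i:i + VALUES_PER_FEATURE]
                for i in range(0, cut, VALUES_PER_FEATURE)]
    weights = genome[cut:cut + n_weights]
    return features, weights
```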
ate the performance of each agent on the task by let-
ting it perform R runs per training image, each of T
time steps. The fitness function we use in the first half
of evolution is: f
1
(a) = (1− distance(a))+ recall(a),
where distance(a) ∈ [0,1] is the normalised distance
between the agent’s scanning position at t = T and
its nearest object, averaged over all training images
and runs. The term recall(a) is the average proportion of objects detected per image by an ensemble of R runs of the agent a. An object is detected
if the scanning position is on the object.
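Written out, f_1 is the sum of two averages; a sketch with our own bookkeeping names:

```python
def fitness_f1(final_distances, detected_fractions):
    """f_1(a) = (1 - distance(a)) + recall(a).
    final_distances: normalised distance in [0, 1] to the nearest object
    at t = T, one entry per (training image, run);
    detected_fractions: per training image, the fraction of its objects
    detected by the ensemble of R runs."""
    distance = sum(final_distances) / len(final_distances)
    recall = sum(detected_fractions) / len(detected_fractions)
    return (1.0 - distance) + recall
```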
When all agents have been evaluated, we test the best agent on the validation set. In addition, we select the µ agents with the highest fitness values to form a new generation. Each selected agent has λ/µ offspring. To
produce offspring, there is a probability p_co that one-point cross-over occurs with another selected agent.
Furthermore, the genes of the new agent are mutated
with probability p_mut. The process of fitness evaluation and procreation continues for G generations.
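A sketch of one generation under these operators (the per-gene mutation operator mutate_gene is left abstract, as the paper does not specify it):

```python
import random

def next_generation(parents, lam, p_co, p_mut, mutate_gene):
    """Each of the mu selected parents produces lam/mu offspring; with
    probability p_co the offspring is built by one-point cross-over with
    another selected parent, and every gene is then mutated with
    probability p_mut."""
    offspring = []
    for parent in parents:
        for _ in range(lam // len(parents)):
            child = list(parent)
            if random.random() < p_co:
                mate = random.choice(parents)
                point = random.randrange(1, len(child))
                child = child[:point] + list(mate[point:])
            offspring.append([mutate_gene(g) if random.random() < p_mut else g
                              for g in child])
    return offspring
```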
As mentioned, we stop evolving the features at G/2.
In addition, we set p_co to 0, since cross-over might be disruptive for the optimisation of neural network weights (Yao, 1999). Moreover, we gradually diminish p_mut. Finally, we also change the fitness function from f_1 to f_2(a) = recall(a). At the end of evolution, we select the agent with the highest weighted sum of its fitness on the training set and validation set (weighted according to the set sizes) to prevent overfitting.
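These second-half changes can be summarised as a per-generation schedule; the linear decay of p_mut below is our assumption, as the paper only states that it is gradually diminished:

```python
def schedule(generation, G, p_mut0=0.04, p_co0=0.5):
    """Per-generation control: after G/2 the features are frozen,
    cross-over is switched off, the mutation rate diminishes, and the
    fitness function switches from f_1 to f_2."""
    second_half = generation >= G // 2
    if second_half:
        progress = (generation - G // 2) / (G - G // 2)
        p_mut = p_mut0 * (1.0 - progress)  # assumed linear decay
    else:
        p_mut = p_mut0
    return {"evolve_features": not second_half,
            "p_co": 0.0 if second_half else p_co0,
            "p_mut": p_mut,
            "fitness": "f2" if second_half else "f1"}
```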
The near agent is evolved in exactly the same man-
ner as the remote agent, except for the different start-
ing positions (close to the objects) and the fitness
function: g(a) = (1 − distance(a)) + precision(a),
which does not change at G/2. The term precision(a) is the proportion of the R runs of the near agent that detect an object at the end of the run. The goal of the near agent is
to refine the scanning position reached by the remote
agent, by detecting the nearest object and approaching
its center as much as possible.
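A sketch analogous to f_1 above, again with our own bookkeeping names:

```python
def fitness_g(final_distances, detecting_runs, total_runs):
    """g(a) = (1 - distance(a)) + precision(a), where precision(a) is the
    fraction of the R runs that end with a detected object."""
    distance = sum(final_distances) / len(final_distances)
    precision = detecting_runs / total_runs
    return (1.0 - distance) + precision
```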
The third phase of the AOD-method, the object
detector that verifies object-presence at the last scan-
ning position, is not evolved, but trained according to
the training scheme in (Viola and Jones, 2001).
3.3 Face-detection Task
We apply the AOD-method to a publicly available face-detection task. We use the FGNET data set (http://www-prima.inrialpes.fr/FGnet/), which contains video sequences of a meeting room recorded from two different cameras. For our experiments we use the joint set of images from both cameras ('Cam1' and 'Cam2') in the first scene ('ScenA'). The set consists of 794 images of
720 × 576 pixels, which we convert to gray-scale.
We use the labelling that is available online, in which
only the faces with two visible eyes are labelled.
We divide the image set into two parts: half of the images are used for testing and half for evolution. The images for evolution are further divided into a training set (80%) and a validation set (20%). We perform a two-fold test to obtain our results, and run one evolution per fold.
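A sketch of the split; how images are assigned to the halves (here, by a seeded shuffle) is our assumption:

```python
import random

def split_images(images, seed=0):
    """Half the images for testing, half for evolution; the evolution half
    is split 80/20 into training and validation."""
    imgs = list(images)
    random.Random(seed).shuffle(imgs)
    half = len(imgs) // 2
    test, evo = imgs[:half], imgs[half:]
    cut = int(0.8 * len(evo))
    return {"test": test, "train": evo[:cut], "validation": evo[cut:]}
```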
3.4 Experimental Settings
Here we provide the settings for our experiments. The
maximal scanning shift j is equal to half the image
width for the remote agent, and equal to one third of
the image width for the near agent. The scanning win-
dow is a square with sides equal to one third of the
image width for the remote agent, and one fourth of
the image width for the near agent. The number of
time steps per agent is T = 5, and the number of runs
per image R is 20. We use n = 10 features that are extracted from the scanning window. We set the number of hidden neurons h of the controller to n/2 = 5,
while the number of output neurons o is 2. We set the
evolutionary parameters as follows: λ = 100, µ = 25,
G = 300, p_mut = 0.04, and p_co = 0.5.
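For reference, these settings collected in one place:

```python
SETTINGS = {
    "T": 5,         # time steps per run
    "R": 20,        # runs per image
    "n": 10,        # features extracted from the scanning window
    "h": 5,         # hidden neurons (n / 2)
    "o": 2,         # output neurons
    "lambda": 100,  # population size
    "mu": 25,       # number of selected agents
    "G": 300,       # generations
    "p_mut": 0.04,  # (initial) mutation probability
    "p_co": 0.5,    # cross-over probability, first half of evolution
}
```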
4 RESULTS
4.1 Behaviour of the Evolved Agents
In this subsection, we give insight into the scanning
behaviour of the remote and near agents evolved on
the first fold (their behaviour on the second fold is
similar). Figure 3 shows ten independent runs of the