Hybrid Training to Generate Robust Behaviour for Swarm Robotics Tasks

Pedro Romano 1,3, Luís Nunes 1,2,a and Sancho Oliveira 1,2,3,b

1 Iscte - Instituto Universitário de Lisboa, Av. Forças Armadas, Lisboa, Portugal

2 ISTAR Iscte, Lisboa, Portugal

3 Instituto de Telecomunicações, IT Iscte, Lisboa, Portugal

Keywords: Evolutionary Robotics, Multirobot Systems, Cooperation, Perception, Object Identification, Artificial Intelligence.

Abstract: Training of robotic swarms is usually done for a specific task and environment. The more specific the training is, the higher the likelihood of reaching a good performance. Still, flexibility and robustness are essential for autonomy, enabling the robots to adapt to different environments. In this work, we study and compare approaches to robust training of a small simulated swarm on a task of cooperative identification of moving objects. Controllers are obtained via evolutionary methods. The main contribution is testing the effectiveness of training in multiple environments: simplified versions of terrain, marine and aerial environments, as well as ideal, noisy and hybrid (mixed environment) scenarios. Results show that controllers can be generated for each of these scenarios but, contrary to expectations, hybrid evolution and noisy training do not, in general, generate better controllers for the different scenarios. Nevertheless, the hybrid controller reaches a performance level on par with specialized controllers in several scenarios, and can be considered a more robust solution.

1 INTRODUCTION

The penetration of fully autonomous robots in society is still limited. One of the key factors behind this challenge is environment perception. In order to behave autonomously, the robot needs to make a wide variety of decisions that have to be supported by a thorough understanding of the environment surrounding it (Fitzpatrick, 2003).

"Machine Perception" is a term used to describe the capability of a machine to interpret data much like humans use their senses to perceive the world around them. A good level of perception will ultimately boost the level of situation awareness, greatly improving the chances of making a good decision.

Classic methods for synthesizing robotic controllers are based on the manual specification of their behavior. For greater levels of complexity, manually specifying all possible use cases and scenarios a robot may encounter becomes especially demanding. This has motivated the application of artificial intelligence (AI) and evolutionary computation (a subfield of AI and machine learning) to synthesize robotic controllers. This approach started showing promising results (Lewis et al., 1992; Cliff et al., 1993) as the evolutionary robotics (ER) field of study started to take shape. Using this approach, an initial random controller is optimized through several generations. At each generation, a population of candidate solutions is tested and the best performing solutions are mutated, crossed over and passed on to the next generation. With this method, we get an incrementally better controller at each generation as we let evolution take care of the controller specification.

a https://orcid.org/0000-0001-7072-0925

b https://orcid.org/0000-0003-1391-3194

A common framework for robotic controllers is an artificial neural network (ANN). This approach is inspired by the way the human brain works, with computer models of axons and neurons. One of the main advantages of the ANN framework applied to robotic controllers is its resistance to noise (Jim et al., 1995), introduced for example by the normal imperfections of real-world hardware (sensors). The ANN framework is also a natural fit for robotics, with its layer architecture allowing for a direct mapping of the sensors to the input layer and the actuators to the output layer. Sensor activations in ANNs are usually represented by a value in a specific range, for example [0,1].

Romano, P., Nunes, L. and Oliveira, S. Hybrid Training to Generate Robust Behaviour for Swarm Robotics Tasks. DOI: 10.5220/0012193300003595. In Proceedings of the 15th International Joint Conference on Computational Intelligence (IJCCI 2023), pages 265-277. ISBN: 978-989-758-674-3; ISSN: 2184-3236

Environment perception in robotics is a natural

evolution driven by the need to make robots ever more

autonomous and intelligent. Different approaches on

this subject have been studied over the years, based on

voice (Fitzpatrick, 2003), vision (Merino et al., 2006;

Spaan, 2010; Spaan et al., 2010) and touch (Le et al.,

2010) to perceive the environment.

Research on this subject, although varied in the means of perceiving and acting upon the environment, concerns mostly terrain environments. With the proliferation of devices like drones and the expansion of robotic applications, it is important to explore different environments and create solutions that can be applied to multiple scenarios. In particular, this work focuses on simulating conditions characteristic of terrain, aerial and marine environments, and on the challenges that arise both in developing cooperative active perception capabilities for swarms that scale to multiple environments and in handling each environment's singularities.

In the scope of this article, perceiving the environment can be described as the identification of objects and their features, followed by classification. Based on the results of that classification, the robot can act on the environment, changing its state. The perception of each robot is shared with the team members in its field of sight. This amounts to a cooperative active perception approach to swarm robotics.

This task is required in complex environments where observations must be verified by several sources: for structures that have a much larger scale than the sensors, or that need to be sensed in different wavelengths or with different types of sensors. Such cases require the contribution of different elements of the swarm, each identifying a specific set of characteristics, to validate the identification. In this work the problem was simplified to sets of different colors that had to be observed at the same time and communicated to the peers.

We will focus on a task where a swarm of robots navigates through an environment crossed by unidentified objects. These objects carry a set of features, each of which can only be observed from a different viewpoint. The robots have three goals:

1. Identifying all the features of the objects.

2. Catching the objects that fall in a certain category defined by the presence of a specific set of features.

3. Keeping a formation-like distribution in the environment, simulating a patrolling behavior inside the arena.

Although collective object identification is not a novel issue, the introduction of marine and aerial singularities and the expectation of creating an environment-independent solution have not been addressed in depth in previous studies and can have relevant applications, from marine surveillance operations to aerial forest fire detection.

In summary, the key objectives are:

1. Develop a cooperative active perception approach that is scalable to different types of environments and their singularities.

2. Demonstrate the approach working successfully in a simulation environment with known real-world transferability (Duarte et al., 2012).

The main contributions are:

1. The assessment of the results of evolving a solution to a new learning task, suited to testing cooperative perception problems.

2. The evaluation of techniques to evolve more robust solutions that adapt to different environments.

2 RELATED WORK

Sensing the environment is one of the key features

to enable a fully autonomous behavior. To success-

fully develop a controller with these capabilities, sev-

eral problems need to be considered, in multiple ar-

eas: environment perception, object recognition and

computer vision.

In this section, we start with an overview of ER,

the technique that will be used in the synthesis of the

robotic controllers developed throughout this study

and we review various approaches studied for solving

the cooperative active perception challenges in swarm

robotics for autonomous robots.

2.1 Evolutionary Robotics

Evolutionary computation is a sub-field of artificial intelligence in which evolutionary algorithms (EAs) are used. These algorithms are inspired by biological mechanisms, following the same principles as the natural evolution described by Darwin. The fitness function plays one of the most important roles in the evolution, defining the balance of the objectives to be reached in order to obtain the most adequate solution after a number of generations.

ANNs are the most common framework for ER controllers. This approach is inspired by the way the human brain processes information, like biological neurons (McCulloch and Pitts, 1943), with nervous activities, neural events and relations being described in terms of propositional logic.

ECTA 2023 - 15th International Conference on Evolutionary Computation Theory and Applications

A typical neural network includes ﬁve compo-

nents: (i) the input layer, (ii) the hidden layer, (iii) the

output layer, (iv) the weighted connections between

each of the previous components and (v) the activa-

tion function that converts the input to the output in

each of the nodes (neurons). The weighted connec-

tions as well as the activation function for the neu-

rons are the main parameters that deﬁne an abstract

ANN framework to solve a concrete problem. When

EAs are used, these parameters are obtained via the

global optimization methods characteristic of this ap-

proach. This process replaces the manual speciﬁca-

tion of the solution and it is the main advantage of

using this method.

Early approaches were often based on a spe-

ciﬁc type of ANN, a discrete time neural network.

Continuous-time recurrent neural networks (CTRNN)

were later introduced by Joseph Chen in 1998 with

appealing results (Chen and Wermter, 1998), ﬁlling

the gap of the discrete time neural network’s lack of

temporal dynamics, like short term memory.

ER comes as a natural application of EAs to the synthesis of robotic controllers. These methodologies started emerging in the 1990s (Lewis et al., 1992; Cliff et al., 1993). Even when the fitness function did not imply certain attributes, evolution developed those capabilities to solve the task. The authors considered the results sufficiently promising for future success in the area.

Although the approach has proven successful in evolving creative solutions for simple behaviors like foraging, formation, aggregation, etc., one of the biggest challenges in the area is scaling up the approach to more complex tasks, mainly due to the bootstrapping problem, where the goal is so hard or distant that all individuals in the first generation perform equally poorly, causing a slow start of the evolution process. Transferring the robotic controllers from simulation to real environments (crossing the reality gap) is another big challenge, with proposed solutions based on sensors, noise and real-world error estimation (Angelo Cangelosi, Domenico Parisi, 1994; Jakobi et al., 1995; Hartland and Bredèche, 2006).

In 2007, M. Eaton presented one of the first applications of EAs to develop complex moving patterns of a humanoid robot (Eaton, 2007) and successfully transferred the solution to real hardware.

Miguel Duarte conducted a study (Duarte et al., 2012) that introduced a novel methodology for developing controllers for complex tasks: recursively splitting them into simpler tasks until these are simple enough to be evolved; controllers to manage the activation of these tasks are also evolved. Then, a tree-like composition of simple tasks and their activation controllers makes up the solution for the initial complex task.

2.2 Cooperative Active Perception

As noted by Paul Fitzpatrick (Fitzpatrick, 2003), it is difficult to achieve robust machine perception, but doing so is the key to intelligent behavior. The author also argues for an active perception approach, as figure/ground separation is difficult for computer vision. This author conducted studies using active vision and active sensing for object segmentation, object recognition and orientation sensitivity.

In 2006, Luís Merino (Merino et al., 2006) used a cooperative perception system for GPS-equipped Unmanned Aerial Vehicles (UAVs) to detect forest fires, where active vision played the most important role. A statistical framework is used to reduce the uncertainty of the global objective (the fire position), taking into account each team member's distinct sensor readings and their uncertainty. This approach provides a way to exploit the complementarities of different UAVs with different attributes and sensors.

The foundation of this whole process is robust perception, that is, correctly identifying objects. As stated by Q. V. Le (Le et al., 2010), the angles from which objects can be viewed are the main variable for increasing the likelihood of identification. This study produces strong results in object identification, as the robot is capable of observing the object from many angles until certainty is reached, and it was shown to be better than passive observation and random manipulation.

Driving the robot's decision making based on an incomplete and noisy perception is another challenge, described in 2010 by Matthijs T.J. Spaan (Spaan, 2010; Spaan et al., 2010). The authors propose a Partially Observable Markov Decision Process (POMDP) to develop an integrated decision-theoretic approach to cooperative active perception, as POMDPs “offer a strong mathematical framework for sequential decision making under uncertainty, explicitly modeling the imperfect sensing and actuation capabilities of the overall system.” Later, in 2014, the authors introduced a new type of POMDP, POMDP-IR (Information Reward), that extends the solution with actions that return information rewards (Spaan et al., 2014).

Another robot control approach for a perception-driven swarm is presented by Aamir Ahmad in 2013 (Ahmad et al., 2013), where the author proposed and implemented a method for perception-driven multirobot formation control, with a cost function built as a weighted sum of terms to control multiple objectives. This study was successful in demonstrating that the authors' approach enables a team of homogeneous robots to minimize the uncertainty of a tracked object while satisfying other criteria such as keeping a formation. The approach consists in integrating two main modules, a controller and an estimator.

Seong-Woo Kim states that fusing data from re-

mote sensors has various challenges (Kim et al.,

2015). The author focuses on the map merging prob-

lem and sensor multimodality between swarm mem-

bers to successfully extend perception range beyond

that of each member’s sensors. Compared with co-

operative driving without perception sharing, his ap-

proach was proven better at assisting driving deci-

sions in complex trafﬁc situations. The author pro-

poses triangulation and map reckoning to get the rel-

ative pose of the nodes allowing the information to be

properly fused. The approach assumes no common

coordinate system making it more robust.

In 2015, Tiago Rodrigues addressed the sensor

sharing challenges as well. In (Rodrigues et al., 2015)

the author proposes local communication to share

sensor information between neighbors to overcome

constraints of each member’s local sensors. Triangu-

lation is used to georeference the tracked object. The

proposed approach is transparent to the controller,

working as a collective sensor. This scenario was able

to achieve a much better performance than classic lo-

cal sensors.

These techniques present diverse contributions in terms of the robotic controllers used and of the sensing and actuating capabilities. In most of the cited work, active vision played the central role in the approach (Fitzpatrick, 2003; Merino et al., 2006; Le et al., 2010). In (Merino et al., 2006), a statistical framework is used in the controllers to estimate the target position, and perception with heterogeneous teams is tested. In (Spaan et al., 2010; Spaan, 2010; Spaan et al., 2014), POMDPs were used to model decision making under uncertainty (well suited to noisy perceptions). The control of multiple objectives in a robotic solution is presented in (Ahmad et al., 2013). Fusing data sensed by multiple nodes also poses challenges, studied in (Kim et al., 2015), and (Rodrigues et al., 2015) presents a shared sensor solution to the same problem.

The studies presented above develop and test perception solutions centered on the linear terrain environment, and the development of cooperative active perception systems using ER has not been addressed in depth. The work presented in this study differs in proposing a generic solution scalable to environments with different characteristics and overcoming the challenges of the environments' singularities, using ER techniques.

3 METHODOLOGY

We will now describe an approach for a swarm robotics control system capable of collectively identifying objects and making decisions based on the identification. It is a common approach in robotic perception to treat the identification of objects as the identification of specific features that add up to a known object or class of objects (Fitzpatrick, 2003; Le et al., 2010). Our approach follows that direction: the identification of an object is completed when all its key features are seen by at least one of the robots in the team. Those features can be sensed: (i) directly by each team member using its local sensor and (ii) indirectly through the shared sensor that allows each robot to sense object features being seen by the teammates in sight. From the controller's point of view, there is no distinction between the local and the shared sensing of a feature.

In this work, the objects and their features serve as a conceptual representation of any given category of object and its features, respectively.

We use a task in which a team of robots must collectively identify a set of objects that pass by, and catch the ones that fall into a certain category (have a specific set of features).

For our experiments, we use JBotEvolver (Duarte et al., 2014), a Java-based open-source neuroevolution framework and versatile simulation platform for education and research-driven experiments in ER.

3.1 Experimental Setup

To conduct our experiments, 8 circular robots with a radius of 5 cm are placed in a 4x4 m bounded environment. The unidentified objects have a 10 cm radius (twice the size of the robots), are generated at intervals of 500 time steps (50 seconds) and can appear from any side of the arena, moving to the opposite side. In 30% of cases, two objects will be in the arena at the same time, increasing the identification complexity; in the remaining 70% of cases only one object is inside the arena at a time. The initial position of the object is randomly assigned when only one object is in the arena at a time, and fixed on the bottom and top or left and right portions of the arena when two objects are in the arena at the same time. Having two objects inside the arena at the same time should force the robots to separate into groups to proceed with the identification. Object speed is variable: it is assigned to each object at the moment of creation and drawn from a uniform distribution between 0.15 and 0.35 cm/s.
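The object generation rules above can be sketched as follows. This is a minimal illustration and not the JBotEvolver implementation: the function name `spawn_wave`, the side labels and the dictionary layout are hypothetical, and the choice between the bottom/top and left/right pairs is assumed to be uniform.

```python
import random
from typing import Dict, List

SIDES = ("left", "right", "top", "bottom")
OPPOSITE = {"left": "right", "right": "left", "top": "bottom", "bottom": "top"}
TWO_OBJECT_PROBABILITY = 0.30          # stated in the text
SPEED_RANGE_CM_S = (0.15, 0.35)        # stated in the text


def spawn_wave(rng: random.Random) -> List[Dict[str, object]]:
    """Create the object(s) released at one 500-step generation interval."""
    if rng.random() < TWO_OBJECT_PROBABILITY:
        # Two objects: fixed on opposite portions of the arena (assumed uniform choice of pair).
        sides = list(rng.choice([("bottom", "top"), ("left", "right")]))
    else:
        # One object: entry side chosen at random.
        sides = [rng.choice(SIDES)]
    return [
        {
            "side": side,
            "target": OPPOSITE[side],                    # objects move to the opposite side
            "speed_cm_s": rng.uniform(*SPEED_RANGE_CM_S),  # uniform speed per object
        }
        for side in sides
    ]
```

Passing an explicit `random.Random` instance keeps each sample reproducible from its seed.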

Each object carries 4 features distributed around

the 4 quadrants of the object’s circular perimeter. In

the scope of this study, object features are repre-

sented by colors. To simulate the complexity asso-

ciated with large objects identiﬁcation (objects bigger

than robots) and scale the approach to multiple ob-

ject sizes, each robot can only see one feature at a

time. With this limitation, cooperation is needed to

sense all the features and proceed with the identiﬁca-

tion. The key is for the robots to position themselves

around an object so that each one is situated in a van-

tage point that enables it to see one feature directly

through its local sensor and all the others indirectly,

through the shared sensor that receives the percep-

tions from nearby teammates. An object is considered

identiﬁed if all the features are observed by a robot,

for 10 consecutive time steps.
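The identification rule can be sketched as a small counter over time steps. This is an illustrative sketch under stated assumptions: features are represented as string labels, the shared sensor is modelled as a plain union of the readings of teammates in range, and the names `perceived_features` and `IdentificationTracker` are hypothetical.

```python
from typing import Iterable, Set

REQUIRED_STEPS = 10  # consecutive time steps stated in the text


def perceived_features(local: Set[str], shared: Iterable[Set[str]]) -> Set[str]:
    """Union of a robot's own reading and the readings relayed by teammates in range."""
    seen = set(local)
    for teammate_view in shared:
        seen |= teammate_view
    return seen


class IdentificationTracker:
    """Counts consecutive steps in which one robot perceives every feature of an object."""

    def __init__(self, object_features: Set[str]) -> None:
        self.object_features = set(object_features)
        self.streak = 0
        self.identified = False

    def update(self, seen: Set[str]) -> bool:
        # The streak grows only while all features are perceived; any gap resets it.
        if self.object_features <= seen:
            self.streak += 1
        else:
            self.streak = 0
        if self.streak >= REQUIRED_STEPS:
            self.identified = True
        return self.identified
```

Because the controller does not distinguish local from shared readings, the tracker only needs the merged set returned by `perceived_features`.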

The object features are contained in a predeﬁned

set of 8 features (4 enemy features and 4 friend fea-

tures), unknown by the robots. While enemy objects

always have 4 enemy features, friend objects can have

a mix of friend and enemy features (up to a max of 2

enemy features). This ambiguity serves a more real-

istic model and forces robots to evolve a more precise

identiﬁcation process. The order, mix and choice of

the features are all uniformly distributed random pro-

cesses that take place at the generation of each object.
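A possible way to draw the features of an object is sketched below. The feature labels are hypothetical, and two points the text does not state are assumed: features within one object are distinct (sampled without replacement), and the number of enemy features in a friend object is uniform over 0, 1 and 2.

```python
import random
from typing import List

ENEMY_FEATURES = ["e1", "e2", "e3", "e4"]    # hypothetical labels
FRIEND_FEATURES = ["f1", "f2", "f3", "f4"]   # hypothetical labels
FEATURES_PER_OBJECT = 4                      # one per quadrant
MAX_ENEMY_IN_FRIEND = 2                      # stated in the text


def generate_object_features(is_enemy: bool, rng: random.Random) -> List[str]:
    """Return the 4 features of a new object, in random quadrant order."""
    if is_enemy:
        # Enemy objects always carry 4 enemy features.
        features = rng.sample(ENEMY_FEATURES, FEATURES_PER_OBJECT)
    else:
        # Friend objects may carry up to 2 enemy features (assumed uniform count).
        n_enemy = rng.randint(0, MAX_ENEMY_IN_FRIEND)
        features = rng.sample(ENEMY_FEATURES, n_enemy)
        features += rng.sample(FRIEND_FEATURES, FEATURES_PER_OBJECT - n_enemy)
    rng.shuffle(features)
    return features


def is_enemy_object(features: List[str]) -> bool:
    """An object is an enemy only if every one of its features is an enemy feature."""
    return all(f in ENEMY_FEATURES for f in features)
```

Under these rules a single enemy feature never suffices to classify an object, which is why all four quadrants must be observed before catching it.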

An example of the object identification scenario is depicted in Fig. 1. Here, each robot is sensing a different feature of the object with its front-facing local sensor. All robots are inside each other's range of communication, thus being able to share their local perception. As a result, the 4 robots are able to identify the object, as each of them knows all the features. They are now able to deduce its category and, if it is an enemy, decide to catch it (any of them can take that action).

Figure 1: Schematics of the simulation environment when identifying an object. Object 0 represents the unidentified object, with f1 to f4 representing its features; robot 0 to robot 3 represent the swarm; grey filled sensors represent the local features sensor of each robot; C represents the communication between each team-member (shared features sensor) and the circular lines represent the field of communication of each robot and its teammates (radius of robot sensor).

3.2 Controller Architecture

The robotic controller will be obtained using the AI

methods introduced in section 2.1 and is driven by a

CTRNN. The optimization will be set to maximize

a ﬁtness function that measures the solution perfor-

mance.

The controller architecture is composed of 2 actu-

ators and 5 sensors. The information from the envi-

ronment perceived by the robot through its sensor is

represented by a [0,1] value and mapped to the neural

network inputs. A hidden neuron layer is also used,

with 5 hidden neurons. The neurons in this layer are

connected to each other and to themselves, maintain-

ing a state (this allows for short term memory). The

output layer of the ANN is connected to the robot’s

actuators.

An array of wall, robot, object distance, object features and teammate density sensors was chosen. Together, they provide all the necessary information to successfully solve the proposed task. All the sensors, actuators and corresponding ANN inputs and outputs are described in Table 1.

Table 1: Controller Architecture: Robot Sensors and Actuators and corresponding ANN Inputs and Outputs.

Sensor | Reading | ANN Inputs | Configuration
i) Wall Sensor | Reading in range [0,1] depending on distance to closest wall | 4 | total of 4 sensors around the robot, each with 90° aperture
ii) Robot Sensor | Reading in range [0,1] depending on distance to closest robot | 4 | total of 4 sensors around the robot, each with 90° aperture
iii) Object Distance Sensor | Reading in range [0,1] depending on distance to closest object | 4 | total of 4 sensors around the robot, each with 90° aperture
iv) Object Features Shared Sensor | Binary readings corresponding to the feature in sight for the closest object | 8 (local) + 8 x N_{close robots} (shared) | 2 local sensors arranged like eyes, with 35° aperture and 10° between the eyes
v) Robot Density Sensor | Reading corresponding to the percentage of robots in sight according to total | 1 | 1 sensor

Actuator | Output | ANN Outputs | Configuration
i) Differential Drive Actuator | Output in range [0,1] depending on speed | 2 | left and right
ii) Object Catch Actuator | Binary output to catch an object | 1 | catches closest object at max distance of 0.1 m

The following equation describes the network behaviour:

$$\tau_i \frac{dH_i}{dt} = -H_i + \sum_{j=1}^{in} \omega_{ji} I_j + \sum_{k=1}^{hidden} \omega_{ki} Z(H_k + \beta_k) \qquad (1)$$

with

$$Z(x) = (1 + e^{-x})^{-1} \qquad (2)$$

where $\tau_i$ represents the decay constant, $H_i$ the neuron state, $I_j$ the value of input $j$ and $\omega_{ji}$ the strength of the synaptic connection between neurons $j$ and $i$ (the weighted connections). $\beta$ represents the bias and $Z(x)$ is the sigmoid function (equation 2). $in$ represents the total number of inputs and $hidden$ the total number of hidden nodes (5 were used). $\beta$, $\tau$ and $\omega_{ji}$ compose the genome that encodes the controller behavior, and are the parameters randomly initialized at the first generation and optimized throughout the evolutionary process, where $\beta \in [-10, 10]$, $\tau \in [0.1, 32]$ and $\omega_{ji} \in [-10, 10]$. Integrations follow the forward Euler method with an integration step size of 0.2, and cell potentials are set to 0 at network initialization.
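A forward Euler update of the hidden layer can be sketched as below. The sketch assumes the standard CTRNN form, tau_i * dH_i/dt = -H_i + sum_j w_in[j][i] * I_j + sum_k w_hidden[k][i] * Z(H_k + bias_k); the function and parameter names are hypothetical and the mapping from hidden states to the actuator outputs is left out.

```python
import math
from typing import List, Sequence

INTEGRATION_STEP = 0.2  # step size stated in the text


def sigmoid(x: float) -> float:
    """Z(x) = (1 + e^-x)^-1."""
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(
    state: Sequence[float],
    inputs: Sequence[float],
    w_in: Sequence[Sequence[float]],      # w_in[j][i]: input j -> hidden neuron i
    w_hidden: Sequence[Sequence[float]],  # w_hidden[k][i]: hidden k -> hidden i
    bias: Sequence[float],
    tau: Sequence[float],
    dt: float = INTEGRATION_STEP,
) -> List[float]:
    """Advance the hidden neuron states H by one forward Euler step."""
    n = len(state)
    activations = [sigmoid(state[k] + bias[k]) for k in range(n)]
    new_state = []
    for i in range(n):
        drive = sum(w_in[j][i] * inputs[j] for j in range(len(inputs)))
        recurrent = sum(w_hidden[k][i] * activations[k] for k in range(n))
        derivative = (-state[i] + drive + recurrent) / tau[i]
        new_state.append(state[i] + dt * derivative)
    return new_state
```

Starting from `state = [0.0] * 5` matches the stated initialization of cell potentials to 0; with no input and no recurrent drive, a neuron with tau = 1 decays from 1.0 to 0.8 in one step.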

The sensors follow the configuration depicted in Fig. 2. Sensors i), ii) and iii) are placed all around the perimeter of the robot, and sensor iv) consists of 2 front-facing sensors with an eye-like distribution, for a more realistic approach since the perception is based on vision. This also allows the robot to sense the path to reach the object (due to the sensors overlapping at the center).

To catch the objects, robots have a binary actuator. When active, the closest object is caught by the robot if it is within a maximum distance of 0.1 m.


[Figure 2 labels: "Wall, Robot distance and Object distance sensors (i, ii and iii)"; "Object features sensor (iv)"; "Eyes overlapping (15º)"; annotations: 1 m, 0.75 m, 45º]

Figure 2: Robot sensors representation. 4 sensors with 90º opening angle for sensors i), ii) and iii) and 2 eye-like sensors with 45º opening angle and 15º of overlap for sensor iv).

3.3 Fitness Function and Evolutionary

Process

To obtain the controller, the evolutionary process was conducted 10 times (evolutionary runs), each for 2000 generations. Each generation is composed of 100 individuals, each corresponding to a genome that encodes an ANN. To select the best individuals in a generation, the fitness considered is the average over 30 tests (samples). Each sample is tested for 5000 time steps (500 seconds). For the test, every robot in the swarm has the same genome. After each individual is evaluated, the top 5 are included in the next generation and used to create the remaining 95 individuals of the population: each one of the top individuals generates 19 new individuals by applying Gaussian noise to each genome with a probability of 10%.
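One generation of this process can be sketched as follows. The sketch is illustrative only: it assumes the 10% probability applies independently to each gene, uses a hypothetical noise standard deviation of 1.0, does not clip genes to their allowed ranges, and takes the simulator as an `evaluate` callback supplied by the caller.

```python
import random
from typing import Callable, List

Genome = List[float]

POPULATION_SIZE = 100
ELITE_SIZE = 5
OFFSPRING_PER_ELITE = 19
MUTATION_PROBABILITY = 0.10
SAMPLES_PER_INDIVIDUAL = 30


def mutate(genome: Genome, rng: random.Random, sigma: float = 1.0) -> Genome:
    """Add Gaussian noise to each gene with probability 10% (sigma is an assumption)."""
    return [
        g + rng.gauss(0.0, sigma) if rng.random() < MUTATION_PROBABILITY else g
        for g in genome
    ]


def mean_fitness(
    genome: Genome,
    evaluate: Callable[[Genome, random.Random], float],
    rng: random.Random,
) -> float:
    """Average the fitness of one genome over 30 independent samples."""
    total = sum(evaluate(genome, rng) for _ in range(SAMPLES_PER_INDIVIDUAL))
    return total / SAMPLES_PER_INDIVIDUAL


def next_generation(
    population: List[Genome],
    evaluate: Callable[[Genome, random.Random], float],
    rng: random.Random,
) -> List[Genome]:
    """Keep the top 5 genomes and fill the rest with 19 mutated copies of each."""
    ranked = sorted(
        population, key=lambda g: mean_fitness(g, evaluate, rng), reverse=True
    )
    elite = ranked[:ELITE_SIZE]
    children = [
        mutate(parent, rng) for parent in elite for _ in range(OFFSPRING_PER_ELITE)
    ]
    return elite + children
```

Since the elite is copied unchanged, the best mean fitness in the population cannot decrease between generations when the evaluation is deterministic; with noisy samples it can still fluctuate.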

The ﬁtness function is very simple: it rewards

robots for identifying and catching enemies and pe-

nalizes them for catching friends. A formation com-

ponent was added to the ﬁtness function, to stimulate

the robots to evolve a patrolling behavior and spread

out inside the arena, maintaining a known distance to

each other. The evolution is set to optimize this ﬁtness

function, described in equation 3:

$$\text{fitness} = \alpha + \beta \qquad (3)$$

with

$$\alpha = \sum_{n=0}^{timesteps} 10^{-2}, \quad \text{if } ADN \in [S - [\ldots],\ S] \qquad (4)$$

$$\alpha = \sum_{n=0}^{timesteps} -|ADN - S| \times 10^{-2}, \quad \text{otherwise} \qquad (5)$$

and

$$\beta = Enemy_{identified} \times 5 \times 10^{-3} + Enemy_{caught} \times 10^{-3} - Friends_{caught} \times 2 \times 10^{-3} - Unidentified_{caught} \times 10^{-3} \qquad (6)$$

$\alpha$ and $\beta$ correspond to the formation component of the fitness function and the object identification component, respectively. ADN (Average Distance to Nearest) is the average distance of the robots to their closest teammate, and $S$ is the radius of the robot teammates sensor. $Enemy_{identified}$ is the total number of enemy objects that were identified during the test, and $Enemy_{caught}$ corresponds to the total number of enemy objects caught. $Friends_{caught}$ and $Unidentified_{caught}$ correspond to the total numbers of friend (inoffensive) objects and unidentified objects caught, respectively. The formation component rewards the robots for keeping a distance between each other that corresponds to the radius of their teammates sensor ($S$) with an error margin of [...]. This allows them to disperse around the environment in search of objects while keeping a known distance to their teammates.
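The two fitness components can be sketched as below. Several values are assumptions, since parts of the printed equations are not legible in this copy: the error margin is left as a caller-supplied parameter, the per-step reward inside the margin is taken as 10^-2, and the weights for caught enemies and unidentified objects are taken as 10^-3. All function and parameter names are hypothetical.

```python
from typing import Sequence


def formation_component(
    adn_per_step: Sequence[float], sensor_radius: float, margin: float
) -> float:
    """Reward steps where ADN lies in [S - margin, S]; penalize the distance to S otherwise."""
    total = 0.0
    for adn in adn_per_step:
        if sensor_radius - margin <= adn <= sensor_radius:
            total += 1e-2                                # assumed per-step reward
        else:
            total -= abs(adn - sensor_radius) * 1e-2
    return total


def identification_component(
    enemy_identified: int,
    enemy_caught: int,
    friends_caught: int,
    unidentified_caught: int,
    w_identified: float = 5e-3,
    w_enemy_caught: float = 1e-3,    # assumed weight
    w_friend: float = 2e-3,
    w_unidentified: float = 1e-3,    # assumed weight
) -> float:
    """Reward identifying and catching enemies; penalize catching friends or unidentified objects."""
    return (
        enemy_identified * w_identified
        + enemy_caught * w_enemy_caught
        - friends_caught * w_friend
        - unidentified_caught * w_unidentified
    )


def fitness(formation: float, identification: float) -> float:
    """Total fitness is the sum of the formation and identification components."""
    return formation + identification
```

Keeping the weights as keyword parameters makes it easy to substitute the exact coefficients once they are confirmed against the published equation.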

4 MULTIPLE ENVIRONMENTS

The global contribution of this work is not only to present a novel cooperative active perception solution using EAs, but also to fill a gap in the current state of the art: evaluating the possibility of evolving generic solutions, adaptable to environments with multiple characteristics and their singularities.

In the real world, external factors heavily influence the swarm performance. In this section, we model different environments, mainly governed by external conditions that influence the swarm performance. Three main classes of environments are considered: (i) terrain, (ii) marine and (iii) aerial.

On a terrain environment, we simulate visual and

navigational obstacles present on terrain scenarios.

While terrain irregularities can be handled by the

robotic driver and thus don’t need to be handled by

the controller, accessibility issues like obstacles or ob-

ject occlusion will beneﬁt from an optimized behav-

ior to solve the task in these conditions. In our model,

we included a set of rectangular opaque obstacles dis-

tributed around the environment.

The marine environment can help develop swarms capable of running marine patrolling and exploration tasks. Our model of the marine environment is based on previous studies that successfully obtained controllers capable of crossing the reality gap in a marine environment (Duarte et al., 2016) and is centered around two main characteristics: (i) a constant dragging current and (ii) robot movement inertia. Each robot has two marine propellers (left and right) that are controlled by the differential drive actuator.

Regarding the aerial environment, several studies (Pflimlin et al., 2004; Cheviron et al., 2009; Leonard et al., 2012) address some of the challenges of controlling an Unmanned Aerial Vehicle (UAV): maneuverability, wind gusts and other aerodynamic efforts. In (Cheviron et al., 2009), the authors study the influence of wind gusts on the system, concluding that it is a crucial problem for real-world outdoor applications, especially in an urban environment. We base our model of the aerial environment on the simulation of (i) constant wind and (ii) intermittent wind gusts, as these seem the most relevant challenges. This environment can be used to obtain controllers capable of drone obstacle avoidance and object detection.

All the agents' solutions build upon the solution presented in Section 3.1. Each environment and its singularities are summarized in Table 2.

Obstacle width and height, sea current and wind magnitudes, gust duration and whether a gust is present or not are all random values drawn from a uniform distribution. The intervals were chosen, by trial and error, so that each environment poses different problems for the agents to solve, all at a similar difficulty level.
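
As an illustration of how such samples could be drawn, the sketch below uses the intervals listed in Table 2. It is only a sketch under stated assumptions: the names `EnvironmentSample`, `sample_environment` and `sample_gust_period` are hypothetical, and both the 50% chance of a windy period and the use of the [0, 20] s interval for silent periods are assumptions, since the text only states that these values are uniformly drawn.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EnvironmentSample:
    kind: str
    obstacles: list = field(default_factory=list)  # (width, height) pairs
    drift: tuple = (0.0, 0.0)  # constant sea current or wind, per axis


def sample_environment(kind: str, rng: random.Random) -> EnvironmentSample:
    """Draw the per-sample parameters of one environment (intervals from Table 2)."""
    if kind == "terrain":
        count = rng.randint(0, 7)  # randint is inclusive on both ends
        obstacles = [(rng.uniform(25, 65), rng.uniform(25, 65))
                     for _ in range(count)]
        return EnvironmentSample(kind, obstacles=obstacles)
    if kind in ("marine", "aerial"):
        drift = (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1))
        return EnvironmentSample(kind, drift=drift)
    raise ValueError(f"unknown environment: {kind}")


def sample_gust_period(rng: random.Random, gust_probability: float = 0.5):
    """Draw one aerial period: a gust vector (zero if silent) and its duration.

    gust_probability is an assumed parameter, not a value from the paper.
    """
    duration = rng.uniform(0, 20)  # seconds
    if rng.random() < gust_probability:
        gust = (rng.uniform(-2, 2), rng.uniform(-2, 2))
    else:
        gust = (0.0, 0.0)
    return gust, duration
```

Passing an explicit `random.Random` instance keeps each sample reproducible from its seed, which is convenient when the same sample has to be replayed for different controllers.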

5 RESULTS AND DISCUSSION

As we place the robots in different settings, the optimization follows different paths and yields different solutions, each optimized for the setup in which the evolutionary process was conducted. The evolutionary process described in Section 3.3 was conducted in four main setup categories: (i) in each of the three environments described, (ii) in an ideal setup (described in Section 3.1), (iii) in a noisy environment and (iv) in a hybrid scenario, in which each sample is conducted in a different environment (terrain, marine or aerial). A total of 60 evolutionary processes were conducted, taking 29 days to complete on a computer grid with an average availability of 75 workers.

Conducting the evolutionary process in each of the three environments gives us a benchmark for the target controller behavior in each environment, since it yields controllers specifically optimized for that environment. If controllers obtained via other methods (i.e., noisy or hybrid) perform as well as the environment-specific controllers, the more generic solution will be validated.

The noisy environment was introduced because noise can be seen as an abstract, multi-purpose way of generating a more robust solution. Introducing noise on the ANN inputs during the evolutionary process is a known way of creating a solution able to cope with conditions slightly different from the ideal environments usually used during training, thus boosting the ability to cross the reality gap. All sensors are affected by a fixed offset in [-0.1, 0.1] and by random noise in [-0.1, 0.1] for each reading, as suggested in (Romano et al., 2016). For object features, each binary reading has a 10% probability of being reversed, a magnitude comparable to that of the sensor noise. Offsets, noise values and binary state reversals are drawn from a uniform distribution.
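
The noise model described above can be sketched as follows. This is an illustrative sketch rather than the code used in the experiments: `NoisySensor` and `flip_features` are hypothetical names, drawing the offset once per sensor, treating the random-noise interval as symmetric around zero, and leaving readings unclamped are all assumptions.

```python
import random


class NoisySensor:
    """Adds a fixed offset (drawn once) plus fresh uniform noise per reading."""

    def __init__(self, rng: random.Random,
                 offset_range: float = 0.1, noise_range: float = 0.1):
        self.rng = rng
        # Assumption: the offset is drawn once and then kept fixed.
        self.offset = rng.uniform(-offset_range, offset_range)
        self.noise_range = noise_range

    def read(self, true_value: float) -> float:
        noise = self.rng.uniform(-self.noise_range, self.noise_range)
        return true_value + self.offset + noise


def flip_features(features, rng: random.Random, flip_probability: float = 0.1):
    """Reverse each binary feature independently with the given probability."""
    return [(not f) if rng.random() < flip_probability else f
            for f in features]


rng = random.Random(7)
sensor = NoisySensor(rng)
noisy_reading = sensor.read(0.5)
noisy_features = flip_features([True, False, True, True], rng)
```

Under these assumptions a continuous reading is perturbed by at most 0.2, whereas a flipped binary feature is entirely wrong, which may help explain why a sensor built on binary features could be hit harder by the same nominal noise level.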

In the hybrid scenario, one third of the samples is conducted in each of the terrain, marine and aerial environments.
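
One simple way to obtain this even split is a round-robin assignment of samples to environments, sketched below. The function name `hybrid_schedule` is hypothetical, and round-robin is an assumption: the text only states the proportion, not how samples are assigned.

```python
from itertools import cycle, islice

ENVIRONMENTS = ("terrain", "marine", "aerial")


def hybrid_schedule(num_samples: int) -> list:
    """Assign environments round-robin so each gets about a third of the samples."""
    return list(islice(cycle(ENVIRONMENTS), num_samples))


schedule = hybrid_schedule(9)
# Each environment appears three times in a nine-sample evaluation.
```

When the number of samples is not a multiple of three, this scheme gives the first environments in the tuple one extra sample each; a random assignment would instead balance the split only on average.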

Controllers are tested not only in the environment they were evolved in but also in all the others. The evolutionary process was conducted to optimize the fitness function defined in Eq. 3, with the configuration detailed in Table 1. Tests are performed on the best controller resulting from each evolution, and results are averaged over 100 samples of 10000 time steps each.
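
The test procedure amounts to running the best controller on many independent samples and summarizing the resulting fitness values. The sketch below illustrates this under stated assumptions: `evaluate` and the `run_sample` callback are hypothetical stand-ins for the simulator, and the use of the sample standard deviation is an assumption about how the reported dispersion was computed.

```python
import random
import statistics


def evaluate(controller, run_sample, num_samples: int = 100,
             time_steps: int = 10000, seed: int = 0):
    """Return (mean, standard deviation) of fitness over independent samples.

    run_sample(controller, time_steps, rng) is a hypothetical callback that
    simulates one sample and returns its fitness.
    """
    rng = random.Random(seed)
    scores = [run_sample(controller, time_steps, rng)
              for _ in range(num_samples)]
    return statistics.mean(scores), statistics.stdev(scores)


# Usage with a dummy simulation that returns a random fitness value.
mean_fitness, stdev_fitness = evaluate(
    controller=None,
    run_sample=lambda controller, steps, rng: rng.uniform(0, 1),
)
```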

Results for each evolution are condensed in Table 3 in terms of fitness, percentage of friends and enemies identified, and percentage of enemies and friends caught. Figure 3 shows the fitness dispersion of each controller, tested in each scenario.

Although environment-specific evolution provided good results, it was not always the best option. The terrain environment is an example: the ideal-evolved and marine-evolved controllers performed better when tested in the terrain environment, with average fitness of 2576 ± 1340 and 2191 ± 1261, respectively, while the terrain-evolved controller scored a fitness of 2158 ± 1433 (15% lower). Although the margin is small, it is notable that the terrain-evolved controller was not the best fit for the environment it was trained in, possibly because the complexity of the scenario prevented the evolution from developing the object identification and catching behavior as well as it did in the ideal environment. The characteristics of the terrain environment led the evolution to a behavior in which the swarm separates into small search groups strategically placed in the spaces confined by the obstacles.
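
How large these margins look depends on the baseline chosen. As a quick check, the mean fitness values from Table 3 can be compared on a common scale; the sketch below assumes the gap is expressed relative to the better controller's mean fitness, which is an assumption, as the text does not state the baseline. The helper name `relative_gap` is hypothetical.

```python
def relative_gap(reference: float, value: float) -> float:
    """Fraction by which value falls below reference."""
    return (reference - value) / reference


# Mean fitness in the terrain environment, taken from Table 3.
terrain_evolved = 2158
gaps = {
    "ideal": relative_gap(2576, terrain_evolved),
    "marine": relative_gap(2191, terrain_evolved),
}
for name, gap in gaps.items():
    print(f"terrain-evolved vs {name}-evolved: {gap:.1%} lower")
```

Both gaps are well within one standard deviation of any of the three controllers, so the ordering of the means should be read with that dispersion in mind.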

When tested in the noisy environment, all controllers failed to solve the task. Although the noise magnitude used in these experiments gave good results in previous studies (Romano et al., 2016), it appears to be destructive in this scenario. In those studies, we used a simple aggregation and formation task with identical noise applied. The controller we present in this work shares many of the sensors and actuators of the solution in the previous study, the biggest difference being the shared features sensor. While the search and identification portion of the behavior seems correct, categorization was the main aspect to fail in the controller (which caught both enemies and friends), leading us to conclude that the shared features sensor was the bottleneck that caused the noisy-evolved controller to fail, being the component least robust to noise.

Table 2: Multiple Environments: Description of each environment and its singularities.

| Environment | Singularities |
|---|---|
| Terrain Environment | i) [0,7] obstacles inside the arena, width and height between [25,65] |
| Marine Environment | i) F_current ∈ [−0.1, 0.1] cm/s for each axis, fixed throughout each sample (constant current); ii) propeller movement inertia, with a maximum increment of 0.1 m/s for each time step, for each propeller |
| Aerial Environment | i) F_current ∈ [−0.1, 0.1] cm/s for each axis, fixed throughout each sample (constant wind); ii) F_gust ∈ [−2, 2] cm/s for each axis, fixed throughout the gust period. Gust_period ∈ [0, 20] seconds, drawn at the beginning of each wind gust. Wind gusts are intermittent, with silent and windy periods drawn at random. |

The marine environment shows the biggest discrepancy between the performance of the environment-specific evolution and that of the remaining ones, with the environment-specific controller scoring an average fitness of 3473 ± 1549. The hybrid-evolved controller scored a lower fitness of 2242 ± 1120 in this environment. The ideal-, terrain- and aerial-evolved controllers scored the lowest fitness by a big margin: 843 ± 1518, 694 ± 1415 and 768 ± 901, respectively. Robot movement inertia is the main difference in this environment. These results show that, although adaptation to this characteristic is needed (low fitness of the ideal evolution), it is not hard for the evolution to handle (high fitness of the environment-specific and hybrid controllers). Direct observation of the behavior reveals no visible differences from the remaining solutions.

The aerial environment presented the lowest global fitness values among the three environments. With no clear performance advantage for the environment-specific solution, we conclude that the evolution was not able to generate a controller that compensates for the wind gusts. Observing the behavior, we notice that when wind gusts appear, the robots lose control of the object being identified. Controllers evolved in the aerial environment revealed a tendency to always keep close together (a behavior found in 90% of the evolutionary runs). This tendency was not observed in the remaining scenarios and represents a specific path followed by the aerial evolution, possibly keeping teammates close in order to use them as a reference for detecting when wind gusts drag the robots out of position.

We noted that, in all environments, the hybrid controller performed on par with the environment-specific controllers in terms of fitness. To further analyze these results, the differences between the environment-specific controllers and the hybrid controller are condensed in Table 4 for: (i) "enemies identified ratio", (ii) "enemies caught ratio" and (iii) "friends and unidentified objects caught ratio".

The differences between the two approaches range from an improvement of [0,2]% for the hybrid controller in the terrain and marine environments to a slight degradation of [1,8]% in the aerial environment.

The hybrid controller proves equivalent to the environment-specific controllers in the terrain and marine environments, and worse in the aerial environment. The worse performance in the aerial environment is common to most of the experiments, possibly linked to the overall complexity of this environment. Still, the differences found are of small magnitude. In terms of observable behavior, there are no visible differences, as both solve the task in the same manner. We can state that the performance of the hybrid controller in the terrain, marine and aerial environments is similar to that obtained by evolving specific controllers, differing only by a small margin with no clear performance impact.

Table 3: Each evolution tested in each environment.

| Testing scenario | Fitness ± Stdev | Enemies identified (%) | Enemies caught (%) | Friends caught (%) | Unidentified caught (%) |
|---|---|---|---|---|---|
| Terrain environment evolution | | | | | |
| Ideal Environment | 2874 ± 1922 | 54% | 42% | 1% | 5% |
| Noisy Environment | -529 ± 1713 | 5% | 0% | 0% | 4% |
| Terrain Environment | 2158 ± 1433 | 41% | 32% | 1% | 3% |
| Marine Environment | 694 ± 1415 | 25% | 11% | 0% | 3% |
| Aerial Environment | 752 ± 801 | 22% | 13% | 0% | 3% |
| Hybrid Environment | 1263 ± 1056 | 31% | 20% | 1% | 3% |
| Marine environment evolution | | | | | |
| Ideal Environment | 3067 ± 1668 | 56% | 46% | 2% | 7% |
| Noisy Environment | -492 ± 1600 | 4% | 0% | 0% | 4% |
| Terrain Environment | 2191 ± 1261 | 43% | 34% | 2% | 5% |
| Marine Environment | 3473 ± 1549 | 59% | 51% | 2% | 7% |
| Aerial Environment | 737 ± 684 | 24% | 13% | 0% | 4% |
| Hybrid Environment | 2242 ± 1120 | 42% | 34% | 1% | 5% |
| Aerial environment evolution | | | | | |
| Ideal Environment | 2385 ± 1026 | 48% | 40% | 6% | 7% |
| Noisy Environment | 92 ± 210 | 15% | 2% | 3% | 1% |
| Terrain Environment | 1715 ± 799 | 37% | 31% | 5% | 5% |
| Marine Environment | 768 ± 901 | 26% | 12% | 1% | 3% |
| Aerial Environment | 1428 ± 618 | 35% | 23% | 2% | 4% |
| Hybrid Environment | 1334 ± 777 | 34% | 23% | 3% | 4% |
| Ideal environment evolution | | | | | |
| Ideal Environment | 4239 ± 2573 | 71% | 64% | 2% | 7% |
| Noisy Environment | -90 ± 515 | 6% | 0% | 0% | 1% |
| Terrain Environment | 2576 ± 1340 | 45% | 37% | 1% | 3% |
| Marine Environment | 843 ± 1518 | 25% | 12% | 0% | 2% |
| Aerial Environment | 836 ± 748 | 22% | 13% | 0% | 3% |
| Hybrid Environment | 1557 ± 1112 | 33% | 23% | 0% | 3% |
| Noisy environment evolution | | | | | |
| Ideal Environment | 2120 ± 428 | 59% | 47% | 42% | 4% |
| Noisy Environment | 2186 ± 319 | 58% | 48% | 43% | 4% |
| Terrain Environment | 1526 ± 209 | 45% | 36% | 32% | 4% |
| Marine Environment | 191 ± 714 | 25% | 15% | 13% | 6% |
| Aerial Environment | 443 ± 193 | 22% | 10% | 9% | 2% |
| Hybrid Environment | 809 ± 388 | 31% | 20% | 18% | 3% |
| Hybrid environment evolution | | | | | |
| Ideal Environment | 3031 ± 1292 | 54% | 45% | 4% | 6% |
| Noisy Environment | 157 ± 220 | 16% | 0% | 0% | 0% |
| Terrain Environment | 2108 ± 951 | 41% | 34% | 3% | 4% |
| Marine Environment | 2553 ± 1176 | 48% | 38% | 2% | 6% |
| Aerial Environment | 1057 ± 483 | 27% | 16% | 1% | 3% |
| Hybrid Environment | 2004 ± 881 | 40% | 31% | 2% | 4% |

Table 4: Environment-specific controllers compared to the hybrid-evolved controller in each scenario.

| Metric | Terrain: Env. specific | Terrain: Hybrid (± diff) | Marine: Env. specific | Marine: Hybrid (± diff) | Aerial: Env. specific | Aerial: Hybrid (± diff) |
|---|---|---|---|---|---|---|
| Enemies identified (%) | 41% | 41% (0%) | 43% | 48% (+5%) | 35% | 27% (-8%) |
| Enemies caught (%) | 32% | 34% (+2%) | 36% | 38% (+2%) | 23% | 16% (-7%) |
| Friends caught (%) | 1% | 3% (+2%) | 2% | 2% (0%) | 2% | 1% (-1%) |
| Unidentified caught (%) | 3% | 4% (+1%) | 5% | 6% (+1%) | 4% | 3% (-1%) |

6 CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a novel approach for swarm robotics environment perception. This approach is different from the remaining state of the art for two main reasons: (i) the controller is obtained using EAs and (ii) the study is focused on scaling the approach to multiple environments.

[Figure 3 panels: Ideal, Noisy, Terrain, Marine, Aerial and Hybrid Environment; x-axis: evolved controller (Ideal, Noisy, Terrain, Aerial, Marine, Hybrid); y-axis: Fitness.]

Figure 3: All controllers tested in all scenarios.

We conducted the study in a simulated scenario, with unidentified objects appearing from any side of the screen and moving to the opposite side, and with the possibility of having two objects on screen at the same time. The evolved behavior consists of performing a dispersed search around the arena and getting closer to an object when an enemy feature is detected. When robots gather around the object, one of them catches it. Less attention is given to friend features, so the robots mostly neither gathered around friend objects nor caught them.

Besides the ideal environment, we also modeled: (i) a terrain environment based on obstacles randomly placed around the environment, (ii) a marine environment with constant currents and inertia in the robots' movements and (iii) an aerial environment with constant wind and wind gusts. We also selected two scenarios that are known to evolve more robust behaviors: (i) noisy evolution and (ii) hybrid evolution across the multiple environments. These were compared to the ideal evolution scenario.

When observing the evolved behaviors, two main categories can be identified: in the first, the robots evolved a behavior in which the team performs a dispersed search around the arena and then aggregates around the object to proceed with the identification; in the second, the robots follow each other in circular paths around the environment, once again aggregating around the object to identify it. The identification process followed a very similar behavior in all experiments: circumnavigating the object while facing it until the identification is complete. Specialization was also observed in the environment-specific evolutions: with the terrain-evolved controller, the swarm had a tendency to separate into groups and search inside the areas confined by the obstacles.

The noisy evolution not only failed to evolve a more robust and scalable solution, but also failed to solve the task at all. The noise magnitude that was adequate for similar tasks (Romano et al., 2016) proved destructive for this task. The global objective of this work was to test and compare several ways of developing a controller capable of collectively identifying and categorizing a set of objects and of acting on that categorization in multiple types of environments. This objective was successfully completed, as we demonstrated how EAs can synthesize a controller capable of solving this task. We have also tested the flexibility of a controller trained in multiple environments: the hybrid solution. Although environment-specific controllers globally outperformed the hybrid controller in their respective environments, the difference between the two was small enough to state that both controllers are equally capable of solving the task.

Future work could start by scaling the approach to 3D models of the environments. This would allow for a more realistic simulation, with major impact specifically on the aerial environment, where the 2D representation used in this work is a major simplification. Another necessary step is the deployment of the solution to real robots and real environments, together with its optimization and the study of the associated challenges. The biggest difficulty for the controller appeared in the aerial environment, specifically the wind gusts, which the controller had difficulty compensating for. Future work could also focus on optimizing this controller for better results in the different environments. For example, giving the controller access to a sensor that detects wind gusts could help the robots compensate for them and boost performance in the aerial environment.

ACKNOWLEDGMENTS

This work was partly funded through national funds by FCT - Fundação para a Ciência e Tecnologia, I.P., under projects UIDB/EEA/50008/2020 (Instituto de Telecomunicações) and UIDB/04466/2020 (ISTAR).

REFERENCES

Ahmad, A., Nascimento, T., Conceicao, A. G. S., Moreira,

A. P., and Lima, P. (2013). Perception-driven multi-

robot formation control. In Proceedings - IEEE In-

ternational Conference on Robotics and Automation,

pages 1851–1856.

Cangelosi, A. and Parisi, D. (1994). The touch sensitive behavior of caenorhabditis elegans: A simulation approach using neural networks. Technical Report May, Institute of Psychology C.N.R. - Rome.

Chen, J. and Wermter, S. (1998). Continuous Time Recurrent Neural Networks for Grammatical Induction. In International Conference on Artificial Neural Networks, 1998, pages 381–386.

Cheviron, T., Plestan, F., and Chriette, A. (2009). A robust

guidance and control scheme of an autonomous scale

helicopter in presence of wind gusts. International

Journal of Control, 82(12):2206–2220.

Cliff, D., Husbands, P., and Harvey, I. (1993). Evolving

visually guided robots. In Proceedings of the Second

International Conference on Simulation of Adaptive

Behavior (SAB), pages 374–383.

Duarte, M., Costa, V., Gomes, J., Rodrigues, T., Silva, F.,

Oliveira, S. M., and Christensen, A. L. (2016). Evolu-

tion of collective behaviors for a real swarm of aquatic

surface robots. PLoS ONE, 11(3):1–25.

Duarte, M., Oliveira, S., and Christensen, A. L. (2012).

Hierarchical evolution of robotic controllers for com-

plex tasks. In 2012 IEEE International Conference on

Development and Learning and Epigenetic Robotics,

ICDL 2012.

Duarte, M., Silva, F., Rodrigues, T., Oliveira, S. M., and Christensen, A. L. (2014). JBotEvolver: A versatile simulation platform for evolutionary robotics. In Proceedings of the International Conference on the Synthesis & Simulation of Living Systems (ALIFE), pages 210–211.

Eaton, M. (2007). Evolutionary humanoid robotics: past,

present and future. Lecture Notes in Computer Sci-

ence, 4850:42.

Fitzpatrick, P. M. (2003). Perception and perspective in

robotics. In Proceedings of the 25th Annual Confer-

ence of the Cognitive Science Society.

Hartland, C. and Bredèche, N. (2006). Evolutionary robotics, anticipation and the reality gap. In 2006 IEEE International Conference on Robotics and Biomimetics, ROBIO 2006, pages 1640–1645.

Jakobi, N., Husbands, P., and Harvey, I. (1995). Noise and

the Reality Gap: The Use of Simulation in Evolution-

ary Robotics. Lecture Notes in Computer Science,

929:704–720.

Jim, K., Giles, C. L., and Horne, B. G. (1995). Effects of

Noise on Convergence and Generalization in Recur-

rent Networks. In Advances in Neural Information

Processing Systems (NIPS) 7, page 649.

Kim, S.-W., Qin, B., Chong, Z. J., Shen, X., Liu, W., Ang,

M. H., Frazzoli, E., and Rus, D. (2015). Multivehicle

cooperative driving using cooperative perception: De-

sign and experimental validation. IEEE Transactions

on Intelligent Transportation Systems, 16.

Le, Q. V., Saxena, A., and Ng, A. Y. (2010). Active Percep-

tion : Interactive Manipulation for Improving Object

Detection. Technical report, Stanford.

Leonard, F., Martini, A., and Abba, G. (2012). Robust non-

linear controls of model-scale helicopters under lat-

eral and vertical wind gusts. In IEEE Transactions on

Control Systems Technology, pages 154–163.

Lewis, M. A., Fagg, A. H., and Solidum, A. (1992). Genetic Programming Approach to the Construction of a Neural Network for Control of a Walking Robot. In IEEE International Conference on Robotics and Automation, pages 2618–2623.

McCulloch, W. S. and Pitts, W. (1943). A logical calculus of

the ideas immanent in nervous activity. The Bulletin

of Mathematical Biophysics, 5(4):115–133.

Merino, L., Caballero, F., Martínez-de Dios, J. R., Ferruz, J., and Ollero, A. (2006). A cooperative perception system for multiple UAVs: Application to automatic detection of forest fires. Journal of Field Robotics, 23(3-4):165–184.

Pﬂimlin, J., Soueres, P., and Hamel, T. (2004). Hover-

ing ﬂight stabilization in wind gusts for ducted fan

UAV. 2004 43rd IEEE Conference on Decision and

Control (CDC) (IEEE Cat. No.04CH37601), 4(Jan-

uary 2005):3491–3496.

Rodrigues, T., Duarte, M., Figueiró, M., Costa, V., Oliveira, S. M., and Christensen, A. L. (2015). Overcoming limited onboard sensing in swarm robotics through local communication. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9420:201–223.

Romano, P., Nunes, L., Christensen, A. L., Duarte, M., and Oliveira, S. M. (2016). Genome Variations. In Reis, L. P., Moreira, A. P., Lima, P. U., Montano, L., and Muñoz-Martinez, V., editors, Robot 2015: Second Iberian Robotics Conference: Advances in Robotics, Volume 1, pages 309–319, Cham. Springer International Publishing.

Spaan, M. T. J. (2010). Cooperative Active Perception using

POMDPs. October, pages 4800–4805.

Spaan, M. T. J., Veiga, T. S., and Lima, P. U. (2010). Active

cooperative perception in network robot systems us-

ing POMDPs. In IEEE/RSJ 2010 International Con-

ference on Intelligent Robots and Systems, IROS 2010

- Conference Proceedings, pages 4800–4805.

Spaan, M. T. J., Veiga, T. S., and Lima, P. U. (2014).

Decision-theoretic planning under uncertainty with

information rewards for active cooperative percep-

tion. Autonomous Agents and Multi-Agent Systems,

29(6):1157–1185.
