Cooperative Guidance of Lego Mindstorms NXT Mobile Robots
Julien Marzat, Hélène Piet-Lahanier and Arthur Kahn
ONERA – The French Aerospace Lab, F-91123 Palaiseau, France
Keywords:
Autonomous Robots, Cooperative Control, Model Predictive Control, Source Localization.
Abstract:
This paper presents experimental results of cooperative guidance laws embedded on Lego Mindstorms NXT
mobile robots for two types of missions. The first one is navigation to a waypoint as a fleet with collision
and obstacle avoidance, following a model predictive control (MPC) framework. The second one is source
localization, i.e., finding the maximum of a potential field, for which a distributed estimation and control
strategy is proposed. Experiments show the ability to perform the two missions on these basic mobile robots,
in spite of their limited computational resources. In particular, the search for the optimal control sequence
through a dedicated discretization of the command space makes it possible to implement real-time MPC.
1 INTRODUCTION
Cooperation between autonomous mobile robots is a
challenging task and currently a very active research
field. Several approaches have been suggested for
designing decentralized control laws that allow each
vehicle to follow a trajectory without affecting the
performance of the other vehicles of the fleet, while
achieving a required cooperative task [Murray, 2007].
An efficient method is to evaluate, in a distributed
way, a common criterion based on each vehicle's
actions and measurements and on the interactions
between vehicles. The control laws can then be derived
relying on approaches such as model predictive control
(MPC) [Dunbar and Murray, 2006] with individual
optimization.
This approach has been validated by simulation
but requires experimental testing to ensure that it
can be embedded on autonomous platforms. The
chosen test system is the low-cost Lego Mindstorms
NXT. This popular platform has already been used
for basic control of a single robot [Costa et al.,
2011, Valera et al., 2011], in particular for control
education [Canale and Brunet, 2013], and for testing
estimation or localization algorithms [Pinto et al.,
2012]. A few cooperative experiments have been
conducted with these robots, mainly for coordination
and flocking [Benedettelli et al., 2009, Maze et al., 2012].
These works did not consider MPC as a potential so-
lution for fleet coordination and collision avoidance,
since it is usually regarded as computationally too ex-
pensive. However, our experiments show that it is
possible to embed an MPC algorithm on robots with
limited computing capacities if a suboptimal search
procedure is adopted.
Two classical missions illustrate the experimentation
of embedded cooperative control and estimation on
Lego Mindstorms NXT robots (the robots are described
in Section 2). The first one (Section 3) involves
navigation to waypoints by a fleet of robots, while
avoiding collisions between agents and with obstacles.
The second one (Section 4) consists in localizing a
source, defined as the maximum of a potential field,
based on local measurements performed by the agents
while moving as a fleet.
2 LEGO MINDSTORMS NXT
MOBILE ROBOTS
A fleet of $N$ Mindstorms robots has been considered,
all built according to a two-wheel differential-drive
structure (Figure 1). The dynamical model of the
$i$-th vehicle is
$$\begin{cases} x_i(k+1) = x_i(k) + t\,v_i(k)\cos(\chi_i(k)) \\ y_i(k+1) = y_i(k) + t\,v_i(k)\sin(\chi_i(k)) \\ \chi_i(k+1) = \chi_i(k) + t\,u_{\omega,i}(k) \end{cases} \quad (1)$$
where $p_i = [x_i, y_i]^T$ is the vehicle position and $\chi_i$ its direction angle, which together form the state vector $\mathbf{x}_i = [x_i, y_i, \chi_i]^T$; $t$ is the sampling timestep. The velocity $v_i$ was set to a constant value for simplicity; the only control input is thus the rotational speed $u_i = u_{\omega,i}$, constrained between $\pm\omega_{max}$.
For the $i$-th robot, practical control of the linear and angular velocities via the controllable wheel rotation speeds $\omega_{l,i}$ and $\omega_{r,i}$ is achieved by
$$v_i = \frac{(\omega_{l,i} + \omega_{r,i})}{2}\,r, \qquad u_{\omega,i} = \frac{(\omega_{r,i} - \omega_{l,i})}{2L}\,r \quad (2)$$
where $r$ is the wheel radius and $L$ the half-axis length.
Localization of each robot is performed using odom-
etry based upon the previous model (embedded wheel
sensors have an accuracy of 1 degree). Since the
test missions were short, this localization method was
deemed sufficient for estimating the position of the
robots in spite of the error accumulated by odometry.
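To make the localization step concrete, a minimal odometry update implementing (1) and (2) is sketched below in Python for readability (the robots themselves were programmed in NXC). The geometry values are illustrative placeholders, not the measured parameters of the actual robots.

```python
import math

R = 0.028  # wheel radius r [m] (assumed value for illustration)
L = 0.06   # half-axis length L [m] (assumed value)
T = 0.3    # sampling timestep t [s], as in the experiments

def odometry_step(x, y, chi, omega_l, omega_r):
    """One odometry update following models (1) and (2).

    omega_l, omega_r: wheel rotation speeds [rad/s], derived from
    the embedded wheel sensors (1-degree resolution on the NXT).
    """
    v = R * (omega_l + omega_r) / 2.0              # linear velocity, Eq. (2)
    u_omega = R * (omega_r - omega_l) / (2.0 * L)  # rotational speed, Eq. (2)
    x_next = x + T * v * math.cos(chi)             # position update, Eq. (1)
    y_next = y + T * v * math.sin(chi)
    chi_next = chi + T * u_omega                   # heading update, Eq. (1)
    return x_next, y_next, chi_next
```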
Figure 1: Lego Mindstorms NXT robot.
The Lego Mindstorms NXT embedded computa-
tional capabilities are provided by a main ARM CPU
clocked at 48 MHz, assisted by an ATmega co-processor
clocked at 20 MHz, both communicating with 64 kB of
RAM and a flash storage of 256 kB. Es-
timation and control algorithms should thus be de-
signed accordingly in terms of number of operations
and data stored to be able to run on this architec-
ture. It can be programmed in many languages; here,
NXC (“Not eXactly C”) was chosen for its simplicity
and its ability to control all the parameters and elements
of the robot. Bluetooth communication was used to
allow robots to share their estimated positions and
possibly other information with the rest of the fleet.
A major practical constraint of the Mindstorms sys-
tem is that the communication graph requires a single
master robot with a maximum of three slaves, which
limits the fleet to four robots. An exchange rate above
20 Hz was recorded in practice when sharing position
information between two robots.
3 FLEET NAVIGATION WITH
COLLISION AND OBSTACLE
AVOIDANCE BY MPC
The first mission considered is the navigation of a
fleet of Mindstorms toward a waypoint, with colli-
sion and obstacle avoidance. This is tackled using
a simplified version of the MPC framework initially
defined in [Rochefort et al., 2012], where more theo-
retical details can be found.
3.1 MPC Method
In distributed MPC [Scattolini, 2009], each vehicle
computes its control inputs at each timestep as a so-
lution of an optimization problem over the future pre-
dicted trajectory. For tractability reasons, finite prediction and control horizon lengths, respectively denoted $H_p$ and $H_c$, are used. The future control inputs and resulting state trajectories are
$$U_i = \left[u_i(k)^T, u_i(k+1)^T, \dots, u_i(k+H_c-1)^T\right]^T$$
$$X_i = \left[x_i(k+1)^T, x_i(k+2)^T, \dots, x_i(k+H_p)^T\right]^T$$
When $H_c < H_p$, we assume that the control inputs are null after $H_c$ steps. Once the optimal input sequence $U_i^\star$ has been computed, each vehicle communicates its predicted trajectory to the rest of the fleet and applies the first entry $u_i^\star(k)$. The optimization problem at time $k$ is stated as
$$\begin{array}{ll} \text{minimize} & J_i(U_i, X_i) \\ \text{over} & U_i \in \mathcal{U}^{H_c} \\ \text{with} & x_i(t) \text{ satisfying (1)}, \; \forall t \in [k+1, k+H_p] \end{array} \quad (3)$$
$J_i$ is the cost function associated with vehicle $i$. The constraints coupling the dynamics of the vehicles, such as collision avoidance, are taken into account by penalization. At the next timestep, each vehicle solves its optimization problem considering that the other vehicles follow their predicted trajectories. $J_i$ comprises a navigation cost $J_i^{nav}$, a safety cost $J_i^{safety}$ and a control cost $J_i^u$, such that
$$J_i(k) = J_i^{nav}(k) + J_i^{safety}(k) + J_i^u(k) \quad (4)$$
The formulation of each cost function is presented in the following subsections. Weighting coefficients $W_{\bullet}$ are tuned to set relative priorities between each aspect of the mission.
3.1.1 Navigation Cost
The navigation cost $J_i^{nav}$ aims at controlling how vehicles navigate to waypoints. It is decomposed into
$$J_i^{nav}(k) = J_i^{nav,direct}(k) + J_i^{nav,fleet}(k) \quad (5)$$
The reference trajectory to the next waypoint $p_p$ is composed of points $p_{i,p}^{ref}(n|k)$, $n \in [k+1, k+H_p]$. They correspond to the positions that vehicle $i$ would reach at timestep $n$ if moving along a straight line to $p_p$ at nominal velocity $v_i$. These reference points are thus defined by (6) and the associated cost $J_i^{nav,direct}$
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
606
is given by (7). The notation $\hat{p}_i(n|k)$ represents the predicted position of robot $i$ at time $n$, starting from instant $k$.
$$p_{i,p}^{ref}(n|k) = p_i(k) + (n-k)\,t\,v_i\,\frac{p_p - p_i(k)}{\left\|p_p - p_i(k)\right\|} \quad (6)$$
$$J_i^{nav,direct}(k) = W_{nd} \sum_{n=k+1}^{k+H_p} \left\|\hat{p}_i(n|k) - p_{i,p}^{ref}(n|k)\right\|^2 \quad (7)$$
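As an illustration, the following Python sketch evaluates the reference points of (6) and the cost (7) for a given predicted trajectory (our function and variable names; the weight is a placeholder, and the unit vector toward $p_p$ follows the reconstruction of (6)):

```python
import math

W_ND = 1.0  # navigation weight W_nd (assumed tuning value)
T = 0.3     # sampling timestep t [s]

def nav_direct_cost(p_i, p_wp, v_i, predicted):
    """Reference points of Eq. (6) and navigation cost of Eq. (7).

    p_i:       current position [x, y] of vehicle i at time k
    p_wp:      waypoint position p_p (assumed not yet reached)
    v_i:       nominal velocity [m/s]
    predicted: predicted positions p_hat(n|k) for n = k+1..k+Hp
    """
    dx, dy = p_wp[0] - p_i[0], p_wp[1] - p_i[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist  # unit vector toward the waypoint
    cost = 0.0
    for step, (px, py) in enumerate(predicted, start=1):
        ref_x = p_i[0] + step * T * v_i * ux   # Eq. (6), step = n - k
        ref_y = p_i[1] + step * T * v_i * uy
        cost += (px - ref_x) ** 2 + (py - ref_y) ** 2  # Eq. (7)
    return W_ND * cost
```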
$J_i^{nav,fleet}$ aims at keeping the vehicles together as a fleet. Its definition penalizes the predicted distance $\hat{d}_{ij}(n|k) = \|\hat{p}_j(n|k) - \hat{p}_i(n|k)\|$ between vehicles $i$ and $j$ ($i \neq j$), as
$$J_i^{nav,fleet}(k) = W_{nv} \sum_{\substack{j=1 \\ j \neq i}}^{N} \sum_{n=k+1}^{k+H_p} \frac{1 + \tanh\left(\beta_{ij}^f\left(\hat{d}_{ij}(n|k) - \alpha_{ij}^f\right)\right)}{2} \quad (8)$$
where the coefficients $\beta_{ij}^f$ and $\alpha_{ij}^f$ are defined by
$$\beta_{ij}^f = 6\left(d_{loss}^v - d_{des}^v\right)^{-1}, \qquad \alpha_{ij}^f = \frac{1}{2}\left(d_{loss}^v + d_{des}^v\right) \quad (9)$$
The coefficient $d_{des}^v$ defines a desired distance between the vehicles inside the fleet, whereas $d_{loss}^v$ is the maximum distance allowed between vehicles of the fleet. Vehicles $j$ ($j \neq i$) beyond this maximum distance are no longer considered by vehicle $i$. This represents, for example, limited communication and/or sensing ranges.
3.1.2 Safety Cost
The safety cost $J_i^{safety}$ aims at avoiding collisions with obstacles, and between vehicles within the fleet. It is made of two costs
$$J_i^{safety}(k) = J_i^{safe,veh}(k) + J_i^{safe,obs}(k) \quad (10)$$
The first cost deals with collision avoidance between vehicles by penalizing the predicted distance $\hat{d}_{ij}$ between them:
$$J_i^{safe,veh}(k) = W_{sv} \sum_{\substack{j=1 \\ j \neq i}}^{N} \sum_{n=k+1}^{k+H_p} \frac{1 - \tanh\left(\beta_{ij}^v\left(\hat{d}_{ij}(n|k) - \alpha_{ij}^v\right)\right)}{2} \quad (11)$$
where
$$\beta_{ij}^v = 6\left(d_{des}^v - d_{safe}^v\right)^{-1}, \qquad \alpha_{ij}^v = \frac{1}{2}\left(d_{des}^v + d_{safe}^v\right) \quad (12)$$
and $d_{safe}^v$ represents a safety distance between the vehicles. The costs $J_i^{safe,veh}$ and $J_i^{nav,fleet}$ are plotted as functions of the distance $\hat{d}_{ij}$ in Figure 2.
The other cost, $J_i^{safe,obs}$, penalizes the predicted distance $\hat{d}_{io}$ of vehicle $i$ to any obstacle $o$. It is defined as
$$J_i^{safe,obs}(k) = W_{so} \sum_{o=1}^{N_o} \sum_{n=k+1}^{k+H_p} \frac{1 - \tanh\left(\beta_{io}^o\left(\hat{d}_{io}(n|k) - \alpha_{io}^o\right)\right)}{2} \quad (13)$$
Figure 2: Flocking ($J_i^{nav,fleet}$) and avoidance ($J_i^{safe,veh}$) costs.
where $N_o$ stands for the number of obstacles and the parameters $\beta_{io}^o$ and $\alpha_{io}^o$ are given by
$$\beta_{io}^o = 6\left(d_{des}^o - d_{safe}^o\right)^{-1}, \qquad \alpha_{io}^o = \frac{1}{2}\left(d_{des}^o + d_{safe}^o\right) \quad (14)$$
where $d_{des}^o$ and $d_{safe}^o$ are the desired and safety distances to obstacles. The locations of the obstacles are assumed to be known in the experiments, but they could also be detected with the infrared or ultrasonic Lego sensors.
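For concreteness, the sigmoid penalty terms of (8), (11) and (13) can be evaluated as in the following Python sketch (our variable names; it assumes the reconstructed form $(1 \pm \tanh(\beta(d - \alpha)))/2$ with the parameters of (9), (12) and (14)):

```python
import math

def beta_alpha(d_far, d_near):
    """Slope and midpoint parameters, as in (9), (12) and (14)."""
    return 6.0 / (d_far - d_near), 0.5 * (d_far + d_near)

def flocking_penalty(d, d_loss=0.5, d_des=0.2):
    """Term of (8): tends to 1 when vehicles drift too far apart."""
    beta, alpha = beta_alpha(d_loss, d_des)
    return (1.0 + math.tanh(beta * (d - alpha))) / 2.0

def avoidance_penalty(d, d_des=0.2, d_safe=0.1):
    """Term of (11) and (13): tends to 1 when the distance shrinks
    below the safety value."""
    beta, alpha = beta_alpha(d_des, d_safe)
    return (1.0 - math.tanh(beta * (d - alpha))) / 2.0

# Near the desired distance both penalties are small, as in Figure 2:
print(flocking_penalty(0.2), avoidance_penalty(0.2))
```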
3.1.3 Control Cost
As traditionally introduced in MPC, the control cost $J_i^u(k)$ aims at limiting the control effort, and thus the energy consumption, of vehicle $i$. It is simply defined as
$$J_i^u(k) = W_{u,\omega} \sum_{n=k+1}^{k+H_c} \left(u_{\omega,i}(n)\right)^2 \quad (15)$$
3.1.4 Online Computation of Control Inputs
As the computational cost should be reduced to cope with the robot resources, we limit the search to a finite set $\mathcal{S}$ of candidate control sequences. MPC still guarantees stability with suboptimal control sequences, provided each one steadily decreases the cost, as shown in [Scokaert et al., 1999]. At each timestep, the control problem (3) is solved as follows:
1. using a model of the vehicle dynamics, predict the effect of each control sequence of the candidate set $\mathcal{S}$ on the state of the vehicle;
2. compute the cost $J_i$ corresponding to each candidate control sequence;
3. select the control sequence with the smallest cost.
The distribution of the candidate control sequences is chosen so as to limit their number while providing a good coverage of the control space, according to the following three rules:
1. the set $\mathcal{S}$ of candidates includes the extreme control inputs, to exploit the full vehicle potential;
2. the set $\mathcal{S}$ of candidates includes the null control input, to allow the vehicle to continue along its current path;
CooperativeGuidanceofLegoMindstormsNXTMobileRobots
607
3. candidates are distributed over the entire control
space with an increased density around the null
control input.
The set of control inputs for the problem at hand thus reduces to the discretization, with a varying step $\gamma$,
$$\mathcal{S} = \left\{\frac{2\pi\gamma}{\eta_\omega}\right\}, \quad \gamma \in [1, \eta_\omega], \quad (16)$$
where $\eta_\omega$ should be chosen to keep the computation of the predicted trajectories within the duration of a timestep.
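A minimal version of this suboptimal search can be sketched as follows (Python for readability; the embedded version was written in NXC). The candidate set below is illustrative and, for simplicity, each candidate sequence holds a single input constant over $H_c$ steps, then applies the null input up to $H_p$:

```python
import math

OMEGA_MAX = 2.5   # rad/s, actuator limit
T, V = 0.3, 0.1   # timestep [s] and constant velocity [m/s]
HC, HP = 4, 8     # control and prediction horizons

# Finite candidate set following rules 1-3: extremes, null input,
# and a higher density around zero (illustrative values).
CANDIDATES = [-OMEGA_MAX, -1.0, -0.4, -0.1, 0.0,
              0.1, 0.4, 1.0, OMEGA_MAX]

def predict(state, u_omega):
    """Predicted trajectory over Hp steps under model (1), with the
    candidate input applied for Hc steps and null input afterwards."""
    x, y, chi = state
    traj = []
    for n in range(HP):
        u = u_omega if n < HC else 0.0
        x += T * V * math.cos(chi)
        y += T * V * math.sin(chi)
        chi += T * u
        traj.append((x, y, chi))
    return traj

def mpc_step(state, cost):
    """Exhaustive resolution of (3) over the finite set: returns the
    first input of the candidate sequence with the smallest cost J_i."""
    best_u, best_j = 0.0, float("inf")
    for u in CANDIDATES:
        j = cost(predict(state, u), u)  # J_i of Eq. (4)
        if j < best_j:
            best_u, best_j = u, j
    return best_u
```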
3.2 Experimental Results
The chosen discretization strategy to select the op-
timal control input makes it possible to embed the
MPC guidance laws on such mobile robots with lim-
ited capabilities, which is otherwise not feasible. The
parameters for the experiments presented here were $v_i = 0.1$ m/s, $\omega_{max} = 2.5$ rad/s (consistent with the actual motor limitations), $t = 0.3$ s, $H_c = 4$, $H_p = 8$ and $\eta_\omega = 11$. The computation time achieved at each iteration was constant and comfortably smaller than $t$.
Collision avoidance by MPC: To validate the communication capabilities of the vehicles and the safety cost ensuring collision avoidance, the first scenario required each robot to go to the initial position of the other robot while avoiding it on the way. The global cost function was thus
$$J_i^1 = J_i^{nav,direct} + J_i^{safe,veh} + J_i^u \quad (17)$$
where the navigation cost $J_i^{nav,direct}$ for robot $i$ steered it to the initial position $p_j$ of robot $j$. The obtained trajectories are given in Figure 3, where it can be observed that the two robots reach their destinations with good accuracy and that collision avoidance is effective at the desired distance ($d_{des}^v = 0.5$ m here).
Figure 3: Collision avoidance trajectory.
Safe fleet navigation by MPC: In addition to the two costs tested in the first experiment, the second scenario involved a fleet behavior (cost $J_i^{nav,fleet}$) with a desired inter-vehicle distance of 0.2 m, and an obstacle (whose position is known) to avoid thanks to the cost $J_i^{safe,obs}$. For this experiment, the global cost function was
$$J_i^2 = J_i^{nav,direct} + J_i^{nav,fleet} + J_i^{safe,veh} + J_i^{safe,obs} + J_i^u \quad (18)$$
where now both vehicles were required to go to the same waypoint as a fleet. The obtained trajectories (Figure 4) illustrate the fleet behavior of the vehicles (the distance $d_{des}^v = 0.2$ m is respected) before they encounter the obstacle and avoid it (with a safety distance $d_{safe}^v = 0.1$ m), and finally head toward the same waypoint.
Figure 4: Fleet navigation with obstacle avoidance (motion
from left to right).
4 SOURCE LOCALIZATION
The second scenario is the localization of the maxi-
mum of a potential field φ using the distributed mea-
surements acquired by the robots of the fleet. This is
a problem where cooperative distributed estimation is
necessary, since a single robot would have difficulty
localizing the source with a single sensor (information
would be gathered only along its own path).
The MPC costs obtained for Scenario 1 are still
valid for fleet management and collision or obstacle
avoidance, but the navigation cost should now be re-
placed by the control described in Section 4.2.
4.1 Distributed Estimation
The field $\varphi$, a function of position $p_i$, is assumed to be time-invariant and concave (its maximum is unique; its second-order derivative $\nabla^2\varphi$ is negative everywhere). The source-localization algorithm consists in estimating locally the spatial gradient $\nabla\varphi$ of the field $\varphi$ using the fleet of robots, each vehicle being able
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
608
to measure (with identical sensors) the field value at its current position and broadcast it to the rest of the fleet. The fleet of vehicles then moves along the direction of the estimated gradient to head toward the maximum. For estimating the gradient, a vector $y$ of $m$ measurements is considered,
$$y = [y_1, \dots, y_m]^T \quad (19)$$
where $y_j = \varphi(p_j) + e_j$ is the measurement of the field $\varphi$ gathered at vehicle position $p_j$, and $e_j$ is the measurement noise, assumed to be distributed according to a zero-mean Gaussian distribution of variance $\sigma_e^2$.
The different measurements $y_j$ ($j = 1, \dots, m$) are acquired by the robots of the fleet and can possibly include measurements at successive time steps. For instance, if $m$ is chosen greater than the number of robots $N$, a design choice should be made to use past measurements of the robots associated with successive positions (with a timestep large enough to avoid ill-conditioning).
The actual function $\varphi$ is unknown, so a linear estimation model is considered around a given position $p_0 = [x_0, y_0]^T$ (corresponding to a virtual position within the fleet) as
$$\varphi_l(p) = \varphi(p_0) + [x - x_0 \;\; y - y_0]\,[\nabla\varphi_{x0} \;\; \nabla\varphi_{y0}]^T, \quad (20)$$
which can be written as
$$\varphi_l(p) = [1 \;\; x - x_0 \;\; y - y_0]\,\beta, \quad (21)$$
where $\beta = [\varphi(p_0), \nabla\varphi_{x0}, \nabla\varphi_{y0}]^T$ is an unknown vector of parameters to be estimated using the measurement vector $y$ and the corresponding positions $p_j$ ($j = 1, \dots, m$). The model with the $m$ measurements can then be written as $y = H\beta$, where
$$H = \begin{bmatrix} 1 & x_1 - x_0 & y_1 - y_0 \\ \vdots & \vdots & \vdots \\ 1 & x_m - x_0 & y_m - y_0 \end{bmatrix} \quad (22)$$
The least-squares estimate of $\beta$ is
$$\hat{\beta} = \left(H^T H\right)^{-1} H^T y \quad (23)$$
and the estimated gradient $\widehat{\nabla\varphi} = [\widehat{\nabla\varphi}_{x0}, \widehat{\nabla\varphi}_{y0}]^T$ can be used as the steepest-climbing direction to guide the fleet toward the unique maximum of the potential field. Note that, with the assumed linear model, the estimated gradient $\widehat{\nabla\varphi}$ is independent of the reference point $p_0$ where it is computed.
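A compact sketch of this estimation step is given below (Python with NumPy for brevity; the embedded version implements the same algebra in NXC, where the paper reports a 3×3 inversion in 10 ms). Function and variable names are ours:

```python
import numpy as np

def estimate_gradient(positions, measurements, p0):
    """Least-squares fit of the linear model (21)-(23).

    positions:    m robot positions [x_j, y_j]
    measurements: m field values y_j measured at those positions
    p0:           reference point [x0, y0] (virtual fleet position)
    Returns (phi0_hat, grad_hat), grad_hat being the estimated gradient.
    """
    P = np.asarray(positions, dtype=float)
    y = np.asarray(measurements, dtype=float)
    # Regressor matrix H of Eq. (22)
    H = np.column_stack([np.ones(len(y)),
                         P[:, 0] - p0[0],
                         P[:, 1] - p0[1]])
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # beta_hat of Eq. (23)
    return beta[0], beta[1:]

# Example with 3 robots on the field phi(x, y) = -x**2 - y**2:
phi0, grad = estimate_gradient([[1, 0], [0, 1], [1, 1]],
                               [-1.0, -1.0, -2.0], p0=[0.5, 0.5])
print(grad)  # [-1, -1], the climbing direction at p0
```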
4.2 Fleet Control for Maximum Seeking
The fleet should then be guided toward the maximum by following the direction indicated by the estimated gradient. In a distributed scheme, each vehicle estimates the same gradient value using the measurements of the whole fleet, then aligns its velocity with this direction. A speed control is considered as $\dot{p}_i = \nabla\varphi\,\|\nabla\varphi\|^{-1} v_i$ with $v_i > 0$, assuming that the estimation error $\widehat{\nabla\varphi} - \nabla\varphi$ is small enough. It can then be proven that the robots converge to the position $p^*$ of the unique maximum by considering the Lyapunov function
$$V = (p^* - p_i)^T (p^* - p_i), \quad (24)$$
whose derivative along the system trajectory is
$$\dot{V} = -2\,(p^* - p_i)^T \nabla\varphi\,\|\nabla\varphi\|^{-1} v_i. \quad (25)$$
The Taylor expansion of the actual field $\varphi$ at position $p_i$, evaluated at the maximum position $p^*$, yields
$$\varphi(p^*) = \varphi(p_i) + (p^* - p_i)^T \nabla\varphi + \frac{1}{2}(p^* - p_i)^T \nabla^2\varphi(\xi)\,(p^* - p_i), \quad (26)$$
where $\xi$ belongs to the 2D interval $[p_i, p^*]$. Then
$$\begin{array}{rl} \dot{V} &= 2\,(p_i - p^*)^T \nabla\varphi\,\|\nabla\varphi\|^{-1} v_i \\ &= 2\,\big(\varphi(p_i) - \varphi(p^*)\big)\,\|\nabla\varphi\|^{-1} v_i \\ &\quad + (p_i - p^*)^T \nabla^2\varphi(\xi)\,(p_i - p^*)\,\|\nabla\varphi\|^{-1} v_i. \end{array} \quad (27)$$
Since the field is concave, $\nabla^2\varphi < 0$ everywhere, and since $p^*$ is the maximum location, $\varphi(p_i) - \varphi(p^*) \leq 0$; as $v_i$ is chosen positive, $\dot{V} < 0$ for $p_i \neq p^*$, and the gradient-climbing control therefore makes the vehicles converge toward the maximum of the field.
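On the unicycle model (1) with constant speed, following $\dot{p}_i = v_i\,\nabla\varphi\,\|\nabla\varphi\|^{-1}$ amounts to steering the heading $\chi_i$ toward the direction of the estimated gradient. One possible saturated heading controller, consistent with the $\pm\omega_{max}$ constraint, is sketched below (an illustrative design, not the exact onboard law; the gain is an assumed tuning value):

```python
import math

OMEGA_MAX = 2.5  # rad/s, actuator limit
K_CHI = 2.0      # heading gain (assumed tuning value)

def gradient_heading_control(chi, grad):
    """Rotational-speed command aligning the heading chi with the
    estimated gradient, i.e., the steepest-climbing direction."""
    chi_ref = math.atan2(grad[1], grad[0])
    # Heading error wrapped to (-pi, pi]
    err = math.atan2(math.sin(chi_ref - chi), math.cos(chi_ref - chi))
    return max(-OMEGA_MAX, min(OMEGA_MAX, K_CHI * err))
```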
4.3 Experimental Results
A grayscale map is considered as a potential field
for the experiments, the maximum being located at
the darkest spot. In addition to their positions, the
robots now also share their measured gray level (us-
ing the Lego color sensor). A fleet of three robots is
considered and the mission stops when the measure-
ment of one of the robots reaches a predefined target
value (another possible stopping condition could be a
threshold on the gradient norm). For estimation purposes, 3 points are sufficient to estimate the gradient; 6 can also be considered for better redundancy, by taking into account past positions and measurements of the robots. The NXT brick needs 10 ms to invert a 3×3 matrix and 80 ms to solve the least-squares estimation problem (23) with 6 measurements, which is
compatible with the chosen timestep of 300 ms. Fig-
ure 5 illustrates the gradient ascent in the joint space
of positions and gray level (mapped between 0 and 1
from lighter to darker). The fleet successfully finds
the location of the maximum.
CooperativeGuidanceofLegoMindstormsNXTMobileRobots
609
Figure 5: Fleet trajectory toward maximum, with measured
potential level.
Figure 6: Sequence of fleet convergence to the maximum.
5 CONCLUSIONS AND
PERSPECTIVES
The experiments reported in this paper have highlighted
the possibility of controlling a fleet of Lego Mindstorms
NXT robots to fulfill two types of missions, notwithstanding
their limited computational capabilities. A first scenario
has shown that MPC can be a flexible solution for fleet
management and collision avoidance, both between vehicles
and with obstacles. A dedicated discretization strategy
allows the MPC to find an effective control sequence
within the constrained computation time. In a second sce-
nario, a decentralized estimation and control scheme
to find the maximum of a potential field has been
presented. It involved linear parameter estimation to
obtain the gradient of the field and a gradient-ascent
control proven to converge to the actual maximum lo-
cation. Implementation of the two strategies on the
fleet of Lego Mindstorms NXT was successful, which
shows the interest of these platforms as a practical
testbed for cooperative estimation and control under
strict implementation constraints.
ACKNOWLEDGEMENTS
The authors would like to thank Guillaume Broussin
and Mathieu Touchard, who contributed to these ex-
periments during their internship at ONERA.
REFERENCES
Benedettelli, D., Casini, M., Garulli, A., Giannitrapani, A.,
and Vicino, A. (2009). A Lego Mindstorms experi-
mental setup for multi-agent systems. In Proceedings
of the IEEE Multi-conference on Systems and Control,
Saint Petersburg, Russia, pages 1230–1235.
Canale, M. and Brunet, S. C. (2013). A Lego Mindstorms
NXT experiment for model predictive control educa-
tion. In Proceedings of the European Control Confer-
ence, Zurich, Switzerland, pages 2549–2554.
Costa, P., Moreira, A., Gonçalves, J., and Lima, J. (2011).
Proposal of a new real-time cooperative challenge in
mobile robotics. In Proceedings of the 18th IFAC
World Congress, Milan, Italy.
Dunbar, W. and Murray, R. (2006). Distributed receding
horizon control for multi-vehicle formation stabiliza-
tion. Automatica, 42:549–558.
Maze, N., Wan, Y., Namuduri, K., and Varanasi, M. (2012).
A Lego Mindstorms NXT-based test bench for cohe-
sive distributed multi-agent exploratory systems: Mo-
bility and coordination. In Proceedings of the AIAA
Infotech@Aerospace, Garden Grove, California.
Murray, R. (2007). Recent research in cooperative control
of multivehicle systems. Journal of Dynamic Systems,
Measurement, and Control, 129(5):571–583.
Pinto, M., Moreira, A. P., and Matos, A. (2012). Localiza-
tion of mobile robots using an extended Kalman fil-
ter in a Lego NXT. IEEE Transactions on Education,
55(1):135–144.
Rochefort, Y., Bertrand, S., Piet-Lahanier, H., Beauvois, D.,
and Dumur, D. (2012). Cooperative nonlinear model
predictive control for flocks of vehicles. In Proceed-
ings of the IFAC Workshop on Embedded Guidance,
Navigation and Control in Aerospace, Bangalore, In-
dia.
Scattolini, R. (2009). Architectures for distributed and hi-
erarchical model predictive control–a review. Journal
of Process Control, 19(5):723–731.
Scokaert, P. O. M., Mayne, D. Q., and Rawlings, J. B.
(1999). Suboptimal model predictive control (feasibil-
ity implies stability). IEEE Transactions on Automatic
Control, 44(3):648–654.
Valera, Á., Vallés, M., Marín, L., and Albertos, P. (2011).
Design and implementation of Kalman filters applied
to Lego NXT based robots. In Proceedings of the 18th
IFAC World Congress, Milan, Italy.
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
610