2 BACKGROUND INFORMATION
Definition 1. An objective is a goal for a single
player, for example: move to position (x, y) on the
field. An objective can require several player actions,
e.g. dash, kick and turn, to complete, or none if the
objective is already fulfilled.
Definition 2. A strategy is a set of objectives as-
signed to one or more players.
Definition 3. In this paper, a model is a timed
automaton modelled within Uppaal (David et al., 2015).
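As a minimal sketch of Definitions 1 and 2, an objective and a strategy could be represented as follows. The class name `MoveToObjective`, the `tolerance` field and the mapping from player number to objectives are illustrative assumptions and are not taken from the implementation described in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MoveToObjective:
    """Hypothetical objective: move to position (x, y) on the field."""
    target: Tuple[float, float]
    tolerance: float = 1.0  # assumed distance at which the goal counts as reached

    def is_fulfilled(self, position: Tuple[float, float]) -> bool:
        dx = position[0] - self.target[0]
        dy = position[1] - self.target[1]
        return (dx * dx + dy * dy) ** 0.5 <= self.tolerance

    def next_actions(self, position: Tuple[float, float]) -> List[str]:
        # An already fulfilled objective requires no player actions.
        if self.is_fulfilled(position):
            return []
        # Otherwise face the target, then move towards it.
        return ["turn", "dash"]


# A strategy assigns objectives to one or more players (keyed by player number).
Strategy = Dict[int, List[MoveToObjective]]

strategy: Strategy = {7: [MoveToObjective(target=(10.0, 5.0))]}
```

Under these assumptions, a player standing on the target receives an empty action list, which corresponds to the "none if the objective is already fulfilled" case of Definition 1.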
2.1 RoboCup
RoboCup is a set of annual competitions where teams
of autonomous robots compete in the game of soccer.
RoboCup features tournaments for both physical
robots and simulated robots in 2D and 3D
environments (Robocup Federation, 2020a). This pa-
per focuses on the 2D simulator. Agents in the sim-
ulator must tackle problems such as interpretation of
noisy data, strategisation and coordination with other
robots, and finally time sensitivity, as the server
enforces a 100 ms tick rate. The simulator is publicly
available through GitHub (Rodrigues et al., 2020).
The Soccer Server is the main component of the sim-
ulator, and is responsible for storing and updating the
game state. All communication between the agents
must be done through the server.
Three different types of agents can connect to the
server. The Trainer and the Online Coach both have
access to a perfect view of the game state. The Trainer
can move objects and change the game state; it is
meant for testing strategies in controlled environments
and cannot be used in official matches. The Online
Coach can communicate via short messages broadcast
to the players (through the server), and can issue
substitutions of the players during a match.
The Players can communicate with each other
(through the server); they periodically receive sensory
data and send back actions to be performed.
Sensory data come in three different types. Visual
data consist of distances and relative directions to
flags, other players and the ball, and are distorted
depending on how far away the objects are located.
Body sensor data include current values of stamina
and head angle; auditory data include messages from
the referee, other players and the coach. Player
actions are executed to influence the game state, and
include dash, kick, turn, turn neck and say. The kick
and dash actions are accompanied by a power parame-
ter between 0 and 100 indicating how hard to kick and
how fast to dash. Each player has a stamina level
that dictates the effectiveness of the dash and kick
actions. By default, stamina starts at the upper limit
of 8000, and dash actions consume stamina
equal to the power of the action. Stamina regenerates
at a rate of at most 30 units per tick throughout the
game (The RoboCup Simulator Committee, 2020).
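The stamina rules above can be illustrated with a small sketch. It uses only the figures quoted in the text (an upper limit of 8000, a dash cost equal to the dash power, and regeneration of at most 30 units per tick) and assumes, as a simplification, that the player always regenerates at the maximum rate. The function name `step_stamina` is hypothetical, and the real server applies further rules that are not modelled here.

```python
STAMINA_MAX = 8000      # default upper limit quoted in the text
STAMINA_INC_MAX = 30    # maximum regeneration per tick quoted in the text
POWER_MAX = 100         # upper bound of the dash power parameter


def step_stamina(stamina: int, dash_power: int = 0) -> int:
    """Return the stamina after one tick (simplified, assumed model)."""
    # Clamp the requested power to the allowed range 0..100.
    power = max(0, min(dash_power, POWER_MAX))
    # A dash cannot spend more stamina than the player has left.
    spent = min(power, stamina)
    # Regenerate at the maximum rate, capped at the upper limit.
    return min(STAMINA_MAX, stamina - spent + STAMINA_INC_MAX)


# Dashing at full power every tick costs a net 70 stamina per tick.
stamina = STAMINA_MAX
for _ in range(100):
    stamina = step_stamina(stamina, dash_power=100)
```

In this simplified model a player that dashes at full power on every tick loses a net 70 stamina per tick, so sustained sprinting drains the reserve within a few seconds of game time.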
2.2 Uppaal Toolsuite
Uppaal is a modelling and verification toolsuite. The
latest edition of Uppaal comprises Uppaal Stratego,
which is a tool for generating strategies (Uppaal,
2019). A strategy in Uppaal Stratego consists of a
number of transitions in a timed automaton depending
on the values of variables and clocks, the latter
representing the passing of time.
A common issue for model checking of timed
automata is the state space explosion, which hap-
pens when the state space gets too big to analyse.
Within Uppaal Stratego, the strategies are generated
according to a query formulated in a query language
containing variables or clocks that should be opti-
mised (David et al., 2015). Strategies are generated
using one of several machine learning methods,
including co-variance, Splitting, Regression, Naive,
M-Learning and Q-learning (the default method). In this
paper, Uppaal Stratego is only used in its default
setting, Q-learning. Models in Uppaal are
saved as XML files, which allows for easy direct ma-
nipulation of the model. The Uppaal verifier, called
verifyta, is a binary file used to run strategy gener-
ation queries on the models.
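As a hedged illustration of such direct manipulation, the sketch below rewrites an integer in the global declaration block of a model before verifyta is invoked. It assumes the usual Uppaal XML layout with an `nta` root element and a `declaration` child; the variable names `ball_x` and `ball_y`, the helper names and the file names are hypothetical and not taken from the paper. Note that ElementTree drops any DOCTYPE header when serialising, so a real pipeline may need to add it back.

```python
import re
import xml.etree.ElementTree as ET


def set_int_constant(model_xml: str, name: str, value: int) -> str:
    """Return model_xml with `int <name> = ...;` set to value (assumed layout)."""
    root = ET.fromstring(model_xml)
    decl = root.find("declaration")  # global declarations of the model
    if decl is None or decl.text is None:
        raise ValueError("model has no global declaration block")
    pattern = re.compile(
        r"(\bint\s+" + re.escape(name) + r"\s*=\s*)-?\d+(\s*;)"
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value) + m.group(2), decl.text
    )
    if count != 1:
        raise ValueError(f"expected exactly one declaration of {name!r}")
    decl.text = new_text
    return ET.tostring(root, encoding="unicode")


def verifyta_command(model_path: str, query_path: str) -> list:
    """Build the argument list for running verifyta on a model and a query file."""
    return ["verifyta", model_path, query_path]


# Toy model standing in for a real Uppaal file.
SAMPLE = (
    "<nta><declaration>int ball_x = 0;\nint ball_y = 0;</declaration>"
    "<template><name>Player</name></template>"
    "<system>system Player;</system></nta>"
)
updated = set_int_constant(SAMPLE, "ball_x", 12)
```

The resulting argument list could then be passed to `subprocess.run` once the updated XML has been written to disk; that step is omitted here because it requires a local Uppaal installation.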
Timed automata have been used to strategise real-time
systems in the past. The work in (Larsen et al.,
2016) used Uppaal for online synthesis of short-period
strategies on the fly: a heating control strategy
is created for the near future, and then recalculated
periodically. Short horizons are used because the
computation needed to learn an effective strategy in
Uppaal grows exponentially with the number of states,
which in turn grows steeply with the time horizon of
the strategy. The traffic controller
described in (Eriksen et al., 2017) uses Uppaal to re-
duce waiting times at traffic lights by continuously
creating new strategies for intelligent traffic lights.
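The periodic recalculation used in both works can be sketched as a generic receding-horizon loop: plan only a few steps ahead, execute the plan, and plan again from the state actually reached. This is an illustrative sketch only; the function names and the toy temperature example are assumptions, and the planner here is a trivial stand-in for a strategy synthesised by Uppaal.

```python
from typing import Callable, List, Tuple


def run_receding_horizon(
    state: int,
    plan: Callable[[int, int], List[int]],
    apply: Callable[[int, int], int],
    total_steps: int,
    horizon: int,
) -> Tuple[int, int]:
    """Execute total_steps actions, replanning every `horizon` steps."""
    replans = 0
    queue: List[int] = []
    for step in range(total_steps):
        if step % horizon == 0:
            # Create a fresh short-horizon plan from the current state.
            queue = list(plan(state, horizon))
            replans += 1
        state = apply(state, queue.pop(0))
    return state, replans


def toy_plan(temp: int, horizon: int, target: int = 20) -> List[int]:
    """Toy planner: move the temperature one degree per step towards target."""
    actions = []
    for _ in range(horizon):
        change = 1 if temp < target else (-1 if temp > target else 0)
        actions.append(change)
        temp += change
    return actions


def toy_apply(temp: int, action: int) -> int:
    return temp + action


final_temp, replans = run_receding_horizon(15, toy_plan, toy_apply, 10, 3)
```

Keeping the horizon short bounds the cost of each planning call, at the price of planning more often.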
3 STRATEGISING RoboCup
Our approach, whose reference architecture is shown
in Figure 1, employs Uppaal to create strategies for the
Players and for the Online Coach.
A player can represent its view of the world as an
Uppaal model, which is then used to generate a strat-
egy. The player then translates the strategy into ob-
ICAART 2021 - 13th International Conference on Agents and Artificial Intelligence