that last multiple time steps (Puterman, 1994).
Our RTS approach focuses on the sub-process
of mid-level combat strategy. Neural network implementations
of low-level combat behavior have already
shown reasonable results (Patel, 2009; Buro
and Churchill, 2012): agents in the game of Counter-Strike
were given a single task, and a neural network
was used to optimize performance on that
task. Our method uses task selection instead: rather than giving
the neural network a single task for which it has to
optimize, our neural network optimizes task selection
for each unit. The unit then executes the chosen order, such as
defending the base or attacking a specific unit. The behavior itself
is implemented as a finite-state machine (FSM). Abstract
actions reduce the state space and the number of time
steps before rewards are received. The reduction is
beneficial for RTS games due to the many options and
the need for real-time decision-making.
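The per-unit selection over abstract tasks described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the task names, feature layout, and network sizes are assumptions, and the MLP weights here are random rather than learned.

```python
import numpy as np

# Abstract tasks the network chooses between (illustrative names,
# not necessarily the exact task set used in the paper).
TASKS = ["defend_base", "attack_unit", "attack_base"]

def mlp_forward(features, w1, b1, w2, b2):
    """One-hidden-layer MLP: state features -> one value per abstract task."""
    hidden = np.tanh(features @ w1 + b1)
    return hidden @ w2 + b2

def select_task(features, params, epsilon=0.1, rng=np.random.default_rng(0)):
    """Epsilon-greedy selection over abstract tasks for one unit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(TASKS)))
    values = mlp_forward(features, *params)
    return int(np.argmax(values))

# Toy parameters: 8 input features, 16 hidden units, one output per task.
rng = np.random.default_rng(42)
params = (rng.normal(size=(8, 16)), np.zeros(16),
          rng.normal(size=(16, len(TASKS))), np.zeros(len(TASKS)))

features = rng.normal(size=8)          # processed higher-order inputs
task = TASKS[select_task(features, params)]
# The chosen abstract task would then be handed to the unit's FSM to execute.
```

The key point is the division of labor: the network only ranks abstract tasks, while the FSM handles the low-level execution of whichever task is selected.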
For learning to play RTS games, we use HRL with
a multi-layer perceptron (MLP). The combination of
RL and MLP has already been successfully applied
to game-playing agents (Ghory, 2004; Bom et al.,
2013). For example, RL with an MLP has been successfully
used to learn combat behavior in StarCraft
(Shantia et al., 2011). The MLP receives higher-order
inputs, an approach in which only a subset of (processed)
inputs is used and which has been successfully applied
to improve speed and efficiency in the game Ms.
Pac-Man (Bom et al., 2013). Two RL methods, Q-learning
and Monte Carlo learning (Sutton and Barto,
1998), are used to find optimal performance against
a pre-programmed AI and a random AI. Since playing
an RTS game involves a multi-agent system,
we compare two different methods for assigning rewards
to individual agents: using individual rewards
or sharing rewards across the entire team.
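The two ingredients just mentioned can be sketched in tabular form. The paper uses an MLP as function approximator, so the tabular Q-learning update below is only illustrative; likewise, whether the shared team signal is a sum or an average of individual rewards is an assumption here (a sum is used).

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update on a tabular Q-function (dict of dicts)."""
    best_next = max(q[next_state].values()) if next_state in q else 0.0
    q.setdefault(state, {}).setdefault(action, 0.0)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def assign_rewards(unit_rewards, shared=False):
    """Individual scheme: each unit keeps its own reward.
    Shared scheme: every unit receives the team's summed reward."""
    if shared:
        team_reward = sum(unit_rewards.values())
        return {unit: team_reward for unit in unit_rewards}
    return dict(unit_rewards)
```

For instance, `assign_rewards({"u1": 1.0, "u2": -0.5}, shared=True)` gives both units 0.5, so a unit is credited for the team's outcome even when its own contribution was negative.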
We developed a simple custom RTS where every
aspect is controlled to reduce unwanted influences or
effects. The game contains two bases, one for each
team. A base spawns one of three types of units until
it is destroyed; the goal of these units is to defend their
own base and to destroy the enemy base. All decision-making
components are handled by FSMs, except for
the component that assigns behaviors to units, which
is the subject of our research.
2 REAL-TIME STRATEGY GAME
The game is a simple custom RTS game that focuses
on mid-level combat behavior. Many RTS
game-play features, such as building construction and
resource gathering, are omitted, while other aspects
are controlled by FSMs and algorithms to reduce unwanted
influences and effects. For example, the A*
search algorithm is used for path finding,
and unit building is handled by an FSM that builds the unit type
that counters the most enemies for which there is not
a counter already present. A visual representation of
the game can be found in Figure 1.
Figure 1: Visual representation of the custom RTS game.
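One reading of the unit-building rule above can be sketched as follows. The exact tie-breaking and bookkeeping of the paper's FSM are not specified in the text, so the details here (counting uncovered enemies, falling back to the default spearman) are assumptions.

```python
from collections import Counter

# Rock-paper-scissors relation: COUNTERS[x] is the unit type that counters x.
COUNTERS = {"spearman": "archer", "archer": "cavalry", "cavalry": "spearman"}

def choose_unit_to_build(enemy_units, own_units):
    """Build the type countering the most enemies not yet countered.

    For each enemy type, subtract the counters we already field, then
    build the counter to whichever enemy type has the most uncovered units.
    """
    enemies = Counter(enemy_units)
    own = Counter(own_units)
    uncovered = {etype: max(0, n - own[COUNTERS[etype]])
                 for etype, n in enemies.items()}
    if not uncovered or max(uncovered.values()) == 0:
        return "spearman"          # default unit with average stats
    target = max(uncovered, key=uncovered.get)
    return COUNTERS[target]
```

For example, against two cavalry and one archer with no units of our own, the cavalry are the largest uncovered group, so the FSM builds their counter, a spearman.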
The game consists of tiles: black tiles are walls
that cannot be moved through, and white tiles are open
space. The units can move in four directions. We use the
Manhattan distance to determine the distance between
two points. Although units move in steps smaller than a tile,
our A* path-finding algorithm computes a path from
tile to tile for speed. When a unit is within one tile of the
target, the unit moves directly towards it.
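The tile-level path finding described above can be sketched as a standard A* search with the Manhattan distance as heuristic. The grid encoding is an assumption (0 = open tile, 1 = wall); the paper does not give implementation details beyond the tile-to-tile search and 4-directional movement.

```python
import heapq

def manhattan(a, b):
    """Manhattan distance between two (x, y) tile coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(grid, start, goal):
    """A* over a tile grid with 4-directional moves.

    grid[y][x] == 1 marks a wall; returns a list of tiles from start
    to goal, or None if the goal is unreachable.
    """
    frontier = [(manhattan(start, goal), 0, start, None)]  # (f, g, tile, parent)
    came_from = {}
    cost = {start: 0}
    while frontier:
        _, g, current, parent = heapq.heappop(frontier)
        if current in came_from:       # already expanded with a lower cost
            continue
        came_from[current] = parent
        if current == goal:            # reconstruct the path back to start
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        x, y = current
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= ny < len(grid) and 0 <= nx < len(grid[0])
                    and grid[ny][nx] == 0):
                new_cost = g + 1
                if new_cost < cost.get((nx, ny), float("inf")):
                    cost[(nx, ny)] = new_cost
                    heapq.heappush(frontier,
                                   (new_cost + manhattan((nx, ny), goal),
                                    new_cost, (nx, ny), current))
    return None
```

Because all step costs are 1 and the Manhattan distance never overestimates the remaining 4-directional moves, the heuristic is admissible and the returned path is shortest.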
The goal of the game is to destroy the opponent's
base while defending one's own base. The bases are indicated
by large blue and red squares in Figure 1. The game
finishes when the hit-points of a base reach zero as a result
of the units attacking it. Depending on the unit
type, a base must be attacked at least four times before it
is destroyed. The base is also the spawning point for
new units of a team; the spawning time depends on
the cool-down time of the previously produced unit.
There are three different types of units: archer,
cavalry and spearman. Each unit type has different statistics
(stats) for attack, attack cool-down, hit-points,
range, speed and spawning time. Spearmen are the
default units with average stats. Archers have a
ranged attack but lower movement and attack speed.
Cavalry units are fast and have high attack power but
take longer to build. All units also have a multiplier
that doubles their damage against one specific type.
The archer has a multiplier against the spearman, the
cavalry has a multiplier against the archer, and the
spearman has a multiplier against the cavalry. This
resembles a rock, paper, scissors mechanism, which
is commonly applied in strategy games.
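The multiplier mechanism can be written down directly. The counter cycle (archer beats spearman, cavalry beats archer, spearman beats cavalry) and the doubling factor come from the text; the base attack values below are illustrative placeholders, since the actual stats are not given here.

```python
# Illustrative base attack values; the real stats are not stated in the text.
BASE_ATTACK = {"spearman": 2, "archer": 1, "cavalry": 3}

# Each type deals double damage against the one type it counters.
BONUS_AGAINST = {"archer": "spearman", "cavalry": "archer", "spearman": "cavalry"}

def damage(attacker, defender):
    """Attack value of the attacker, doubled against the countered type."""
    base = BASE_ATTACK[attacker]
    return base * 2 if BONUS_AGAINST[attacker] == defender else base
```

With these placeholder stats, an archer deals 2 damage to a spearman but only 1 to a cavalry unit, reproducing the rock-paper-scissors pressure to field the right counters.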
The most basic action performed by a unit is mov-
ing. Every frame, a unit can move up, down, left, right
or stand still. If, after moving, the unit is within attacking
range of an enemy building or enemy unit, the unit
deals damage to all the enemies in its range.
The damage dealt is determined by the unit’s attack