et al., 2009) and new environments according to the
player’s progress (Lopes et al., 2018).
Other researchers have tackled the same problem by recognizing the type of player, comparing that player to groups of similar players and, finally, configuring the game according to the mapped characteristics (Missura and Gärtner, 2009). However, it may be difficult to fit a player into a group of other players, as human behavior presents singular and unique traits (Charles et al., 2005).
Still other approaches focus on creating adaptable agents that interact directly with the environment, taking advantage of Reinforcement Learning techniques (Sutton et al., 1998). In this case, the agent receives the current state of the environment and a reward value, and aims to learn the policy it should follow. Previous work (Andrade et al., 2005) has followed this approach using the classical Q-learning algorithm (Watkins and Dayan, 1992). Nevertheless, value-based methods such as Q-learning may struggle with continuous state spaces and may converge slowly, as they do not directly optimize the policy function.
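For reference, and using standard notation that is not introduced in this paper, tabular Q-learning updates its action-value estimates as
\[
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right],
\]
where $\alpha$ is the learning rate and $\gamma$ the discount factor. The policy is then obtained only indirectly, e.g., by acting greedily with respect to $Q$, which is precisely why such value-based methods do not optimize the policy function directly.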
In this work we propose an NPC agent that does not need a representative model of the player in order to adapt to them, and whose policy is directly induced from how the agent observes the game. To achieve that, we design a reward function based on a game-balancing constant and introduce it into the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017), a reinforcement learning method that directly optimizes the policy using gradient-based learning. To tackle the complexity of the environment, the PPO implementation we selected follows a Deep Reinforcement Learning approach (Mnih et al., 2015). In this way, we can also build on the remarkable results recently achieved in other game-based problems (Mnih et al., 2013; Silver et al., 2016; Lample and Chaplot, 2017). We take advantage of the Unity ML-Agents Toolkit (Juliani et al., 2018) to implement the graphical environments and to run the PPO algorithm. Experimental results show that we are able to devise adaptable agents, at least when facing other non-human players.
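To make the idea concrete, the sketch below illustrates one way a game-balancing constant could shape the reward signal fed to PPO. It is only an illustrative assumption: the function name, the score variables, and the constant value are hypothetical and do not correspond to the actual reward function described later in this paper.

```python
def balancing_reward(agent_score: float, player_score: float,
                     balance_constant: float = 0.0) -> float:
    """Illustrative sketch of a balance-driven reward (hypothetical).

    The reward is highest when the score difference between the NPC
    agent and the player stays close to a target game-balancing
    constant (0.0 would mean a perfectly even match), so the agent is
    encouraged neither to dominate nor to be dominated.
    """
    difference = agent_score - player_score
    return -abs(difference - balance_constant)
```

A term of this kind can be supplied as the per-step reward during training, leaving the policy-gradient machinery of PPO itself untouched.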
2 RELATED WORK
The challenge of game balancing refers to the ability of a game to adjust its level of difficulty according to the skill of the user, so that the user remains engaged with the game. This includes preventing the player from getting stressed or bored due to overly difficult or overly easy situations (Csikszentmihalyi and Csikszentmihalyi, 1992). In fact, (Andrade et al., 2006) showed that satisfaction and balance are closely related, by asking players to answer a series of questions after experiencing a fighting game.
To achieve game balancing, some previous work has focused on adding new content (environments, elements, obstacles, etc.) according to the abilities of the player, while other work has sought to create intelligent agents capable of facing the player without hindering their possible success (Bakkes et al., 2012). Examples of the first group include (Bakkes et al., 2014) and (Hunicke, 2005), which modify the environment or add new elements, but require knowing beforehand how to represent the behavior of the player within the game.
Regarding the second group, in (Missura and Gärtner, 2009) the authors created an automatic adjustment by first identifying the level of the player and then fitting them into a specific group (Easy, Medium, or Difficult). The goal is to assign a new player to one of these groups so that their opponents are easier or harder to confront. That work considers the identification of the type of player a fundamental requirement for balanced games. Our work aims to avoid creating abstractions and representations of the player, as these are sometimes difficult to obtain beforehand. Instead, we focus on representing the current state of the game to determine the decision-making policy of the agent against its opponent.
Meanwhile, (Silva et al., 2015) uses a heuristic function to assess the performance of the player against their enemies during the game and, according to it, the difficulty of the game is increased or decreased. Regarding the use of Reinforcement Learning (RL), the Machine Learning technique most commonly applied to game-based problems, (Andrade et al., 2006) adopted RL to teach a virtual agent to imitate the player; the agent is then further trained to balance the difficulty of the game. The work presented in (Andrade et al., 2005), unlike the previous one, used RL only to teach the agent to fight, while balancing is achieved by a heuristic function tuned according to the abilities of the player. We do not intend to teach the agent to imitate a player; instead, we aim to devise an agent that learns at the same time how to play the game and how to adapt to the player.
3 REINFORCEMENT LEARNING
In this work we rely on Reinforcement Learning (Sut-
ton et al., 1998) to make the agent learn how to act to
achieve game balancing. We benefit from the Unity