Assisted Feature Engineering and Feature Learning to Build
Knowledge-based Agents for Arcade Games
Bastian Andelefski and Stefan Schiffer
Knowledge-Based Systems Group, RWTH Aachen University, Aachen, Germany
Keywords:
Assisted Feature Engineering, Feature Learning, Arcade Learning Environment, Knowledge-based Agents.
Abstract:
Human knowledge can greatly increase the performance of autonomous agents. Leveraging this knowledge
is sometimes neither straightforward nor easy. In this paper, we present an approach for assisted feature
engineering and feature learning to build knowledge-based agents for three arcade games within the Arcade
Learning Environment. While most existing agents are model-free, we aim at creating a
descriptive set of features for world modelling and building agents. To this end, we provide (visual) assistance in
identifying and modelling features from RAM, we allow for learning features based on labeled game data, and
we allow for creating basic agents using the above features. In our evaluation, we compare different methods
to learn features from the RAM. We then compare several agents using different sets of manual and learned
features with one another and with the state-of-the-art.
1 INTRODUCTION
In many domains, the knowledge of domain experts
can greatly help in constructing agents that perform
well in a certain task. However, how to transfer such
knowledge from the expert into an agent formulation
is often neither obvious nor easy to do. In this paper,
we tackle the creation of knowledge-based agents for
arcade style games. We do so by means of assisted
feature engineering and feature learning. The resulting
features are then used to construct knowledge-based
agents. The goal is to make use of expert knowledge
as conveniently as possible. We leverage human pat-
tern recognition skills and modelling competence and combine them with machine learning techniques. With
our approach we maximize the impact of training data
in the learning and keep the agent’s decision making
traceable.
For our application domain, we chose the Arcade
Learning Environment (ALE) introduced by Belle-
mare (Bellemare et al., 2013). Based on the Atari
2600 emulator Stella, it provides a unified interface
for a large number of arcade games. Notably, this in-
terface only offers raw data, specifically screen and
RAM content as well as the current score. As a re-
sult, most agents have so far sidestepped the absence
of an explicit world model by using model-free ap-
proaches. An early ALE paper by Naddaf (Naddaf,
2010) is an interesting exception to this trend. Naddaf
compares several approaches of reinforcement learn-
ing and planning, including a few attempts at creating
better features by image recognition. While this pre-
processing does increase performance compared to
the raw image data, it does not result in significant
improvements over the raw RAM data. These results
seem to indicate that the RAM contains all relevant
information in a format that is well suited to machine
learning algorithms. It should therefore be possible to
isolate important features and turn them into a simple
world model.
We investigate this by building a toolkit and frame-
work that makes it simple and fast to analyze the data
and create appropriate world models from informa-
tion contained in the RAM. Our goal is to use these
tools to explore possible benefits of using experts and
a model-based approach. We do this by using domain
knowledge and human pattern recognition skills to de-
fine highly relevant features. These features are then
used to construct agents using decision tree learning.
The outline of the paper is as follows. We start by
introducing the background of our work, namely the
Arcade Learning Environment and the games we use
as well as the scikit-learn toolkit and the learning
algorithms we use. After reviewing related work on
arcade game agents and feature learning we present
our approach. We detail our assistance with analysing
the RAM for manual feature creation and elaborate
on our feature learning. A brief view on our agent
design is followed by a comprehensive evaluation
of the feature learning. We then compare the agents’
performance in playing arcade games with different
sets of features and with the state-of-the-art.
2 BACKGROUND
We introduce the frameworks and libraries that the
work described in this paper is based on. Further,
we present the foundations that underlie the learning
methods we use in our toolkit.
2.1 The Arcade Learning Environment
The Arcade Learning Environment was introduced by
Bellemare as an engaging way to compare game play-
ing agents. The benchmark is based on the classic
gaming console Atari 2600 (https://en.wikipedia.org/wiki/Atari_2600) and offers a wide selec-
tion of highly diverse games with different control
schemes and goals. One of the most engaging quali-
ties of ALE is the extremely restricted nature of the
system. While the games feature non-essential infor-
mation to make them more engaging to players, they
are running on a platform with only 128 bytes of RAM
and 18 possible actions. These actions are listed in
Table 1.
This dichotomy creates interesting AI domains that
are quite complicated, but remain within the compu-
tational scope of modern hardware. The number of
actions is reasonably small which makes it interesting
also for planning approaches. But even in learning-
based approaches it is helpful, because the training
data for a game playing agent is spread across fewer
actions and therefore more likely to be decisive.
2.2 Scikit-learn
Both our feature learning and our game playing agent
are based on machine learning methods. To ensure
reliable performance and save time on our implemen-
tation, we use the machine learning library scikit-learn
(Pedregosa et al., 2011) for all machine learning meth-
ods in this paper.
Scikit-learn (http://scikit-learn.org/) was first introduced in 2007 and has
grown into an extensive resource for state-of-the-art
implementations of machine learning algorithms. Its
high performance, active maintenance and liberal BSD
license make it an obvious choice among the long list
of machine learning libraries.
2.3 Learning Methods
We now give a short introduction into the classifiers
used in our feature learning approach. A discussion
of why each classifier was chosen can be found in
Section 4.3.1.
Random trees are a way to counteract overfitting
with decision tree learning. With classical approaches
to decision trees it is not possible to increase general-
ization and accuracy at the same time. Random trees
achieve this by training multiple decision trees with
random sub-sets of the feature vector and training data.
A collection of such trees then yields a random deci-
sion forest (Ho, 1998). Since the trees in the forest are
trained independently, the generalization capabilities
can be raised without sacrificing the accuracy. The
individual trees' classification results are combined, for
instance, by a majority vote (Breiman, 2001; Friedman
et al., 2001). With an increasing number of trees in the
forest very high accuracy results can be achieved (Ho,
1998).
Support vector machines (SVMs) are a type of linear classifier, meaning that they try to divide the input
space with a hyperplane. This creates a binary classifi-
cation, with the hyperplane as the decision boundary.
The defining property of SVMs is that they create
this hyperplane such that it maximizes the distance
between the two classes.
To avoid the linear restrictions of SVMs, the input
space can be mapped to a higher-dimensional feature
space through the ’kernel trick’. There are also sev-
eral methods like ’one-versus-all’ and ’one-versus-one’
to allow multiple classes by splitting up the problem
(Boser et al., 1992; Bishop, 2006).
Nearest neighbor classifiers are the only entirely model-free classifiers in our selection. This means that
they do not assume an underlying structure. Instead,
they retain the entire training data as it was provided
to them. New inputs are then categorized by find-
ing the closest known data points, called the ’nearest
neighbors’. The classifier votes among the selected
neighbors and then outputs the winning class (Fix and
Hodges Jr, 1951; Bishop, 2006).
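To illustrate, the three classifier families could be instantiated with scikit-learn (Section 2.2) roughly as follows. This is only a minimal sketch; the parameter values and variable names are illustrative and are not the settings used in our experiments.

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    # random decision forest: many trees, each trained on a random feature subset
    "random_trees": RandomForestClassifier(n_estimators=100),
    # support vector machine, made non-linear through the RBF 'kernel trick'
    "svm": SVC(kernel="rbf", C=1.0),
    # model-free nearest neighbor classifier voting among the five closest points
    "nearest_neighbor": KNeighborsClassifier(n_neighbors=5),
}

# X holds one 128-byte RAM state per row, y the labels assigned by the expert
# (hypothetical variables standing in for recorded game data):
# for name, clf in classifiers.items():
#     clf.fit(X_train, y_train)
#     print(name, clf.score(X_test, y_test))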
2.4 Game Descriptions
This section gives a brief explanation of the games
that we chose to focus on, namely Space Invaders,
Bowling, and Private Eye. A basic understanding of
the game mechanics is required to understand some of
the examples throughout this paper.
Table 1: List of possible actions in ALE.
noop (0) fire (1)
up (2) right (3) left (4) down (5)
up-right (6) up-left (7) down-right (8) down-left (9)
up-fire (10) right-fire (11) left-fire (12) down-fire (13)
up-right-fire (14) up-left-fire (15) down-right-fire (16) down-left-fire (17)
reset* (40)
Space Invaders is a vertical shooter with the goal
of defeating slowly approaching waves of alien space-
ships. These waves move from one side to the other
before moving down a single step. The controls are
very simple, consisting only of moving left or right and
shooting. In the early stages of the game three shields
can be used as protection against the shots emitted by
the grid of enemies. If the player is hit three times or
an enemy reaches the bottom of the screen the game
is over. Occasionally, a ’mothership’ appears at the
top of the screen. Defeating it gives a significant point
bonus.
Bowling is an extremely simple video game version of the popular sport. The game is divided into positioning, essentially moving your character up and down to align with your target, and ball guiding, where you influence the ball's curve on its way down the lane. Other than that, the standard rules of bowling apply.
Private Eye is an adventure game that revolves
around finding items and returning them to a specific
location on the map. The paths are filled with obsta-
cles like potholes, animals and villains. In addition,
there is a time limit of three minutes to collect and
return all items. Movement is restricted to left, right
and occasional up commands as well as jumping.
The three games represent different types of arcade
games so that we can examine the feasibility of our
approach for these different types. For a more detailed
discussion on why each game was chosen we refer to
Section 5.2.2. Screenshots of each of the three games
can be found in Figure 1.
3 RELATED WORK
We discuss related work with regard to agent design
and feature learning.
3.1 Agents
A wide range of agents have been written for ALE with
varying success. The most prevalent type is learning-based agents. Among the most advanced are Hausknecht's
HyperNEAT agent (Hausknecht et al., 2012) and
Mnih’s deep learning approach (Mnih et al., 2013).
Nair’s massively parallel implementation of Mnih’s ap-
proach (Nair et al., 2015) is likely the most successful
learning-based agent to date. It achieves superhuman
scores in most games.
The most consistent performance however is deliv-
ered by Lipovetzky’s planning-based agents (Lipovet-
zky et al., 2015). While there are comparably few
planners targeted at ALE, Lipovetzky’s iterated width
(IW) and two-queue best-first search (2BFS) beat their
learning-based rivals in most games.
This result is likely due to the fact that action and
reward are temporally divided in most games. While
planning manages to solve this problem for small gaps,
it still performs horribly in adventure games like Mon-
tezuma's Revenge and Private Eye, where rewards are
offset by several seconds. A discussion of why deep
learning does not work well in these games can be
found in (Mnih et al., 2015).
3.2 Feature Learning
Feature learning describes the idea of distributing a
classification task among several classifiers that are
organized in a hierarchical structure. Each classifier is
trained to recognize a specific feature.
While the field of feature learning is still young,
there are already encouraging results. For one, Coates
(Coates and Ng, 2012) achieved some of the highest
current scores on the CIFAR-10 dataset with a k-means
feature learning structure. A broader comparison of
different methods can be found in the results of a fea-
ture learning competition created by Google (Sculley
et al., 2011). The challenge revolved around learning
at most 100 features from 5,000 labeled examples of
malicious URLs that could then be used for supervised
prediction. While a lot of methods achieved results
that essentially solved the problem, the pack was led
by three different structures of random trees.
(a) Space Invaders (b) Bowling (c) Private Eye
Figure 1: Screenshots from three games available in the Arcade Learning Environment.
Figure 2: Structural overview of the approach. Dashed
lines represent data flow, solid ones express a component
relationship.
4 APPROACH
As mentioned in the introduction, the main focus of
this work is to involve human experts in producing rele-
vant data for agent development. This section outlines
our approach to this challenge.
4.1 Overview
The general idea is to use a human expert with knowl-
edge of a given game to create training data and an
appropriate world model of the game. We then create
a game playing agent based on this information. A
visual overview of the approach is given in Figure 2.
Before any analysis can occur, we need to collect game data. These usually consist of RAM states,
screen images and the corresponding inputs of the
player. In a fully automated setup, this could be
achieved by using another agent, or even just randomly
choosing actions. There are however two important
benefits to using a human expert. First, we can ensure
a much more efficient exploration of the game at hand.
Situations that never occur unless the player fulfills
specific requirements are abundant. Some of these re-
quire significant skill that an automatic approach might
not provide. Second, the expert can intentionally cre-
ate relevant training data in less time. While
playing the game normally should, in theory, provide
all necessary data, it can be hard to find sufficient
quantities of specific situations.
Once we have collected enough relevant data, we can begin the process of feature selection. We do this
through means of manual selection and machine learn-
ing. It should be noted that our approach is entirely based on RAM data. While image data could be included, we are exploring the notion that all relevant data is contained in the RAM.
For the manual analysis, the expert chooses a few
states that share a feature value, like the number of
enemies on screen. This selection is done based on the
corresponding game images. The toolkit then indicates
the RAM sections that change the least among the
chosen states, leaving out unremarkable consistencies.
The idea behind this method is to highlight sections
that likely encode the feature we are looking for.
Because only basic features of a game can be easily
identified in the RAM, we also implement the more
general approach of feature learning. Instead of read-
ing values directly from the RAM, we train classifiers
with supervised learning to create high-level features.
In this case, the expert simply selects suitable example
states by looking at the corresponding images and then
labels them by hand.
The last step of our workflow is autonomous play.
The agent design is intentionally kept simple. To fo-
cus on feature quality and the advantages of a precise
world model, we use a decision tree agent. Observ-
ing an agent play can help in finding problems of the
underlying world model.
4.2 Assisted RAM Analysis
Because it would not be feasible to extract RAM por-
tions that correspond to specific features by simply
looking at the raw data, we implement methods to
highlight likely byte candidates. This section describes
these methods and gives a brief introduction into the
related parts of our toolkit interface.
4.2.1 Visualizing Relevance
The central aspect of assisting the manual selection is
visualization. In a first step, this means displaying the
RAM data in a pattern oriented fashion. Comparing
sets with 128 byte values each is quite demanding
and quickly becomes confusing. To work around this
limitation, we display these values as bits in a 32 by 32
black and white grid. As a result, an expert can simply
regard them as patterns.
But we can do more. By selecting RAM states that
share a given feature value, say the player stays at the
same spot, we can introduce some statistical assistance.
Since the states share a value, we want to highlight the
RAM sections that are the same in all of them.
\[ \mathrm{rel}_p(S) = \frac{\sum_{s \in S} v_{p,s}}{|S|} \qquad (1) \]
Equation 1 does this by applying a bitwise calculation of the mean across the selected states given as set S. The bit position is indicated by p, so that rel_p is the relevance of the bit at position p and v_{p,s} is the value of the bit in state s at position p.
We also find that most of the RAM very rarely
changes throughout the entire game. Most states that
are highlighted are not uniquely consistent among the
selected states, but are consistent across all recorded
data. To counter this effect, we modify our approach
to remove all unremarkable similarities.
\[ \mathrm{frel}_p(S) = \max\left(0,\; \mathrm{rel}_p(S) - \mathrm{rel}_p(R)\right) \qquad (2) \]
Equation 2 removes bits that rarely ever change by simply subtracting the relevance across all recorded states, given as set R, from the relevance across the selected states. To preserve the property that frel_p ∈ [0, 1], we also cut off all negative results. These negative values indicate that a bit is less consistent in the selected set. This is of no particular interest to us and breaks the possibility of using the result as a simple scalar later on.
Using this measure of relevance we can shade the
grid to better reflect how important specific parts of
the pattern seem to be. An example application can be
seen in Figure 3. It should be noted that the relevance
rating is slightly modified by applying an exponent to
increase visual distinction.
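As a sketch of how these measures could be computed, assume the recorded RAM states are available as bit arrays (one row of 1024 bits per state); the variable and function names below are our own, and the exponent shown is only an illustrative choice.

import numpy as np

def relevance(selected_bits):
    # Equation 1: bitwise mean over the selected states S.
    # selected_bits has shape (|S|, 1024), i.e. 128 bytes unpacked into bits.
    return selected_bits.mean(axis=0)

def filtered_relevance(selected_bits, all_bits):
    # Equation 2: subtract the relevance over all recorded states R and
    # clip negative values so the result stays within [0, 1].
    return np.maximum(0.0, relevance(selected_bits) - relevance(all_bits))

# Shading the 32 by 32 grid of Figure 3; the exponent sharpens the visual contrast.
# grid = filtered_relevance(S, R).reshape(32, 32) ** 3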
4.2.2 Sample Procedure
To give an impression of the manual feature engineer-
ing, let us consider the game Space Invaders. One of
Figure 3: Raw and shaded RAM data as 32 by 32 grid.
the descriptive attributes that the domain expert might
identify is the number of enemies. An expert user
can create a container for this attribute and associate
recorded game data with certain values of that attribute.
The interface to record data and label them is shown
in Figure 4. The image shows selecting a candidate
for a RAM portion encoding the number of enemies
(modelled as ’e_number’).
Figure 4: Visualization of a candidate region in the RAM for
the container ’e_number’. The colors below the game-screen
indicate the changing value. The green mark in the lower
right window indicates the corresponding numeric value of
the RAM region.
By browsing through the different sets of frames
with the same value, the user might spot similarities
in the RAM. This allows for identifying the portions
of RAM encoding a particular feature. The mapping between the state of a region in the RAM and the value of an attribute can then be used as a feature.
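As a minimal sketch, once such a region has been located, reading the attribute reduces to indexing the RAM array; the byte index below is a hypothetical placeholder, not the actual location of 'e_number' in Space Invaders.

E_NUMBER_BYTE = 0x11  # hypothetical index of the byte identified for 'e_number'

def e_number(ram):
    # Read the 'number of enemies' attribute directly from the 128-byte RAM array.
    return int(ram[E_NUMBER_BYTE])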
4.3 Feature Learning
In this section, we will present and discuss our choices
of feature classifiers and demonstrate how the toolkit
can be used to efficiently construct world models with
feature learning.
4.3.1 Methods
To be effective, classifiers not only have to be precise but also fast, since the games are being played in real time. We decided to use the following three
classifiers to cover a spectrum of possible advantages.
Random trees perform well with the inclusion of
irrelevant features. Every decision tree gets a random
subset of features as its input. Features that are good
indicators will dominate the decision process in almost
any tree they are included in. Trees that lack these
features should create predictions that are essentially
randomly distributed. As a result, the ensuing majority
vote is heavily influenced by relevant features. This
property is very desirable in our application, since the
system's RAM is likely to only include a few features
that are truly decent indicators.
Support vector machines deal very well with high-
dimensional data. This property can be observed
in some upper bounds of the classification error for
SVMs. Equation 3 shows one such upper bound, dis-
cussed by Vapnik (Vapnik and Chapelle, 2000).
\[ E\, p^{\,l-1}_{\mathrm{error}} \leq E\!\left(\frac{S D}{l \rho^{2}}\right) \qquad (3) \]
Here, S denotes the span of all support vectors (the intersection of all sub-spaces containing the set of support vectors), D is the diameter of the smallest sphere containing all training points, ρ is the margin and l is the size of the training set. Most importantly, the upper bound is independent of the dimensionality of the problem. To us, this property could matter, since we are dealing with 128-dimensional data if we take the full RAM content as our input.
Nearest neighbor classifiers are suitable for problems that have arbitrarily irregular decision boundaries.
Instead of calculating an explicit model, they simply
vote on an input’s class among a few of the nearest ex-
isting data points. We use this method as an indicator
of how well explicit models can be formed, since we
have little information about the decision boundaries.
If the typically subpar nearest neighbor classifier sur-
passes the more advanced methods, there are likely to
be problems that relate to the underlying models.
Figure 5: Feature learning procedure example: Visualizing
classification results by a random forest on test data.
Our selection of the three classifiers described
above is also supported by a comparative study in
(Fernández-Delgado et al., 2014). Fernández-Delgado
tested 179 classifiers on 121 datasets and found ran-
dom trees to be the overall best, closely followed by
support vector machines.
4.3.2 Sample Procedure
Similarly to the manual feature construction we also
give a brief example of feature learning. Just like
before, we can create a container for an attribute and
label game situations with a value. In our example, let
us consider the Space Invaders game again. This time,
we are interested in whether our agent is protected by
a shield. After collecting a set of training data, we can
select a learning method and preview its performance.
Figure 5 shows a situation in feature learning for
the ’protected’ attribute. The user can slide through
different frames and check whether or not the classi-
fication is correct. The green and red areas in the bar
below the game-screen indicate value false and true
for ’protected’, respectively.
4.4 Agent Design
We use a hierarchical structure to construct our agent.
In this structure, classifiers identify individual features,
which are then routed into higher-level classifiers. We
first model a game world and then learn to play based
on the resulting features. Figure 6 illustrates the hi-
erarchy with the features used for the game Bowling.
The left side utilizes manually selected features (see
Section 4.2), while the right side uses feature learning
to identify the feature classes.
Figure 6: Bowling agent with feature learning structure. The
classifier's feature vector includes both manually selected
and learned features.
It is, of course, possible to create more classifica-
tion layers, but because we were able to achieve very
good results with a comparably flat structure there was
no need for a more complex hierarchy.
4.4.1 Different Feature Sets
In order to measure the effects of an improved world
model, we need to create an agent that uses it. In our
case, we have a clear focus on the quality of features
and therefore need an agent that is simple and transpar-
ent. Decision trees fit perfectly, because their structure
can quickly be analyzed and carries obvious meaning,
since each node corresponds to a specific feature.
Our baseline agents use the entire RAM as their in-
put. The raw data is simply formatted into byte chunks
and then routed into the decision tree classifier. Agents
based on manual selection reduce this full byte array to
only a few previously marked sections before routing
the input into the decision tree classifier. For the fea-
ture learning approach, we apply the previously trained
feature classifiers to the entire RAM, outputting a sin-
gle byte that corresponds to the predicted class. This
byte is then added to the feature vector of manually
selected bytes.
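In code, the three input configurations can be summarized roughly as follows; this is a simplified sketch with placeholder byte indices and function names, not our actual implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

MANUAL_BYTES = [16, 28, 42]  # placeholder indices of the manually marked RAM sections

def full_ram_features(ram):
    # Baseline: all 128 RAM bytes.
    return np.asarray(ram, dtype=np.uint8)

def manual_features(ram):
    # Only the manually selected sections.
    return full_ram_features(ram)[MANUAL_BYTES]

def learned_features(ram, feature_classifiers):
    # Manual bytes plus one predicted class byte per trained feature classifier.
    learned = [clf.predict([ram])[0] for clf in feature_classifiers]
    return np.concatenate([manual_features(ram), learned])

def train_agent(feature_matrix, expert_actions):
    # The agent itself is a plain decision tree over one of these feature vectors.
    return DecisionTreeClassifier().fit(feature_matrix, expert_actions)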
5 EVALUATION
The evaluation comprises two main points. The first
point is assessing the performance of the feature learning methods in ALE. We compare their scores and behavior across several features in all tested games. The sec-
ond point is assessing the game playing performance
achieved by our approach. This includes comparisons
to both simple and state-of-the-art methods.
5.1 Learning Methods
In this section we compare the results of three classi-
fiers in the context of ALE feature learning. We also
Table 2: Feature learning results based on their F1 score. 'dnf' marks training processes that did not finish. Random trees delivers the best results across all features.

       Space Invaders     Bowling           Private Eye
       shot     bottom    target    state   object    clue
RT     0.968    0.997     1.000     0.968   1.000     0.998
SVM    dnf      0.993     1.000     0.963   0.999     0.991
NN     0.910    0.946     1.000     0.959   0.983     0.980
Figure 7: Visual comparison of classification results. The
white lines indicate quickly changing classes. From top to
bottom: random trees, support vector machines, nearest
neighbor.
introduce the methodology that was used in training
and testing these classifiers.
5.1.1 Methodology
To measure the performance of each classifier used
in feature and agent learning, both their precision and
recall are calculated. Since these measures never differed by more than one percent in our tests, we decided to reduce them to their F1 score. The F1 score is the harmonic mean of precision and recall. Cross-
validation is performed with randomly split training
and test data sets of equal size. A grid search with
a range of hand-selected values is used to estimate
appropriate parameters for all classifiers.
Each feature is trained with around 2000 frames of
data and the resulting scores are averages of six runs
for each configuration.
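In scikit-learn terms, one run of this protocol might look roughly like the sketch below; the parameter grid is illustrative only, as the actual hand-selected values are not reproduced here.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import f1_score

def evaluate_random_trees(X, y):
    # Equal-sized random split, grid search for parameters, F1 score on the test half.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
    search = GridSearchCV(
        RandomForestClassifier(),
        param_grid={"n_estimators": [10, 50, 100], "max_depth": [None, 5, 10]},
        scoring="f1_weighted")
    search.fit(X_train, y_train)
    return f1_score(y_test, search.predict(X_test), average="weighted")

# scores = [evaluate_random_trees(X, y) for _ in range(6)]  # averaged over six runs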
5.1.2 Results
Looking at the feature learning results in Table 2, the
clear winner by scores is the random trees classifier. It
equals or surpasses the alternatives in all tests and de-
livers great results of above 96% precision and recall.
But the scores do not tell the whole story. In
addition to achieving the highest scores, random trees
are also the fastest to train and the fastest to evalu-
ate. While the differences are usually marginal, they
grow significantly with larger amounts of training data.
Support vector machines especially struggle in this
regard, as too much data causes excessive training
times (the process was stopped after three hours of
continuous computation). Since the scikit-learn imple-
mentation that is used in this paper is based on the very
popular LIBSVM (Chang and Lin, 2011), it is highly
unlikely to be the result of a coding error.
A different problem arises with the nearest neigh-
bor classifier. While the overall scores indicate that
its performance is stellar, the distribution of misclas-
sifications is far more erratic. An example of this can
be seen in Figure 7, where red and blue indicate two
different classes and white represents high-frequency
class changes. While slight errors around the optimal
decision boundaries are likely inconsequential, stray
misclassifications could significantly impact play per-
formance.
5.2 Play Performance
Towards the overall goal of creating knowledge-based
agents, the playing performance is important. In the
following section we introduce the methodology that
was used in testing the performance of our agents. We
then analyze the results for each of the three tested
games.
5.2.1 Methodology
The basis of all agents used in this paper is decision
trees that are trained with data collected by human
experts. In order to get a rough understanding of how
much training data is required to adequately train the
classifier, we ran tests with an increasing number of
training frames. While around 3000 frames deliver
some of the best results, it became quite apparent
that this is due to a sort of beneficial overfitting. The
agent is mimicking the expert's example game very closely. Increasing the number of frames to around 10000 yields worse results, but also shows first signs of abstraction. The amount we settled on, around 20000 frames, delivers considerably better results, while also increasing the agent's capability of handling new situa-
tions. Beyond that point gains were minimal.
Our underlying learning method differs from the usually image-based reinforcement learning that is used by most other researchers. This is why we de-
cided to measure three configurations that allow for
a self-contained comparison: The first one simply di-
vides the RAM into 128 bytes and it is referred to
with full ram in the tables below. The second one uses
only manually selected parts of the RAM and it is re-
ferred to with manually. The third one adds learned
features to the manually selected ones and it is referred
to with learned. Because random trees dominate our classifier comparison, they are the only feature classifiers we use from here on out. In addition to our own ref-
erence values, we include the closely related score of
Bellemare’s RAM approach (Bellemare et al., 2013)
(referred to with bellemare), as well as the highest
score we could find in related papers (referred to with
state-of-the-art).
Table 3: Space Invaders F1 and game scores over ten episodes (last column: average).

full ram
  F1:    0.59 0.59 0.61 0.56 0.63 0.61 0.63 0.64 0.61 0.60 | 0.607
  score: 310  260  160  180  445  130  255  70   230  115  | 215.5
manual selection
  F1:    0.76 0.75 0.75 0.75 0.74 0.73 0.75 0.75 0.74 0.75 | 0.747
  score: 305  370  275  235  320  185  785  175  295  320  | 326.5
learned
  F1:    0.70 0.73 0.74 0.73 0.73 0.72 0.72 0.73 0.72 0.73 | 0.725
  score: 210  555  230  90   510  285  505  315  230  430  | 336.0
Table 4: Space Invaders game score comparison.
full ram manual learned bellemare state-of-the-art
215.5 326.5 336.0 226.5 3974.0
Each of our own configurations is trained and then
run ten times per game. In addition to the scores reported by ALE, we also include the F1 score for each
decision tree. The latter is however not necessarily
related to game performance, since the input data does
not represent perfect play.
5.2.2 Results
We now look at the resulting gameplay performance
for the three games Space Invaders, Bowling, and Pri-
vate Eye introduced earlier. For each game, we include
a short rationale for its inclusion.
Space Invaders is included in our game selection because it is a great example of the arcade-style gaming that was dominant on the Atari 2600. It is mainly reactive and open-ended. There are no complicated objectives and all necessary information is accessible at all times. The manually selected features include the x position of the player, x/y for the enemies
and x/y for two shots, as well as the current number of
enemies. The two learned features recognize danger
from incoming shots and enemies reaching the bottom.
Table 3 shows that there are significant benefits to
manually selecting relevant features. Not only does the classifier achieve a much higher F1 score, but the
play performance also increases by around 50%. The
gains achieved by adding learned features are much
more modest, with only a marginal increase of around
3% in the game score. The observed play style is
however changed. The agent avoids enemies reaching
the bottom more effectively. It thereby manages to
exceed 400 points in four out of ten games, while
manual selection only does so in one.
Table 5: Bowling F1 and game scores over ten episodes (last column: average).

full ram
  F1:    0.94 0.93 0.93 0.93 0.94 0.93 0.94 0.93 0.94 0.94 | 0.935
  score: dnf  dnf  111  127  177  102  dnf  dnf  99   84   | 116.7
manual selection
  F1:    0.87 0.87 0.87 0.87 0.87 0.87 0.87 0.87 0.87 0.87 | 0.870
  score: 80   80   80   80   80   80   80   80   80   80   | 80.0
learned
  F1:    0.90 0.90 0.89 0.90 0.89 0.91 0.90 0.90 0.90 0.90 | 0.899
  score: 217  190  217  219  174  218  198  216  219  219  | 208.7
Table 6: Bowling game score comparison.
full ram manual learned bellemare state-of-the-art
116.7 80.0 208.7 29.3 69.0
As can be seen in Table 4, our best score exceeds
Bellemare’s approach by around 50%. However, it
falls very short of the current state-of-the-art.
Bowling stands out among the available games in ALE. Even though its premise and game mechanics are incredibly simple, no agent has so far achieved results that approach even those of a human layman. It is therefore interesting to see if human involvement in the
modeling and training stages of an agent can remedy
the underlying issues. The manual features that were
easily identified in the RAM are the ball’s x and the
player’s y position. Notably, the remaining pins proved
too hard to isolate and were therefore not manually
selected. To compensate, the learned features include
the current target alongside a state feature to identify
waiting periods.
The advantages of collecting training samples with
an expert are immediately visible in Bowling. The
full RAM approach beats Bellemare’s control value by
300% and even outclasses the state-of-the-art approach, IW(1) by Lipovetzky, by around 70%. However, the
agent tends to get stuck waiting and fails to finish the
game four out of ten times. The sparse manual selec-
tion fails to improve upon the full RAM input. Both
the F1 and the game score drop significantly. Adding
learned features makes a world of difference. The
additional context awareness prevents waiting loops
and being able to identify the current target allows the
agent to play a near-perfect game. While most throws
result in strikes, even the occasional spare is handled
gracefully. All of this results in scores that trump the
current best by more than 200%.
Looking at the comparison scores in Table 6, it is
unclear why an extremely simple game like Bowling
proves too hard for even the best planning algorithms.
One possible explanation might be the significant delay
between input and scoring that results from the ball
Table 7: Private Eye F1 and game scores over ten episodes (last column: average).

full ram
  F1:    0.93  0.93  0.93  0.92  0.92  0.92  0.92  0.93  0.92  0.92  | 0.924
  score: -882  26113 3377  26183 3463  3536  -1000 4071  -342  13738 | 7825.7
manual selection
  F1:    0.89  0.90  0.89  0.90  0.89  0.89  0.88  0.89  0.89  0.90  | 0.892
  score: 26352 49761 -1000 18692 10647 25449 25868 3300  -1000 4532  | 16260.1
learned
  F1:    0.85  0.85  0.85  0.85  0.85  0.85  0.85  0.85  0.85  0.85  | 0.850
  score: 96960 96356 96456 96456 96156 96056 96556 96456 96556 95760 | 96376.8
Table 8: Private Eye game score comparison.
full ram manual learned bellemare state-of-the-art
7825.7 16260.1 96376.8 111.9 2544.0
traveling down the lane.
Private Eye is easily the most ambitious of the
selected games. Its large world is only partially ob-
servable at any given time and solving the changing
objectives requires many steps. Unsurprisingly, even
the best agents fail to achieve any notable progress.
Through our custom process of manual feature selec-
tion, we are able to isolate the player’s x position, the
current screen number and the id of the currently held
item, as well as the current time. The first, and most
important, learned feature is the current ’objective’. It
encodes the target destination as a number from one to
seven. The second learned feature indicates whether
the current room still contains a ’clue’.
Table 7 shows a significantly increased score due
to using expert training data. The initial score on the
entire RAM is 70 times as high as the comparison
value from Bellemare. It even triples the current state-of-the-art, 2BFS by Lipovetzky.
Manual selection doubles the score of our own full RAM approach, but the agent still manages to get lost on the way. This is shown by the low -1000 scores in two of
the ten runs. In these instances, the agent gets stuck in
a dead end and is eventually terminated by the running
clock. The ’objective’ feature solves this problem,
with all runs resulting in a score of above 95000, or 37
times the current state-of-the-art (see Table 8). This
score is also very close to the theoretical maximum of
101600 and includes finding and returning all items.
Where Bowling has a somewhat noticeable delay
between action and reward, Private Eye separates these
events by seconds or even minutes. It is therefore no surprise that planning-based approaches fail to
achieve high marks. The same applies to reinforce-
ment learning, as the sparse rewards provide little guid-
ance as to what constitutes a beneficial action. An ex-
pert’s domain knowledge solves these problems very
Figure 8: Simple Bowling agent. The structure automates
bowling a strike.
efficiently and allows for scores at a human level.
5.3 Further Investigation
In this section we discuss some additional work that
does not fit within the main evaluation of the paper.
These are mainly extensions of our approaches and
could serve as starting points for future work.
5.3.1 Clustering
In addition to supervised feature learning we tested an
unsupervised approach based on a few popular clus-
tering methods (namely k-means, mean shift and DB-
SCAN). We did this to explore whether it would be
possible to fully automate constructing world models
within our toolkit. However, during our limited tests
we were not able to create meaningful features through
this process. Visualizing the results, we found that
clustering tends to create equally sized classes that are
defined by the temporal proximity of their members.
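For reference, a minimal sketch of such an unsupervised attempt on the recorded RAM states is shown below; the method parameters are illustrative and not the settings we tested.

from sklearn.cluster import KMeans, MeanShift, DBSCAN

def cluster_ram_states(ram_states, n_clusters=4):
    # ram_states: (n_frames, 128) recorded RAM bytes (hypothetical variable name).
    return {
        "kmeans": KMeans(n_clusters=n_clusters).fit_predict(ram_states),
        "mean_shift": MeanShift().fit_predict(ram_states),
        "dbscan": DBSCAN(eps=3.0, min_samples=10).fit_predict(ram_states),
    }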
Because unsupervised feature learning was not a
core goal of this paper, we decided to stop our inves-
tigation at this point. Future research might focus on
advancing these preliminary tests by isolating notable
RAM sections.
5.3.2 Simplified Agents
One overarching idea behind this paper is to create
agents that require fewer computational resources by
leveraging additional domain knowledge. We tried
to push the boundaries of expert involvement within
our toolkit by selecting specific moments of perfect
play to create a minimal agent. The ideal candidate for
this approach is Bowling, because a ’strike’ is always
the best possible outcome and examples can easily be
identified. Other games, such as Space Invaders, are a
lot more complicated, making it virtually impossible
to choose perfect examples.
Choosing only one example sequence of a perfect
throw created the decision tree shown in Figure 8. The
only two features that are used are the player’s x and
the ball’s y position. To give a point of comparison, the
full RAM agents in our main evaluation of Bowling
have around 300 nodes, while this simplified version
only has 13. In addition, the score is much improved.
The decision tree in Figure 8 achieves a score of 230,
surpassing even our previous best agents with a near-
perfect game.
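Such a small tree is easy to inspect directly; the sketch below uses scikit-learn and assumes a trained decision tree over the two features named above (variable and feature names are ours).

from sklearn.tree import export_text

def describe_agent(agent):
    # agent: the decision tree trained on one perfect-throw sequence.
    print("nodes:", agent.tree_.node_count)  # about 13 for the simplified Bowling agent
    print(export_text(agent, feature_names=["player_x", "ball_y"]))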
While these results are interesting, they should be
treated carefully. This reduced agent is essentially
just a ’strike’ automation. It can only deal with ideal
conditions and has no capability of identifying and
managing varying situations. Nevertheless, it is inter-
esting to see that a little additional domain knowledge
can create an extremely simple agent that performs
nearly perfectly.
6 CONCLUSION
In this paper we presented a toolkit for assisting de-
velopers of knowledge-based agents with feature en-
gineering and feature learning. Our application domain is arcade games, for which we took three exemplary games from the Arcade Learning Environment. We
provide assistance in creating features by two means.
For one, we assist in manually creating features by
helping to identify relevant portions of the RAM. For
another, we offer means to learn features from domain
expert knowledge. We then also help to create sim-
ple agents for the games using the developed features.
In our evaluation, we first compare different learning
techniques for the feature learning. Then we assess the
game playing performance of different agents using
different (sub)sets of features available. We show that
while the knowledge-based approach cannot keep up
with state-of-the-art approaches using deep learning in rather reactive kinds of games, it can clearly outper-
form those methods in games that have a large delay
between actions and their final effect.
In our research we found that there are significant
gains in all tested games when preselecting features.
Comparing the naive full RAM approach to a carefully
crafted world model with everything else being equal,
we saw scores increase by anywhere from 50% to
1100%. While these scores even surpass state-of-the-
art methods in two out of three games, the effect cannot
be entirely attributed to world modeling. The high-
quality training data recorded by an expert is clearly
also a notable factor, as even the naïve approach of-
ten outclasses the current best methods. Judging from
the results, our toolkit can quickly create world mod-
els that meaningfully impact the overall game score.
While it lacks the generality of game independent ap-
proaches, it shows that minimal expert involvement
can enable even simple game playing agents to per-
form very well.
Because in this paper we focused on the feature
extraction, we investigated the benefits on a limited
set of agents only. Future work might revolve around
applying our findings to a broader range of agents
and games. Another interesting addition could be to
further automate the manual feature selection. The
methods presented in this paper often reduce the pos-
sible byte candidates to just a handful and it might be
feasible to reduce the number even more. If nothing
else, this work should demonstrate that feature quality
can have a very meaningful impact in ALE. Taking
this into account, it would be interesting to investigate
how fully automatic dimensionality reduction methods
can influence game playing performance for different
agents.
Overall, we have shown that using descriptive features to build knowledge-based agents is a very
promising route. It yields agents that are not only
comprehensible but that are also able to outperform
state-of-the-art solutions in difficult situations.
REFERENCES
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M.
(2013). The arcade learning environment: An evalua-
tion platform for general agents. Journal of Artificial
Intelligence Research, 47:253–279.
Bishop, C. M. (2006). Pattern recognition and machine
learning. Springer.
Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). A
training algorithm for optimal margin classifiers. In
Proceedings of the fifth annual workshop on Computa-
tional learning theory, pages 144–152. ACM.
Breiman, L. (2001). Random forests. Machine learning,
45(1):5–32.
Chang, C.-C. and Lin, C.-J. (2011). Libsvm: A library
for support vector machines. ACM Trans. Intell. Syst.
Technol., 2(3):27:1–27:27.
Coates, A. and Ng, A. Y. (2012). Learning feature represen-
tations with k-means. In Neural Networks: Tricks of
the Trade, pages 561–580. Springer.
Fernández-Delgado, M., Cernadas, E., Barro, S., and
Amorim, D. (2014). Do we need hundreds of clas-
sifiers to solve real world classification problems? The
Journal of Machine Learning Research, 15(1):3133–
3181.
Fix, E. and Hodges Jr, J. L. (1951). Discriminatory analysis-
nonparametric discrimination: consistency properties.
Technical report, DTIC Document.
Friedman, J., Hastie, T., and Tibshirani, R. (2001). The
elements of statistical learning, volume 1. Springer
series in statistics Springer, Berlin.
Hausknecht, M., Khandelwal, P., Miikkulainen, R., and
Stone, P. (2012). Hyperneat-ggp: A hyperneat-based
atari general game player. In Proceedings of the four-
teenth international conference on Genetic and evolu-
tionary computation conference, pages 217–224.
Ho, T. K. (1998). The random subspace method for construct-
ing decision forests. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 20(8):832–844.
Lipovetzky, N., Ramirez, M., and Geffner, H. (2015). Classi-
cal planning with simulators: Results on the atari video
games. Proc. IJCAI 2015.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013).
Playing atari with deep reinforcement learning. NIPS
Deep Learning Workshop 2013.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve-
ness, J., Bellemare, M. G., Graves, A., Riedmiller, M.,
Fidjeland, A. K., Ostrovski, G., Petersen, S., Beat-
tie, C., Sadik, A., Antonoglou, I., King, H., Kumaran,
D., Wierstra, D., Legg, S., and Hassabis, D. (2015).
Human-level control through deep reinforcement learn-
ing. Nature, 518(7540):529–533.
Naddaf, Y. (2010). Game-independent ai agents for playing
atari 2600 console games. University of Alberta.
Nair, A., Srinivasan, P., and Blackwell, S. (2015). Mas-
sively parallel methods for deep reinforcement learn-
ing. arXiv preprint arXiv:1507.04296.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P.,
Weiss, R., Dubourg, V., et al. (2011). Scikit-learn:
Machine learning in python. The Journal of Machine
Learning Research, 12:2825–2830.
Sculley, D. et al. (2011). Results from a semi-supervised
feature learning competition. NIPS 2011 Workshop on
Deep Learning and Unsupervised Feature Learning.
Vapnik, V. and Chapelle, O. (2000). Bounds on error expecta-
tion for support vector machines. Neural computation,
12(9):2013–2036.