EMBEDDED INCREMENTAL FEATURE SELECTION

FOR REINFORCEMENT LEARNING

Robert Wright, Steven Loscalzo(1,2) and Lei Yu

(1) Information Directorate, Air Force Research Lab, Rome, NY, U.S.A.

(2) Department of Computer Science, Binghamton University, Binghamton, NY, U.S.A.

Keywords:

Reinforcement learning, Feature selection, Neuroevolution.

Abstract:

Classical reinforcement learning techniques become impractical in domains with large complex state spaces.

The size of a domain’s state space is dominated by the number of features used to describe the state. Fortu-

nately, in many real-world environments learning an effective policy does not usually require all the provided

features. In this paper we present a feature selection algorithm for reinforcement learning called Incremental Feature Selection Embedded in NEAT (IFSE-NEAT) that incorporates sequential forward search into the neuroevolutionary algorithm NEAT. We provide an empirical analysis on a realistic simulated domain with many irrelevant and relevant features. Our results demonstrate that IFSE-NEAT selects smaller and more effective feature sets than alternative approaches, NEAT and FS-NEAT, and exhibits superior performance characteristics as the number of available features increases.

1 INTRODUCTION

Reinforcement learning (RL) is a popular technique

for many agent-related problems such as robot con-

trol, game playing, and system optimization (Sutton

and Barto, 1998). In such problems, an agent strives

to learn a policy for selecting actions based on its

current state, by performing actions and receiving re-

wards for the actions it tries. The most common ap-

proach to solving an RL problem is to learn a value

function which indicates, for a particular policy, the

long-term expected reward value of a given state or

state-action pair. In domains with large or inﬁnite

state spaces, it is infeasible to represent the value

function explicitly. Instead, a common strategy is

to approximately represent the value function using

some parameterized class of functions. Many function approximation methods such as radial basis functions (Keller et al., 2006), adaptive tile coding (Whiteson et al., 2007), and neural networks (Whiteson and Stone, 2006) have proven successful on problems

with a very small number of state features.

In many real-world problems, the number of fea-

tures with which to describe the states of the environ-

ment can be quite large. When the number of features

is large, existing function approximation methods are

not only computationally prohibitive, but also prone

to over-ﬁtting due to the so-called “curse of dimen-

sionality” - the size of the state space grows exponen-

tially as the number of state features increases. For-

tunately, among many possible features, a large por-

tion of them may be irrelevant or redundant regard-

ing learning a policy. Determining which subset of

features should be used in order to generate the best

performance for the learning task is challenging for

either a domain expert or a learning agent.

This work is about automated feature selection for

RL. Although feature selection has been extensively

studied for supervised learning (Guyon and Elisseeff,

2003; Liu and Yu, 2005), existing methods are either

inapplicable or impractical in the RL setting. Filter

methods rely on training data, which is not available

in RL, to select features. Wrapper methods require re-

peatedly executing a learning algorithm on each can-

didate feature subset, and are impractical for RL due

to their prohibitively high computational and sample

cost. A promising approach is to embed feature selec-

tion into the training process of a learning algorithm.

However, the embedded approach has to be tailored

for the learning algorithm of interest.

In this work, we propose an embedded incremental feature selection algorithm for the neuroevolutionary function approximation algorithm NEAT (NeuroEvolution of Augmenting Topologies) (Stanley and Miikkulainen, 2002), which we call IFSE-NEAT. The

main idea of IFSE-NEAT is to embed incremental


Wright R., Loscalzo S. and Yu L..

EMBEDDED INCREMENTAL FEATURE SELECTION FOR REINFORCEMENT LEARNING.

DOI: 10.5220/0003153402630268

In Proceedings of the 3rd International Conference on Agents and Artiﬁcial Intelligence (ICAART-2011), pages 263-268

ISBN: 978-989-8425-40-9

 2011 SCITEPRESS (Science and Technology Publications, Lda.)

subset selection into the neuroevolutionary process of

NEAT. Instead of evolving networks with the full set

of features as NEAT does, IFSE-NEAT initializes networks with one feature. IFSE-NEAT then iteratively adds to the current best network the feature that contributes most to its performance improvement, while evolving the weights and topology of that network.

Prior to this work, feature selection for reinforce-

ment learning has focused on linear value function ap-

proximation (Kolter and Ng, 2009; Parr et al., 2008)

and model-based RL algorithms (Kroon and White-

son, 2009). For neuroevolution algorithms such as

NEAT, only random search has been explored (White-

son et al., 2005). In this light we can see that IFSE-

NEAT is a novel approach in feature selection for RL.

Our experimental study has shown several promising results for IFSE-NEAT. We find that the algorithm

is nearly unaffected in its ability to select relevant fea-

tures as the number of irrelevant features grows very

large. This, in turn, allows a better policy to be derived than with NEAT or FS-NEAT. Additionally, by

using only a few relevant features we are able to learn

a good policy while limiting model complexity.

The rest of the paper is organized as follows: Sec-

tion 2 introduces the NEAT algorithm and the pro-

posed IFSE-NEAT algorithm. Section 3 describes the

experimental setup. Section 4 presents and discusses

experimental results. Finally, Section 5 concludes this

work and identiﬁes some future research directions.

2 APPROACH

2.1 NEAT

Neural networks (NNs) are efﬁcient function approx-

imators that can model complex functions to an arbi-

trary accuracy. The drawbacks of using NNs in RL

domains have been that NN design was a difﬁcult

manual process and training was a supervised learn-

ing process. Neuroevolutionary approaches, which

utilize genetic algorithms to automate the process of

training and/or designing NNs, eliminate these draw-

backs allowing NNs to be easily applied to RL do-

mains. NeuroEvolution of Augmenting Topologies

(NEAT) is a novel RL framework based on neuroevo-

lution. By evolving both the network topology and

weights of the connections between network nodes,

NEAT solved typical RL benchmark problems several times faster than competing RL algorithms with significantly fewer system resources (Stanley and Miikkulainen, 2002).

However, one limiting issue with NEAT is that it

assumes that all features provided by the environment

are relevant and necessary, and attempts to incorpo-

rate all the features into its solution networks. The

extraneous features will unnecessarily complicate the

networks and severely slow the rate at which NEAT is

able to derive an effective policy. In the following section we describe a new algorithm based upon NEAT

that builds a small set of required features while learn-

ing an effective policy.

2.2 Incremental Feature Selection

Embedded in NEAT

To deal with the exponential search space, we adopt

sequential forward search (SFS), an efﬁcient search

strategy which has proven effective in ﬁnding near-

optimal subsets in supervised feature selection. Start-

ing from an empty set, SFS iteratively adds one fea-

ture at a time to the current best set until a desired

number of features k are selected. Since in each of

the k iterations, it goes through all N features outside

of the current best set, the time complexity of SFS is

O(kN). Although SFS does not guarantee the optimal

solution, it is capable of selecting relevant features

while keeping irrelevant or redundant features out of

the ﬁnal subset. The method is particularly suitable

for high-dimensional problems where a large portion

of the features are irrelevant or redundant.
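The greedy loop described above can be sketched in a few lines. This is a generic illustration of sequential forward search and not the IFSE-NEAT implementation: the function name, the feature names and the toy scoring function are assumptions introduced only for this example.

```python
from typing import Callable, FrozenSet, Iterable, List


def sequential_forward_search(
    features: Iterable[str],
    k: int,
    score: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedily pick k features, adding the best-scoring candidate each round."""
    remaining = list(features)
    selected: List[str] = []
    for _ in range(min(k, len(remaining))):
        # Evaluate every candidate outside the current best set.
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy score (hypothetical): reward relevant features, lightly penalize subset size.
RELEVANT = {"a", "b"}


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & RELEVANT) - 0.1 * len(subset)


chosen = sequential_forward_search(["x", "a", "y", "b", "z"], k=2, score=toy_score)
print(chosen)
```

Each round scores only the candidates that are still outside the selected set, which is where the O(kN) number of subset evaluations comes from.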

Algorithm 1 provides a basic overview of how IFSE-NEAT functions and is able to select a minimal set of features. IFSE-NEAT incrementally adds

features to a NN that we call the BACKBONE. The

BACKBONE network utilizes the best discovered fea-

ture set and represents the current best derived policy.

It is persistent through additions of new features to

the feature set and it is what makes IFSE-NEAT an

embedded algorithm as opposed to a straightforward

wrapper algorithm.

Initially, the BACKBONE network consists of

only the output nodes (line 5). Then, for each of the individual features available, F_q, a NN is generated by connecting a single input node to every output node (line 13). In parallel, or independently, a population of networks based upon this single-input base network is generated. Each network in the population shares the topology of the base network, but has randomly generated weights on the edges joining the nodes. The population of NNs is then evolved via the standard NEAT algorithm for L generations (lines 17-19). At the end of the NEAT process, the champion of each population (the network representing the best policy) is identified. The champions (each corresponding to a candidate feature F_q) are then compared

against one another to decide the BEST NETWORK

and BEST FEATURE (lines 22-25). It is our hypothesis that the best performing network, BEST NETWORK, will point to the most relevant feature. Therefore, the BEST FEATURE that produced the BEST NETWORK is then added to the SELECTED SET (line 27), and the BEST NETWORK becomes the BACKBONE (line 28) for subsequent iterations, where the algorithm will determine the next features to add to the feature set.

Algorithm 1: IFSE-NEAT(N, k, L, p).

 1: // N: set of all available features
 2: // k: number of features to select
 3: // L: number of generations to evolve
 4: // p: population size
 5: BACKBONE ← outputNodes // initialize the BACKBONE
 6: SELECTED SET ← null // initialize the selected feature set
 7: for i ← 1 : k do
 8:     BEST NETWORK ← null
 9:     BEST FEATURE ← null
10:     // iterate through all candidate features outside SELECTED SET
11:     for q ← 1 : |N| − i + 1 do
12:         // create new network N_q based on candidate feature F_q
13:         N_q ← COMBINE(F_q, BACKBONE)
14:         // create a population of p networks based upon N_q
15:         population ← INITIALIZE-POPULATION(N_q, p)
16:         // evolve population using NEAT for L generations
17:         for j ← 1 : L do
18:             NEAT-EVOLVE(population)
19:         end for
20:         // select the champion from population
21:         champion ← BEST-QUALITY(population)
22:         if champion > BEST NETWORK then
23:             BEST FEATURE ← F_q
24:             BEST NETWORK ← champion
25:         end if
26:     end for
27:     Add BEST FEATURE to SELECTED SET
28:     BACKBONE ← BEST NETWORK
29: end for

In the subsequent iterations the remaining features

are independently combined with the BACKBONE

network and then re-evaluated. As in the ﬁrst feature

selection iteration, new populations of NNs, random

variations of the base networks, are again evolved by

NEAT for L generations. The algorithm stops once a

desired number of features are selected. Alternatively,

the algorithm can stop when one of the populations

produces a network that represents a suitable solution

to the problem.
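The outer loop of the procedure can be illustrated with a small runnable sketch under strong simplifying assumptions. The "network" here is a single linear output with one weight per selected input, `evolve` is a weight-only hill climber standing in for NEAT (it does not evolve topology), and fitness is the negative squared error on a toy regression task rather than an RL return. All names and parameter values are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Network = Dict[int, float]  # input feature index -> weight on a single linear output


def fitness(net: Network, data: List[Tuple[List[float], float]]) -> float:
    """Higher is better: negative mean squared error of the linear output."""
    err = 0.0
    for x, target in data:
        out = sum(w * x[f] for f, w in net.items())
        err += (out - target) ** 2
    return -err / len(data)


def evolve(base: Network, p: int, gens: int, data, rng: random.Random) -> Network:
    """Stand-in for NEAT (assumption): mutate weights only, keep the fittest."""
    champion = dict(base)
    for _ in range(gens):
        population = [
            {f: w + rng.gauss(0.0, 0.3) for f, w in champion.items()}
            for _ in range(p)
        ]
        population.append(champion)  # elitism: never lose the current champion
        champion = max(population, key=lambda n: fitness(n, data))
    return champion


def ifse(n_features: int, k: int, gens: int, p: int, data, rng) -> Tuple[List[int], Network]:
    backbone: Network = {}  # no inputs yet, only the output node
    selected: List[int] = []
    for _ in range(k):
        best_net, best_feat, best_fit = None, None, float("-inf")
        for f in range(n_features):
            if f in selected:
                continue  # only candidates outside the selected set
            candidate = dict(backbone)
            candidate[f] = 0.0  # zero weight preserves the backbone's output
            champ = evolve(candidate, p, gens, data, rng)
            fit = fitness(champ, data)
            if fit > best_fit:
                best_net, best_feat, best_fit = champ, f, fit
        selected.append(best_feat)
        backbone = best_net  # the champion persists into the next iteration
    return selected, backbone


rng = random.Random(0)
# Toy task: the target depends only on features 0 and 1; features 2..5 are noise.
data = []
for _ in range(40):
    x = [rng.uniform(-1, 1) for _ in range(6)]
    data.append((x, 2.0 * x[0] - 1.0 * x[1]))

selected, backbone = ifse(n_features=6, k=2, gens=15, p=20, data=data, rng=rng)
print(sorted(selected))
```

The point of the sketch is the control flow: every remaining feature gets its own population seeded from the same backbone, and only the winning champion is carried forward.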

The process for combining a candidate feature with the BACKBONE network, COMBINE(F_q, BACKBONE) (line 13), is illustrated in Figure 1. In this process a new base network is created for a candidate feature F_q by connecting it to each of the output nodes. The weights of these

new edges are assigned zero to preserve the policy

of the BACKBONE network. Preserving the policy

of the BACKBONE network bootstraps the successive

networks and improves IFSE-NEAT's ability to determine the relevance of potential new features.

Figure 1: How a candidate feature is incorporated into the current BACKBONE network to create a new base candidate network, N_q. The new feature F_q is introduced to the network and is provided connections (the dashed lines) to every output node. BACKBONE in this figure represents the best evolved solution network using only the previously selected features.
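A minimal sketch of why zero-weight connections preserve the BACKBONE policy is given below. It assumes a deliberately simplified network with direct input-to-output connections and a linear activation; the dictionary layout, the node names and the `combine` and `activate` helpers are illustrative assumptions, not the ANJI data structures.

```python
from typing import Dict

# Hypothetical minimal representation: weights[output][input_name] for a
# network with direct input-to-output connections only (no hidden nodes).
Weights = Dict[str, Dict[str, float]]


def combine(feature: str, backbone: Weights) -> Weights:
    """Return a copy of backbone with `feature` wired to every output at weight 0."""
    return {out: {**incoming, feature: 0.0} for out, incoming in backbone.items()}


def activate(net: Weights, inputs: Dict[str, float]) -> Dict[str, float]:
    """Linear activation, enough to show that the outputs are unchanged."""
    return {
        out: sum(w * inputs[name] for name, w in incoming.items())
        for out, incoming in net.items()
    }


backbone = {"speed": {"range_0": 0.8}, "direction": {"range_0": -0.3}}
candidate = combine("velocity", backbone)

obs = {"range_0": 0.5, "velocity": 0.9}
assert activate(candidate, obs) == activate(backbone, obs)
```

Because the new edges contribute nothing until evolution changes their weights, every candidate population starts from the fitness already achieved by the BACKBONE.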

Analysis of Algorithm 1 shows that the time complexity of IFSE-NEAT is O(kN) times the NEAT process,

which itself is dependent on the population size p

and the number of generations L. In practice, how-

ever, we can do better than this. For the ﬁrst few se-

lected features, L can be very small and the algorithm

can still identify relevant features, allowing a signiﬁ-

cant speedup to roughly O(N) times NEAT. Once the

BACKBONE network is reasonably ﬁt, L must be in-

creased to allow new features enough time to have an

impact in the more complex network.
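To make the cost concrete, the following sketch counts how many generations are evolved in total across all candidate populations when L varies per selection step. The function name is hypothetical. The inputs are illustrative: 50 available features and the per-step schedule L(1..5) = 3, 7, 20, 50, 120 reported in Section 3.2.

```python
from typing import Sequence


def population_generations(n_features: int, schedule: Sequence[int]) -> int:
    """Total generations evolved across all candidate populations.

    Selection step i (1-based) evolves one population per remaining candidate
    feature, i.e. n_features - (i - 1) populations, each for schedule[i - 1]
    generations.
    """
    return sum((n_features - step) * gens for step, gens in enumerate(schedule))


schedule = [3, 7, 20, 50, 120]
total = population_generations(50, schedule)
print(total)                  # generations summed over all candidate populations
print(total / sum(schedule))  # ratio to a single 200-generation NEAT run
```

Under these assumptions the total is 50·3 + 49·7 + 48·20 + 47·50 + 46·120 = 9323 population-generations. The later steps dominate the cost, so keeping L small for the first few features costs little.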

3 EXPERIMENTAL SETUP

We analyze the performance of our IFSE-NEAT al-

gorithm from two perspectives: (i) the quality of the

derived policy, and (ii) the ability of the algorithm to

select relevant features. We measure the quality of the

derived policy by a problem-speciﬁc ﬁtness function.

The composition of the selected subset in terms of

the fraction of relevant features among selected ones

quantiﬁes an algorithm’s ability to select a good fea-

ture subset. Finally, we verify that the performance of

our algorithm (measured by the above metrics) does

not degrade as the number of irrelevant features in-

creases.
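The second metric is a simple ratio, sketched below. The function name and the feature names are hypothetical placeholders.

```python
from typing import Iterable, Set


def relevant_fraction(selected: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of the selected features that are relevant (0.0 if none selected)."""
    chosen = list(selected)
    if not chosen:
        return 0.0
    return sum(1 for f in chosen if f in relevant) / len(chosen)


relevant = {"range_0", "range_1", "range_2", "range_3", "velocity"}
picked = ["range_0", "velocity", "noise_7", "range_2", "range_3"]
print(relevant_fraction(picked, relevant))  # 4 of 5 selected features are relevant
```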

We compare IFSE-NEAT to the baseline NEAT as

well as FS-NEAT, a competing feature selection algo-

rithm we describe in Section 3.1. All three algorithms

are evaluated in a challenging race track domain that

is capable of providing many relevant and irrelevant


features for the algorithms to work with. The details

of this environment as well as the speciﬁc parameters

used by the algorithms are given in Section 3.2.

3.1 FS-NEAT

Feature Selective NEAT, or FS-NEAT, is an em-

bedded feature selection algorithm within the NEAT

framework (Whiteson et al., 2005). One limiting as-

sumption standard NEAT makes, discussed in Sec-

tion 2.1, is that all input features are relevant and

are fully incorporated into all solution networks. FS-

NEAT assumes that few features are actually relevant.

Networks are initialized with only a single connection

between a randomly selected pair of input and out-

put nodes. Through subsequent mutations other input

nodes may add a connection to the rest of the network

and hence be selected into the model.
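The initialization and implicit selection mechanism described above can be sketched as follows. This is a simplified illustration and not the FS-NEAT implementation: the genome is a flat list of input-to-output connections, hidden nodes are omitted, and the weight range [-1, 1] and all function names are assumptions.

```python
import random
from typing import List, Tuple

Connection = Tuple[str, str, float]  # (input node, output node, weight)


def fs_neat_initial_genome(
    inputs: List[str], outputs: List[str], rng: random.Random
) -> List[Connection]:
    """Start with exactly one connection between a random input and a random output."""
    return [(rng.choice(inputs), rng.choice(outputs), rng.uniform(-1.0, 1.0))]


def add_connection_mutation(
    genome: List[Connection], inputs: List[str], outputs: List[str], rng: random.Random
) -> List[Connection]:
    """Add one connection from a random input to a random output, if that pair is unused."""
    src, dst = rng.choice(inputs), rng.choice(outputs)
    if any(c[0] == src and c[1] == dst for c in genome):
        return genome
    return genome + [(src, dst, rng.uniform(-1.0, 1.0))]


def selected_inputs(genome: List[Connection]) -> set:
    """An input counts as selected once it has at least one connection."""
    return {src for src, _, _ in genome}


rng = random.Random(1)
inputs = [f"f{i}" for i in range(10)]
outputs = ["speed", "direction"]
genome = fs_neat_initial_genome(inputs, outputs, rng)
for _ in range(5):
    genome = add_connection_mutation(genome, inputs, outputs, rng)
print(sorted(selected_inputs(genome)))
```

Because the input is drawn uniformly at random, the chance that a mutation connects a relevant feature shrinks as irrelevant features are added.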

3.2 RARS

We conducted our experimental analysis using ver-

sion 0.91 2 of the Robot Auto Racing Simulator

(RARS). RARS provides a detailed physical simulation of a racetrack and vehicles and allows users to

deﬁne their own artiﬁcial agents to control the racers.

The goal of the simulation is to learn a path around

the track that covers the most distance in a limited

time while minimizing damage received by the car.

Damage is calculated by RARS based on the amount

of time the car spends off the track. The racers are

controlled by supplying a desired speed and direction

at every time step in the simulation.

Table 1: Different problems used in the RARS experiments, broken down by the number of relevant, irrelevant, and total features.

Relevant   Irrelevant   Total
5          5            10
5          25           30
5          45           50
5          95           100

We implemented a rangeﬁnder system in the sim-

ulation to provide vehicle position information to the

learning algorithm as in Figure 2. In our experiments

we placed N range sensors evenly around the front of

the car as in (Whiteson et al., 2005), starting from the

left side of the car and ﬁnishing at the right to provide

a full view of the track. The range finders, together with the velocity of the car, are used by the learner to provide two continuous control outputs, corresponding to the desired speed and direction of the car.

RARS: http://rars.sourceforge.net/

To make the RARS environment challenging from

a feature selection point of view we added irrelevant

features to the set. Irrelevant features simply return a

random value in [-1,1]. We developed several chal-

lenging problems with different combinations of rel-

evant and irrelevant features as shown in Table 1.

These combinations allow us to examine the robust-

ness of each of the three algorithms in comparison

w.r.t. increasing numbers of irrelevant features.
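One way to construct such a feature vector is sketched below. It assumes that each irrelevant feature is redrawn uniformly from [-1, 1] at every observation; the text specifies only the range, so the distribution, the function name and the example sensor values are assumptions.

```python
import random
from typing import List


def augment_observation(
    relevant: List[float], n_irrelevant: int, rng: random.Random
) -> List[float]:
    """Append n_irrelevant noise features, each drawn uniformly from [-1, 1]."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_irrelevant)]
    return relevant + noise


rng = random.Random(42)
obs = augment_observation([0.2, 0.7, 0.9, 0.6, 0.1], n_irrelevant=45, rng=rng)
print(len(obs))  # 50 features in total, matching the third row of Table 1
```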

All three algorithms tested are neuroevolution-

ary algorithms that require a ﬁtness function to pro-

vide the feedback that guides learning. We adopt

the ﬁtness function used by (Whiteson et al., 2005),

S = 2d − r, where d is the distance the car has trav-

eled from the start and r is the amount of damage re-

ceived. Trials end after the learner either has observed

2000 time steps or the car registers too much damage.
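The scoring rule and the two termination conditions can be sketched as below. The 2000-step limit and S = 2d − r come from the text. The damage threshold is a hypothetical value, since the text says only "too much damage", and the `step` callback is a stand-in for the simulator.

```python
from typing import Callable, Tuple

MAX_STEPS = 2000       # from the text
DAMAGE_LIMIT = 1000.0  # hypothetical; the text only says "too much damage"


def fitness_score(distance: float, damage: float) -> float:
    """S = 2d - r: reward distance travelled, penalize accumulated damage."""
    return 2.0 * distance - damage


def run_trial(step: Callable[[int], Tuple[float, float]]) -> float:
    """step(t) returns (distance gained, damage taken) at time step t."""
    distance = damage = 0.0
    for t in range(MAX_STEPS):
        d, r = step(t)
        distance += d
        damage += r
        if damage > DAMAGE_LIMIT:
            break  # the trial ends early once the car is too damaged
    return fitness_score(distance, damage)


print(run_trial(lambda t: (5.0, 0.0)))    # undamaged car runs all 2000 steps
print(run_trial(lambda t: (0.0, 600.0)))  # stops after the damage limit is passed
```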

All experiments took place on the clkwis.trk track

that is bundled in the RARS package, shown in Fig-

ure 3. This track was selected because it exhibits sev-

eral driving scenarios such as straightaways, turns and

an S-curve. The experiments were conducted in the

RARS environment according to the following setup.

Figure 2: Rangeﬁnders (dashed lines) extending outwards

from the vehicle to the edges of the track (solid lines) en-

code vehicle location information in this environment.

• Three algorithms were tested: NEAT, FS-NEAT, and IFSE-NEAT.

• For each tested combination of features, 10 runs were conducted with each algorithm; the results presented are the average of these 10 runs.

• Each run lasted 200 generations.

• IFSE-NEAT split the 200 generations into five L(·) periods with L(1) = 3, L(2) = 7, L(3) = 20, L(4) = 50, and L(5) = 120.

• The NEAT population size was set to 100.

We set the number of generations allowed to 200 since

the algorithms appeared to converge by that point and

there was no need to carry the experiment further. The

particular values of the L(·) function are not impor-

tant, and we experimented with other values which

yielded similar results.

All three algorithms in comparison rely on NEAT

for generation of the neural networks to allow learn-

ing. In our experiments we make use of Another


NEAT Java Implementation (ANJI) for the NEAT al-

gorithm (James and Tucker, 2004). We followed the

settings given in (Whiteson et al., 2005) of 0.10 and

0.02 for add-connection and add-neuron respectively

to set the parameters for the FS-NEAT algorithm. For

NEAT and IFSE-NEAT, we set the add-connection

mutation probability to 0.02 and the add-neuron mu-

tation to 0.01. In our experiments we found the pa-

rameters used with FS-NEAT to be too aggressive for

NEAT and IFSE-NEAT.

Figure 3: A top-down view of the clkwis track used in the

experiments.

4 RESULTS

Figure 4a shows the results of running the three algo-

rithms for the problem with 5 relevant features and

45 irrelevant features as the algorithms search for

the optimal policy. Fitness of the derived policies is

measured in terms of the value of the ﬁtness func-

tion deﬁned in Section 3.2. We can see that IFSE-

NEAT converges to the best ﬁtness of the three algo-

rithms, and this convergence occurs at around gener-

ation 50. Both NEAT and FS-NEAT exhibit a slower

rate of convergence than IFSE-NEAT. In this situa-

tion, NEAT is limited by the need to assign correct

weights to many features. Since all available fea-

tures are used in the NEAT neural network, NEAT has

to evolve through many generations to ﬁnd the right

weights for links associated with the relevant features

while keeping weights for irrelevant features low in

order to limit their impact on the network output. The

FS-NEAT algorithm suffers from its random search

policy. Since there are many irrelevant features in the

problem, they have a higher chance of being included

in the network than a relevant feature does, causing

the algorithm to be slow to learn an effective policy.

In Figure 4b we see the composition of the se-

lected subsets by the three algorithms. IFSE-NEAT

clearly has the highest percentage of relevant features

per selected group, at around 90% on average. This

number begins at 100% for 1 selected feature and

slowly decreases as features are added to the set. Fig-

ure 4a shows that IFSE-NEAT achieves optimal ﬁt-

ness early, and then even relevant features do not ap-

pear helpful, causing some irrelevant features to be

incorrectly selected in some of the 10 runs of the al-

gorithm. FS-NEAT slowly adds new features to the

set, many of which are irrelevant, causing low scores

in both measures. It should be noted that IFSE-NEAT

and FS-NEAT select around 5 features by the 200th generation in all tested settings.

[Figure 4 plots. Panel (a): Fitness versus Generations (0 to 200). Panel (b): Relevant/Selected Features (%) versus Generations (0 to 200). Series in both panels: NEAT, FS-NEAT, IFSE-NEAT.]

Figure 4: Two performance metrics: ﬁtness (a) and the frac-

tion of relevant features among the selected ones (b), for

NEAT, FS-NEAT, and IFSE-NEAT across 200 generations

on the problem with 5 relevant features and 45 irrelevant

features.

We now further study how the three algorithms

scale with an increasing number of irrelevant fea-

tures. Figure 5b shows the fraction of relevant fea-

tures among the selected ones by each algorithm. We

can see that for each of the problems, IFSE-NEAT se-

lects on average at least four relevant features in ﬁve

feature selection steps. This validates our feature

ranking and selection criteria, and supports the con-

sistently good ﬁtness values seen in Figure 5a. As

predicted, NEAT’s ﬁtness degrades as the number of

irrelevant features increases and the fraction of rele-

vant features decreases. It always includes all the ir-

relevant features, which increases the complexity of

the networks and slows down learning. FS-NEAT’s

ﬁtness shows a variable trend caused by the random

selection mechanism. Despite starting with more ir-

relevant features in the problem with 50 features, the

ﬁtness of the ﬁnal policy actually improved over the

problems with 10 and 30 features, as shown in Figure 5a. This is most likely the result of the network weights being randomly improved by chance, and more trials should remove this effect.

[Figure 5 plots. Panel (a): Fitness versus Available Features (10, 30, 50, 100). Panel (b): Relevant/Selected Features (%) versus Available Features (10, 30, 50, 100). Series in both panels: NEAT, FS-NEAT, IFSE-NEAT.]

Figure 5: Two performance metrics: fitness (a) and the fraction of relevant features among the selected ones (b), for NEAT, FS-NEAT, and IFSE-NEAT at the 200th generation across 4 problems with 5 relevant features and 5, 25, 45, and 95 irrelevant features.

5 CONCLUSIONS AND FUTURE

WORK

In this work, we have developed an embedded fea-

ture selection algorithm which incorporates a sequen-

tial forward search into the neuroevolutionary func-

tion approximation method NEAT for reinforcement

learning. Our results demonstrate the effectiveness of

IFSE-NEAT at identifying relevant features and elim-

inating irrelevant ones. This ability enables IFSE-

NEAT to converge upon higher quality policies us-

ing simpler networks in fewer generations than either

NEAT or FS-NEAT.

These contributions do come at a cost. Although

IFSE-NEAT is more efﬁcient than wrapper methods,

the incremental search for relevant features adds sig-

niﬁcant computational cost when compared to the

other NEAT variants. Possible future directions include investigating the parallelization of the algorithm to help mitigate this cost, and further study of the generalization ability of the simple NN solutions found

by IFSE-NEAT.

REFERENCES

Guyon, I. and Elisseeff, A. (2003). An introduction to vari-

able and feature selection. Journal of Machine Learn-

ing Research, 3:1157–1182.

James, D. and Tucker, P. (2004). A comparative analysis of

simpliﬁcation and complexiﬁcation in the evolution of

neural network topologies. In Proceedings of the 2004

Conference on Genetic and Evolutionary Computation (GECCO-04). Springer.

Keller, P., Mannor, S., and Precup, D. (2006). Automatic

basis function construction for approximate dynamic

programming and reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456.

Kolter, J. Z. and Ng, A. Y. (2009). Regularization and

feature selection in least-squares temporal difference

learning. In Proceedings of the 26th International

Conference on Machine Learning (ICML09), pages

521–528.

Kroon, M. and Whiteson, S. (2009). Automatic feature selection for model-based reinforcement learning in factored MDPs. In Proceedings of the International Conference on Machine Learning and Applications, pages

324–330.

Liu, H. and Yu, L. (2005). Toward integrating feature

selection algorithms for classiﬁcation and clustering.

IEEE Transactions on Knowledge and Data Engineer-

ing (TKDE), 17(4):491–502.

Parr, R., Li, L., Taylor, G., Painter-Wakeﬁeld, C., and

Littman, M. L. (2008). An analysis of linear models,

linear value-function approximation, and feature se-

lection for reinforcement learning. In Proceedings of

the 25th International Conference on Machine Learn-

ing (ICML08), pages 752–759.

Stanley, K. O. and Miikkulainen, R. (2002). Efﬁcient rein-

forcement learning through evolving neural network

topologies. In Proceedings of the Genetic and Evolu-

tionary Computation Conference (GECCO 02), pages

569–577.

Sutton, R. and Barto, A. (1998). Reinforcement learning:

an introduction. MIT Press.

Whiteson, S., Taylor, M. E., and Stone, P. (2007). Adaptive

tile coding for value function approximation. Techni-

cal report, University of Texas at Austin.

Whiteson, S. and Stone, P. (2006). Evolutionary function

approximation for reinforcement learning. Journal of

Machine Learning Research, 7:877–917.

Whiteson, S., Stone, P., and Stanley, K. O. (2005). Au-

tomatic feature selection in neuroevolution. In Pro-

ceedings of the Genetic and Evolutionary Computa-

tion Conference (GECCO 05), pages 1225–1232.
