Strategy Tree Construction and Optimization with Genetic Programming
Chi Xu (1), Jianxiong Qiao (1) and Na Jia (2)
(1) Department of Computer Science, North China University of Technology, 5 Jinyuanzhuang Street, Beijing, China
(2) Institute of Science, North China University of Technology, 5 Jinyuanzhuang Street, Beijing, China
Keywords:
Artificial Intelligence, Evolutionary Algorithm, Machine Learning, Regressive Decision Rule.
Abstract:
We applied genetic programming (GP) to search a candidate pool of technical analysis (TA) indicator rules for stock market trading strategies and optimized them on historical data. The method provides a decision rule optimization scheme for real trading problems in financial markets, and it can optimize strategies with relatively complicated structure. GP constructs the condition part of each decision rule from different logical operations. The method has been applied to the optimization of investment strategies, with good return results in simulation experiments.
1 INTRODUCTION
1.1 Background
A reasonable strategy for trading in financial markets is one of the most important topics for researchers, but it is always difficult to capture the essential characteristics or complicated content of a strategy. In machine learning, a decision tree takes the condition inputs and feeds them to the system for decision making.
Although a decision rule has advantages, such as:
- being simple to understand and interpret,
- requiring little data preparation,
- handling both numerical and categorical data,
- explaining problems well through boolean logic, and
- allowing models to be validated with statistical tests while handling large data sets with little time consumption,
it still has limitations: constructing an optimal tree is NP-complete under several aspects of optimality, even for simple concepts (Hyafil and Rivest, 1976; Murthy, 1998). Hence, the optimization of a strategy for a practical problem is not easy to achieve. The complexity of any regressor depends on the number of inputs, which determines both the time and space complexity and the number of training examples needed to train such a regressor (Alpaydin, 2010).
In this paper, we apply genetic programming to optimize regressive decision-rule-like strategies for real trading problems. In the proposed method, each "individual" of an evolving population encodes a candidate strategy for the given problem, and each individual is evaluated by a problem/application-oriented fitness function; evolution proceeds by natural selection, with survival and reproduction of the fittest individuals. GP forms the strategy as a tree structure with boolean logic operators on the inner nodes.
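To make the encoding concrete, the following is a minimal sketch in Python (the class and rule names and the thresholds are our own assumptions; the paper does not prescribe an implementation): leaves hold indicator-based rules and inner nodes hold boolean operators.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union
import operator

Snapshot = Dict[str, float]  # indicator values at one point in time

@dataclass
class Rule:
    """Leaf node: an indicator-based rule evaluated on the current snapshot."""
    name: str
    predicate: Callable[[Snapshot], bool]

    def evaluate(self, snapshot: Snapshot) -> bool:
        return self.predicate(snapshot)

@dataclass
class Node:
    """Inner node: a boolean operator (AND, OR, XOR) over two subtrees."""
    op: Callable[[bool, bool], bool]
    left: "Tree"
    right: "Tree"

    def evaluate(self, snapshot: Snapshot) -> bool:
        return self.op(self.left.evaluate(snapshot), self.right.evaluate(snapshot))

Tree = Union[Rule, Node]
AND, OR, XOR = operator.and_, operator.or_, operator.xor

# Hypothetical thresholds, for illustration only.
strategy = Node(OR,
                Rule("RSI>70", lambda s: s["RSI"] > 70),
                Rule("CCI<-100", lambda s: s["CCI"] < -100))
print(strategy.evaluate({"RSI": 75.0, "CCI": -50.0}))  # True
```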
1.2 Decision Rule Optimization
The optimization of decision rules is difficult, since over-complex trees may not generalize from the data properly, a problem known as overfitting. In addition, a decision rule cannot express some conceptual information well. Many approaches have been adopted to improve tree structure and performance. (Blockeel and Struyf, 2002) proposed an efficient cross-validation algorithm to reduce the overhead in the induction process of a logic programming tree, and conducted experiments on different data sets to evaluate the optimization performance. (Bennett, 1994) proposed a non-greedy decision tree algorithm to construct new trees and update existing ones; a global tree optimization explicitly considers all decisions in the tree concurrently. (Suarez and Lutsko, 1999) proposed a fuzzy
decision tree that transforms the tree into a powerful functional approximation while remaining easily interpretable, with a global optimization algorithm fixing the parameters of the fuzzy splits. (Mookerjee and Mannino, 1997) introduced a sequential decision model to optimize an expert system when the cost or time to collect inputs is significant and the inputs are not known until the system operates. (Liang et al., 2010) used a decision tree to handle uncertain concepts, so that dynamic data streams with uncertain numerical attributes can be classified efficiently.
The paper is organized as follows. Section 1.2 surveys related work on decision rule optimization. Section 2 describes the proposed method. Section 3 reports simulation experiments on financial market data as a test bed for evaluating the proposed method. Section 4 draws conclusions and makes some suggestions for future work.
2 METHODOLOGY
In the stock market, people usually use technical analysis (TA) indicators to analyze market trends. In this paper, we mainly use the MA, CCI, RSI, KDJ and MACD indicators to build strategy trees and to evaluate the fitness value of each strategy tree.
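As an illustration of one such indicator-based rule (the paper does not state the exact formulas or thresholds it uses, so the functions `rsi` and `rule_rsi_overbought` below are a sketch under standard definitions), a 14-period RSI can be computed from closing prices and turned into a boolean rule:

```python
from typing import List

def rsi(closes: List[float], period: int = 14) -> float:
    """Relative Strength Index (simple-average variant) over the last `period` changes."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    changes = [b - a for a, b in zip(closes[-period - 1:-1], closes[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def rule_rsi_overbought(closes: List[float], threshold: float = 70.0) -> bool:
    """TRUE when RSI signals an overbought market (threshold is illustrative)."""
    return rsi(closes) > threshold
```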
2.1 System Architecture
Our system consists of a GP engine, a control kernel and a market data input module. The system architecture is given in Figure 1.
Figure 1: System architecture. The system control kernel passes GP control parameters to the GP engine and market data control parameters to the market data module; the GP engine produces the optimized strategies.
In our system, the function of the control kernel is to control the operation of the whole system; parameters such as population size, crossover rate, mutation rate and period are defined by users before the system is launched. After users submit the control parameters, the control kernel transmits them to the GP engine for operation. It also receives market data from the market data module, and users can choose whether to train strategy trees on sample data or to use specific strategy trees for out-of-sample testing on unseen or live data. In training, the control kernel calculates the fitness value of each individual in the population; after calculating fitness, it decides whether to continue evolving or to terminate. In testing, the control kernel also calculates fitness values, then summarizes the results and draws diagrams to show the strategy trees' performance.
In the system, the outcomes of each generation of the population and the program's running conditions are recorded in the system log, giving information on every aspect of a run. Users can use the log for further analysis or to collect good individuals.
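The control kernel's training loop can be summarized roughly as follows. This is a sketch only: the truncation-selection scheme and the function names `init_population`, `evaluate_fitness`, `crossover` and `mutate` are our assumptions, not the system's actual interface.

```python
import random
from typing import Callable, List, Tuple

def train(
    init_population: Callable[[int], List[object]],  # random strategy-tree individuals
    evaluate_fitness: Callable[[object], float],     # e.g. return on in-sample data
    crossover: Callable[[object, object], object],
    mutate: Callable[[object], object],
    population_size: int = 100,
    max_iterations: int = 15,
    crossover_rate: float = 0.75,
    mutation_rate: float = 0.05,
) -> Tuple[object, float]:
    """Evolve a population of strategy trees and return the best individual found."""
    population = init_population(population_size)
    best, best_fitness = None, float("-inf")
    for _ in range(max_iterations):
        scored = sorted(((evaluate_fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fitness:
            best_fitness, best = scored[0]
        # Truncation selection: the better half becomes the parent pool.
        parents = [ind for _, ind in scored[: population_size // 2]]
        offspring = []
        while len(offspring) < population_size:
            a, b = random.sample(parents, 2)
            child = crossover(a, b) if random.random() < crossover_rate else a
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        population = offspring
    return best, best_fitness
```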
2.2 Strategy Trees
In our system, strategy trees are constructed by letting the GP engine combine several technical indicator-based rules (see the appendix) with the boolean operators AND, OR and XOR. According to its corresponding rule, each indicator is evaluated to one of two values, TRUE or FALSE. Once a strategy tree has been constructed, it represents a trading rule, and the tree itself likewise evaluates to TRUE or FALSE. For example, the strategy tree in Figure 2 represents a rule of the form:

IF RULE_RSI IS TRUE OR RULE_CCI IS FALSE XOR (RULE_MACD IS TRUE AND RULE_KDJ IS TRUE) THEN OPERATION

Here, RULE_RSI being TRUE could mean, for example, that the value of RSI is above 70, indicating an overbought condition; if the value of RSI is below 30, it indicates an oversold condition and RULE_RSI is FALSE. Similarly, RULE_CCI and RULE_MACD are judged against their own conditions in the same way.
Figure 2: Strategy tree. The root XOR node combines an OR subtree over RSI=TRUE and CCI=FALSE with an AND subtree over MACD=TRUE and KDJ=TRUE.
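Evaluating the Figure 2 tree on a single market snapshot then reduces to a nested boolean expression. In the sketch below, only the RSI threshold follows the text above; the CCI, MACD and KDJ conditions and the snapshot values are invented for illustration.

```python
# One hypothetical market snapshot.
snapshot = {"RSI": 75.0, "CCI": -120.0, "MACD_bullish": True, "KDJ_bullish": True}

rule_rsi = snapshot["RSI"] > 70         # RULE_RSI is TRUE (overbought)
rule_cci = snapshot["CCI"] > 100        # RULE_CCI judged against its own (assumed) condition
rule_macd = snapshot["MACD_bullish"]
rule_kdj = snapshot["KDJ_bullish"]

# (RSI=TRUE OR CCI=FALSE) XOR (MACD=TRUE AND KDJ=TRUE)
tree_fires = (rule_rsi or (not rule_cci)) ^ (rule_macd and rule_kdj)
print("strategy tree fires:", tree_fires)  # False for this snapshot
```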
Each individual in the population consists of two strategy trees: a buy tree and a sell tree. When the buy tree evaluates to TRUE, it emits a buy signal and the system performs a buy operation; when the sell tree evaluates to TRUE, it
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
426
emits a sell signal and the system performs a sell operation. An example individual is given in Figure 3.
In this individual, the buy strategy can be expressed as Table 1 and the sell strategy as Table 2. From Table 1 we can see that the buy rule is "BUY IF (CCI=TRUE OR MACD=TRUE) AND RSI=FALSE", while from Table 2 the sell rule is "SELL IF (RSI=TRUE OR CCI=FALSE) XOR KDJ=TRUE". Using these two rules, the system trades on real data sets and then calculates the fitness value and the stability of the individual (a sketch of such a fitness calculation is given after Figure 3).
Table 1: Buy tree expression.
CCI CONNECTOR MACD CONNECTOR RSI
TRUE OR TRUE AND FALSE
Table 2: Sell tree expression.
RSI CONNECTOR CCI CONNECTOR KDJ
TRUE OR FALSE XOR TRUE
Figure 3: Individual form. The sell tree is XOR(OR(RSI=TRUE, CCI=FALSE), KDJ=TRUE) and the buy tree is AND(OR(CCI=TRUE, MACD=TRUE), RSI=FALSE).
3 RESULTS
In this section, we choose 5 stocks at random and use 50% of the data to select a strategy, leaving the remaining 50% to test the strategies' performance. This process checks whether a strategy also fits the market in other periods, as sketched below.
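A minimal sketch of the 50/50 split, assuming the market data is a chronologically ordered list of snapshots (the paper does not describe how the split is implemented):

```python
from typing import List, Tuple

def split_half(history: List[dict]) -> Tuple[List[dict], List[dict]]:
    """First half for strategy selection (training), second half for out-of-sample testing."""
    mid = len(history) // 2
    return history[:mid], history[mid:]
```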
The primary GP parameters are listed in Table 3:
Table 3: GP Parameters.
parameter value
population 100
max iteration 15
max tree depth 5
regeneration rate 0.05
crossover rate 0.75
initial fund $10000
In the first sample, the market data is from Patterson-UTI Energy Inc. We use the data from 1993-11-02 to 2003-01-07 to train strategies. The training result is shown in Figure 4.
Figure 4: Training result.
According to Figure 4, during training the optimal strategies' fitness value increases as the number of iterations grows. In other words, with more iterations the optimal strategies perform better and obtain higher returns. This result shows that the system is effective and that more useful strategies are being found.
After training, we use the data from 2003-01-07 to 2012-04-17 to test the strategies found in the training process. The testing result is shown in Figure 5.
Figure 5: Testing result.
According to Figure 5, each generation's best strategy performs differently, but compared with the buy-and-hold strategy's yield of -43.36%, the best strategy in the system reaches 90.34%. This shows that, on this stock's data sets, the strategies trained by the system outperform the buy-and-hold strategy and can work on the same stock over a different period of data.
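For reference, the buy-and-hold baseline quoted above is simply the return from buying at the first close of the test period and holding to the last (a sketch; the function name is ours):

```python
from typing import List

def buy_and_hold_yield(closes: List[float]) -> float:
    """Percentage return of buying at the first close and holding to the last one."""
    return 100.0 * (closes[-1] - closes[0]) / closes[0]
```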
StrategyTreeConstructionandOptimizationwithGeneticProgramming
427
Below, we choose more stocks to test the performance of the system; the results are given in Table 4.
Table 4: Testing on the same stock over a different period.
Stock name   Training period           Testing period            Best return   Buy-and-Hold
AMD          1983-03-21 ~ 2000-03-01   2000-03-01 ~ 2012-05-06   147.53%       -84.22%
Dell Inc.    1988-08-17 ~ 2000-01-20   2000-01-20 ~ 2012-05-16   -28.36%       -65.66%
HP           1987-11-05 ~ 2000-11-22   2000-11-22 ~ 2012-05-16   37.71%        35.21%
FORD         1977-01-03 ~ 1990-08-14   1990-08-14 ~ 2012-05-25   215.18%       -71.92%
INTC         1986-07-09 ~ 2000-04-12   2000-04-12 ~ 2012-05-16   -46.62%       -78.25%
Table 4 shows that, in this round of testing, the strategies generated by the optimization system perform better than the buy-and-hold strategy in most cases. Moreover, the performance of the strategies remains relatively stable.
4 CONCLUSIONS
GP is applied to automatically produce various trading decisions composed of logical operations on TA indicators, and historical data is used to optimize the strategies' return performance.
The simulation experiments lead us to conclude that GP is effective in searching for strategies with high returns. With the genetic operations in GP, well-performing strategies with complicated content can be generated.
The application of GP to investment problems suggests that such a system could be adapted to different target problems by changing the relevant conditions. Its problem-solving ability is satisfactory for our future research.
ACKNOWLEDGEMENTS
This paper is partially supported by the National
Natural Science Foundation of China under Grant
#61111130121/F020202.
REFERENCES
Alpaydin, E. (2010). Introduction to Machine Learning.
MIT Press, Cambridge, Mass.
Bennett, K. (1994). Global tree optimization: A non-greedy
decision tree algorithm. In Computing Science and
Statistics, pages 156–160.
Blockeel, H. and Struyf, J. (2002). Efficient algorithms for decision tree cross-validation. Journal of Machine Learning Research, 3:621–650.
Hyafil, L. and Rivest, R. (1976). Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17.
Liang, C., Zhang, Y., and Song, Q. (2010). Decision
tree for dynamic and uncertain data streams. In
JMLR 2nd Asian Conference on Machine Learning
(ACML2010), pages 209–224.
Mookerjee, V. and Mannino, M. (1997). Sequential decision models for expert system optimization. Knowledge and Data Engineering, IEEE Transactions on, 9(5):675–687.
Murthy, S. (1998). Automatic construction of decision trees from data: A multidisciplinary survey. Data Mining and Knowledge Discovery, 2(4):345–389.
Suarez, A. and Lutsko, J. (1999). Globally optimal fuzzy decision trees for classification and regression. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(12):1297–1311.
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
428