Predicting Stock Market Movement: An Evolutionary Approach
Salah Bouktif and Mamoun Adel Awad
College of Information Technology, UAE University, Al Ain, U.A.E.
Keywords:
Stock Market, Data Mining, Ant Colony Optimization, Bayesian Classifiers.
Abstract:
Social networks are becoming very popular sources of all kinds of data. They allow a wide range of users to interact, socialize, and express spontaneous opinions. The overwhelming amount of data exchanged on businesses, companies, and governments makes it possible to perform predictions and discover trends in many domains. In this paper we propose a new prediction model for the stock market movement problem based on collective classification. The model uses a number of public mood states as inputs to predict the Up and Down movements of a stock. The proposed approach to building such a model simultaneously promotes performance and interpretability. By interpretability, we mean the ability of a model to explain its predictions. A particular implementation of our approach is based on the Ant Colony Optimization algorithm and customized for individual Bayesian classifiers. Our approach is validated with data collected from social media on the stock of a prestigious company. The promising results of our approach are compared with four alternative prediction methods, namely bagging, AdaBoost, the best expert, and an expert trained on all the available data.
1 INTRODUCTION
Social networks and portable devices have made exchanging and sharing huge amounts of personal experience possible. More than 500 million tweets are shared daily. In particular, tweets on stock markets are part of the everyday traffic on these social networks. Mining the social behavior of users in social networks, like Twitter and Facebook, has become a hot research topic. In social data mining, the behav-
ior of users can be used in predicting global phenom-
ena (Asur and Huberman, 2010) such as stock mar-
ket (Bollen et al., 2011), disease epidemic (Ginsberg
et al., 2008), elections (Gayo-Avello, 2012), movie
box office (Liu, 2006), and marketing (Yu and Kak,
2012). In order to mine social data, we need to con-
sider standard measures of human behavior. The Profile of Mood States (POMS) and its variations (e.g., GPOMS), borrowed from psychology, are used to measure individuals' moods (Ginsberg et al., 2008). The combination of these moods can represent individual sentiment. Doubtlessly, understanding user behavior and relating it to a stock market index can positively affect traders' behavior. However, unlike other
works (Ginsberg et al., 2008; Asur and Huberman,
2010; Gayo-Avello, 2012; Yu and Kak, 2012; Liu,
2006), our focus is to build an ensemble prediction
model that preserves two properties, namely, inter-
pretability and performance. In the context of financial markets, one would like to predict the stock market movement not only accurately but also with an easy understanding of the causality relationship that leads to a particular direction of the stock values. For example, one wants to know how specific public mood states are related to stock changes. Different mining techniques have been used in social data mining in order to obtain the highest prediction performance (Yu and Kak, 2012). Examples of such techniques include ANN, regression, decision trees, rule mining, and SVM (Michalski et al., 1986; Goebel and Gruenwald, 1999). However, some of these techniques suffer from interpretability shortcomings. Interpretability is the ability for an expert to interpret a prediction model by analyzing/examining the causality relationship between input features and prediction outcome. Such a quality is critically important, especially when the user wants to focus her effort on improving input features to prevent undesirable outcomes. Deci-
sion Trees, Bayesian classifiers, rule set systems, and
fuzzy rules are examples of interpretable techniques
because we can visualize the effect of input features
on the final prediction outcome (Lavrač, 1999). Ex-
amples of non-interpretable techniques include SVM,
Regression, ANN, and ensemble techniques such as
Bagging and Boosting. Although some of the non-interpretable techniques feature high performance and a strong mathematical foundation, they are still not accepted by experts in many domains (Kim et al.,
2007; Lavrač, 1999). In this paper, we propose an ensemble classifier technique to build a prediction model that is both more interpretable and accurate. The model is based on Ant Colony Optimization (ACO) to combine Bayesian classifiers. We show that our model preserves the interpretability of individual Bayesian classifiers and attains higher performance. We apply such a model to stock market prediction because, given the market dynamics, it is essential to have a highly accurate and interpretable prediction model.
The rest of this paper is organized as follows: Sec-
tion 2 summarizes the related work. Section 3 de-
scribes the problem we are addressing and gives an
overview of the proposed solution. Background tech-
niques are introduced in Section 4. A customization
of Ant Colony Optimization to our problem is pre-
sented in Section 5. A validating experiment is de-
scribed in Section 6. Conclusions and future work are
drawn in Section 7.
2 RELATED WORK
Prediction in the social media context has generally followed three main streams, namely marketing (Bollen et al., 2011), movie box office (Liu, 2006), and information dissemination (Yu and Kak, 2012). In market-
ing, Bollen et al. (Bollen et al., 2011) analyzed the
text content of daily Twitter feeds to measure pos-
itive vs. negative mood in terms of 6 dimensions,
namely, Calm, Alert, Sure, Vital, Kind, and Happy.
The main conclusion is that mood states can improve
the accuracy of predictions of the Dow Jones Indus-
trial Average (DJIA). Trusov et al. (Trusov et al., 2009) employ Word of Mouth (WOM) in marketing and link it with the number of new members joining the site (sign-ups). Liu (Liu, 2006)
analyzed data collected from yahoo/movies to show
that WOM activities are very informative during pre-
release and the opening week, and the movie view-
ers tend to hold relatively high anticipation before re-
lease, but become more critical in the opening week.
Other social mining applications include election pre-
diction using Twitter data (Gayo-Avello, 2012), mea-
suring flu spread using Google search queries (Ginsberg et al., 2008), analyzing and measuring athletic success based on POMS (Yu and Kak, 2012), and others. Although Bollen's proposed model is accurate, it
cannot explain precisely how some mood indexes are
related to Dow Jones Industrial Average. In predic-
tion models, the trade-off between interpretability and performance is gaining increasing interest. Generally, researchers have gone in one of the following two directions to build quality models. First, in order to improve prediction accuracy, selected models are combined into a wisdom council which makes the final decision. Prediction is done by consulting all models in the council. The methods known as Ensemble Classifier Methods (ECM) dominate this strategy. They have demonstrated that a combination of classifiers outperforms the individual ones when applied to unseen data sets. Averaging, boosting, bagging, and voting (Moerland and Mayoraz, 1999; Oza and Tumer, 2008; Galar et al., 2012) are the most commonly used classifier combination techniques. The second strategy aims at preserving an easy understanding of the causality buried in the classification models. This is what we refer to as interpretability of the classifiers. Such a property is preserved by employing modeling techniques that produce white-box classifiers able to explain the causal relationship between inputs and outputs. Examples include the use of decision trees, Bayesian networks and classifiers, rule set systems, fuzzy rules, etc.
In particular, Bayesian classifiers and networks have
been proposed as solution of lack of interpretability in
several domains such as clinical diagnosis (Van Ger-
ven et al., 2007), text and mail classification (Chen
et al., 2009), and software engineering (Fenton and
Neil, 1999; Bouktif et al., 2006).
With respect to the problem of stock market predic-
tion, company managers would like prediction mod-
els to explain precisely how some user mood states are
related to the stock movement direction. Hence, we
propose a high performance and interpretable model
for the stock market prediction problem. We build such a model using Ant Colony Optimization, which combines classifier structures while preserving interpretability and increasing accuracy. Our approach is customized for the case of individual Bayesian Classifiers (BC) used for stock market movement prediction, and validated using an empirical experiment.
3 PROBLEM STATEMENT AND
SOLUTION PRINCIPLE
3.1 Problem Statement
Our goal in this work is to construct a prediction
model that maps public mood states collected from
social media to stock market movement. Such a
model is generally built/validated using a representa-
tive data set on the problem being predicted. Particu-
larly, in the problem of stock market movement pre-
diction, the data set $D_c$ is assumed to be descriptive of the particular context of stock market changes.
Accordingly, if we wanted to be representative of all possible worldwide stock market contexts, we would have to collect data from all over the world, which is infeasible. A compromise solution is to reuse learned patterns from different contexts as an attempt to approximate more representative data. Following the expert opinion adopted by our approach (Bouktif et al., 2002; Bouktif et al., 2010; Bouktif and Awad, 2013), a high quality prediction model should be a mixture of domain common knowledge and context specific knowledge. On the one hand, the reuse of existing prediction experiences allows integrating the common domain knowledge represented by a diversity of contexts. On the other hand, the adaptation of the models driven by the specific context data takes into account the specific knowledge represented by the available context data $D_c$. In other words, a more generalizable prediction model is attained by covering more contexts, and a higher performance expert is built by adapting and reusing multiple expertise. For evaluating our proposed solution and validating our approach's assumption, the best expert among the existing ones is used as a benchmark.
Our problem can be seen as a problem of reusing $N$ predefined prediction classifiers $f_1, \ldots, f_N$ called experts. The objective of our solution is to improve the performance of stock market prediction while preserving interpretability. Therefore, the challenging question is how to produce a new optimal BC that inherits the 'white-box' property (i.e., interpretability) of BCs, while improving the prediction performance on the available context represented by $D_c$.
3.2 Solution Overview
To achieve our goal, we propose to reuse existing clas-
sifiers to derive new accurate and interpretable ones.
Our proposed approach is founded on three operations: (1) expertise identification, (2) expertise pooling, and (3) expertise adaptation. In the first operation, each expert is decomposed into chunks of expertise. Each chunk covers a subset of the entire input domain. The decomposition of a Bayesian classifier leads to expertise chunks expressed as a set of prior probabilities attached to each range (interval) of attribute values (see details in Section 5.1). The rationale behind the decomposition is to give more flexibility to the combination process, especially when selecting the appropriate chunk of expertise. Moreover, the derived expert, which is a collection of chunks of expertise, is intended to be interpretable since we know the particular chunks responsible for the final decision. The expertise pooling operation consists in reusing the chunks of expertise coming from different experts to progressively build more accurate combinations of these experts, using $D_c$ to guide the search. The expertise adaptation operation consists in modifying some expertise chunks in order to obtain expertise combinations more refined to a particular context $D_c$.
Our proposed solution can be seen as a search problem where the goal is to select and generate the best set of expertise suitable for performing well in the context $D_c$. Therefore, several existing experts will be decomposed into chunks of expertise. However, searching for an optimal combination of these chunks degenerates into a combinatorial explosion, which makes the problem NP-complete. Typically, such a problem can be circumvented by using search-based techniques in a large search space (Bouktif et al., 2010; Ahmed et al., 2008). In this work, we propose a customization of Ant Colony Optimization to combine Bayesian Classifier based experts.
4 BACKGROUND
4.1 Naive Bayesian Classifier
A Bayesian classifier (BC) is a classification method that assigns the most probable class $c$ to a $d$-dimensional observation $x_i$, computed as:
\[
c = \arg\max_{c_m} p(c_m \mid a_1, \ldots, a_d),
\]
where $c_m$ ranges over the set of possible classes $C = \{c_1, \ldots, c_q\}$ and the observation $x_i$ is written as a generic attribute vector. Assuming that all attributes are conditionally independent, the BC degenerates to a Naive Bayes. An attribute domain is divided into $m$ intervals $I_{jt_j}$, and $p(I_{jt_j} \mid c_m)$ is the prior conditional probability that a value of the $j$th attribute falls in the interval $I_{jt_j}$ when the class is $c_m$; $t_j \in \mathbb{N}$ is the rank of the interval in the attribute domain. To classify a new observation $x_i$, a BC with continuous attributes applies Bayes' rule to determine the a posteriori probability $p(c_m \mid a_1, \ldots, a_d)$ as:
\[
p(c_m \mid I_{1t_1}, \ldots, I_{dt_d}) = \frac{\prod_{j=1}^{d} p(I_{jt_j} \mid c_m)}{\sum_{h=1}^{q} \prod_{j=1}^{d} p(I_{jt_j} \mid c_h)\, p(c_h)}\; p(c_m), \qquad (1)
\]
where $a_j \in I_{jt_j}$.
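To make the classification rule concrete, the following Python sketch implements a Naive BC over interval-discretized continuous attributes, following Equation (1). The class and method names are our own illustrations and are not taken from the paper or from the RoC tool used later.

import numpy as np

class IntervalNaiveBayes:
    # Minimal sketch of a Naive Bayesian classifier whose continuous
    # attributes are pre-sliced into intervals; names are illustrative.
    def __init__(self, boundaries, cond_probs, class_priors):
        # boundaries[j]: sorted interval boundaries of attribute j,
        #   e.g. [0, 11, 19, 44, 96] encodes [0,11], [11,19], [19,44], [44,96]
        # cond_probs[j][m][t]: p(I_{jt} | c_m), the prior conditional
        #   probability that attribute j falls in its t-th interval given c_m
        # class_priors[m]: marginal probability p(c_m)
        self.boundaries = boundaries
        self.cond_probs = cond_probs
        self.class_priors = class_priors

    def _interval_rank(self, j, value):
        # Rank t_j of the interval of attribute j that contains `value`.
        rank = np.searchsorted(self.boundaries[j], value, side="right") - 1
        return int(np.clip(rank, 0, len(self.boundaries[j]) - 2))

    def posterior(self, x):
        # Numerator of Eq. (1) for each class; the denominator is the
        # normalizing sum over all classes.
        scores = []
        for m, prior in enumerate(self.class_priors):
            s = prior
            for j, value in enumerate(x):
                s *= self.cond_probs[j][m][self._interval_rank(j, value)]
            scores.append(s)
        total = sum(scores)
        return [s / total for s in scores]

    def predict(self, x):
        # c = argmax_{c_m} p(c_m | a_1, ..., a_d)
        return int(np.argmax(self.posterior(x)))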
4.2 ACO Metaheuristic
Ant Colony Optimization (ACO) is an algorithm in-
spired by the efficient process by which ants look
for food and carry it back to their nest (Deneubourg et al., 1990). Throughout its trip, an ant deposits a chemical substance called pheromone, which constitutes a means of indirect communication between colony members (Dorigo et al., 2006). The amount of pheromone deposited by an ant reflects the quantity of food as well as the optimality of the traversed path.
Investigations show that at the beginning of the food
search, the ants randomly choose their paths. Never-
theless, after some time and based on their commu-
nications through pheromone trails, the ants tend to
follow the same optimal path. A static graph mod-
eling all the possible paths followed by the ants is
used to represent an optimization problem. Based
on this representation, an artificial ant builds a solu-
tion by moving along the graph and selecting the suit-
able edges towards the optimal path. The deposited
amount of pheromone mirrors the optimality of the
traversed path.
5 CUSTOMIZATION OF ANT
COLONY APPROACH
Customizing the ACO algorithm for Bayesian classifier combination requires the definition of the following elements: a solution representation; a graph on which the artificial ants will progressively construct the solutions; a measure of solution quality; a suitable strategy for the ants' communication via pheromone updates; and finally a moving rule that decides for an ant to move from one node to the next in the graph.
5.1 Solution Structure
The decomposition of a BC into expertise chunks is critical to our approach. This operation facilitates the exploration of the search space defined by all the combinations of expertise chunks. Consequently, it makes the steps of reusing and adapting the existing BC experts easy and effective. According to the description of Naive BCs in Section 4.1, two BC parameters can represent an expertise chunk, namely the marginal probabilities of classes and the prior conditional probabilities of attributes. Since prior conditional probabilities are more determinant in the BC structure, we use them to characterize a chunk of expertise.
For a given attribute $j$, there are $m_j$ chunks of expertise. An expertise chunk can be coded as a triplet made up of one interval and two conditional probabilities. To elaborate on the expertise chunk, let us consider the stock market movement prediction problem. The BCs used are binary and each of them predicts the stock price movement direction. The set of class labels is $C = \{c_1, c_2\}$, with $c_1 = up$ and $c_2 = down$. In this example a chunk of expertise is a triplet, denoted by $(I_{jt_j},\, p(I_{jt_j} \mid c_1),\, p(I_{jt_j} \mid c_2))$. It can
be interpreted as follows: the probability of a value of the $j$th attribute to be within the interval $I_{jt_j}$ is equal to $p(I_{jt_j} \mid c_1)$ when the class is set to $c_1$, and is equal to $p(I_{jt_j} \mid c_2)$ when the class is set to $c_2$. The index $t_j \in \mathbb{N}$ is the rank of the interval in the attribute domain that contains $N$ intervals. In this illustrative prediction problem, a mood state attribute $j$ in a BC will be encoded by the following vector:

([0, 11], 0.241, 0.296),
([11, 19], 0.303, 0.192),
([19, 44], 0.209, 0.254),
([44, 96], 0.253, 0.258),
,
where each line encodes a vector of expertise
chunks. An expertise chunk is defined by the triplet
(interval, cond. probability|c
1
, cond. probability|c
2
).
For example, the stock market chunk of expertise
([0, 11], 0.241, 0.296) means that the conditional
probability of a mood state score to be in the interval
[0, 11] when the class is c
1
= up, is equal to 0.241 and
0.296 when the class is c
2
= down. In this particular
example, the mood state attribute is divided into 4
intervals.
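As an illustration, the chunk encoding above translates directly into code. The following Python sketch (with our own, hypothetical names) stores the four chunks of the example mood state attribute and looks up the chunk responsible for a given mood score, which is exactly the information an analyst inspects when interpreting a prediction.

from typing import List, NamedTuple

class ExpertiseChunk(NamedTuple):
    interval: tuple   # (lower bound, upper bound) of I_{jt_j}
    p_up: float       # p(I_{jt_j} | c_1 = up)
    p_down: float     # p(I_{jt_j} | c_2 = down)

# The four-interval example quoted above.
mood_attribute: List[ExpertiseChunk] = [
    ExpertiseChunk((0, 11), 0.241, 0.296),
    ExpertiseChunk((11, 19), 0.303, 0.192),
    ExpertiseChunk((19, 44), 0.209, 0.254),
    ExpertiseChunk((44, 96), 0.253, 0.258),
]

def chunk_for(score: float, attribute: List[ExpertiseChunk]) -> ExpertiseChunk:
    # Return the expertise chunk whose interval contains the mood score.
    for chunk in attribute:
        lo, hi = chunk.interval
        if lo <= score <= hi:
            return chunk
    raise ValueError("score outside the attribute domain")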
5.2 Solution Construction Mechanism
5.2.1 Possible Strategies
There are two candidate strategies for modeling the combination of BCs. The first one considers the modular structure of a BC, in which ACO is applied on each single attribute. In other words, the artificial ants progressively construct, for each attribute, a new composition until obtaining a near optimal set of expertise chunks. On each attribute, the ants work to derive a new decomposition (i.e., a new slicing) of the attribute domain and a new distribution of conditional probabilities. Then a final classifier is built up by grouping the near optimal derived compositions of all the attributes, respectively. The second strategy considers all the attributes simultaneously: a new BC solution is built up from expertise chunks that contain knowledge from all the attributes. In this paper, we adopt the first strategy; we will empirically study both in future work. Therefore, we focus on constructing a solution at the BC attribute level. The solution construction mechanism aims at searching for the optimal path in a directed graph $G(V, E)$, where $V$ is a set of nodes and $E$ is a set of edges, that maps a BC-attribute structure.
5.2.2 Graph Construction
The first step of the ACO customization is to construct the graph modeling the possible paths for the ants' moves. To construct the attribute graph $G(V, E)$, we first consider all attribute compositions throughout all the BCs and create a new slicing of the attribute domain. The new slicing is defined by considering all the boundaries of the attribute intervals in all the BCs. The list of all these boundaries, sorted in ascending order and denoted by $B$, is used to create the new slicing such that each interval is bounded by two adjacent values from $B$. Therefore, each node $v$ in $V$ represents a boundary from the list $B$ of boundaries of the attribute being processed. The nodes of an attribute graph are then ordered by boundary value. With respect to the set of edges $E$, we consider the combination of $N$ BCs; subsequently, $N$ instances of the attribute are to be taken into account. Therefore, between two consecutive nodes $v_i$ and $v_{i+1}$ there are $N$ edges $e_{ik}$, $k = 1..N$. Each edge $e_{ik}$ in $E$ is labeled with a couple of conditional probabilities (i.e., $p(e_{ik} \mid c_1)$, $p(e_{ik} \mid c_2)$). These probabilities are calculated based on the conditional probability distribution of the original instance of the attribute coming from the $k$th BC. For example, the conditional probabilities associated with the edge $e_{11}$ are computed in the following way:
\[
p(e_{11} \mid c_m) = p([v_1^1, v_2^1] \mid c_m)\,\frac{v_2 - v_1}{v_2^1 - v_1^1},
\]
where $m = 1, 2$ and $[v_1^1, v_2^1]$ is the interval containing the interval $[v_1, v_2]$ in the original composition of the attribute $j$ in BC number 1.

[Figure 1: Two-Node Graph for the Mechanism of Solution Construction. Two consecutive nodes are connected by $N$ parallel edges $e_{11}, e_{12}, \ldots, e_{1N}$, each labeled with its couple of conditional probabilities $p(e_{1k} \mid c_1)$; $p(e_{1k} \mid c_2)$.]

Figure 1 shows
two nodes from the graph constructed for an attribute
j. As described above, the solution construction
mechanism assumes that the used graph is static,
built on quantized pairs of conditional probabilities;
all possible values of conditional probabilities are pre-
determined, listed, and used to build the static graph.
Thus, attribute compositions that form the candidate
solutions are derived from simple combinations of
edges.
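The graph construction just described can be sketched as follows, assuming each existing BC exposes, for the attribute at hand, its interval boundaries and per-interval conditional probabilities; all identifiers are illustrative, and the rescaling follows the displayed formula.

def build_attribute_graph(bc_slicings):
    # bc_slicings[k] = (boundaries_k, cond_probs_k) for the k-th BC, where
    # cond_probs_k[t] = (p(I_t | c_1), p(I_t | c_2)) for its t-th interval.
    # Nodes: the sorted union B of all interval boundaries.
    nodes = sorted({b for boundaries, _ in bc_slicings for b in boundaries})
    edges = {}  # (i, k) -> (p(e_ik | c_1), p(e_ik | c_2))
    for i, (v_lo, v_hi) in enumerate(zip(nodes, nodes[1:])):
        for k, (boundaries, cond_probs) in enumerate(bc_slicings):
            # Find the original interval [v1, v2] of BC k that contains
            # the new, finer interval [v_lo, v_hi].
            for t, (v1, v2) in enumerate(zip(boundaries, boundaries[1:])):
                if v1 <= v_lo and v_hi <= v2:
                    # Rescale the original conditional probabilities
                    # proportionally to the length of the new interval.
                    scale = (v_hi - v_lo) / (v2 - v1)
                    p1, p2 = cond_probs[t]
                    edges[(i, k)] = (p1 * scale, p2 * scale)
                    break
    return nodes, edges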
5.3 Solution Quality Measure
The mission of our ACO based approach is to maximize the performance of the derived BC containing the attribute composed by the ants. Our approach can be seen as a learning process where the data set $D_c$, the particular context data, is used to guide the ants in their trails to construct solutions. Therefore, the set $D_c$ is used as evaluation data to compute the predictive accuracy of the derived classifier.
The predictive accuracy of a BC can be evaluated with different measures, as discussed in (Bouktif et al., 2010). In order to avoid falling into either constant or guessing classifiers, we decided to use Youden's J-index (Youden, 1961), defined as
\[
J(f) = \frac{1}{2} \sum_{i=1}^{2} \frac{n_{ii}}{\sum_{j=1}^{2} n_{ij}},
\]
where $n_{ij}$ is the number of cases in the evaluation data set with real label $c_i$ classified as $c_j$. Intuitively, $J(f)$ is the average correctness per label. In statistical terms, $J(f)$ measures the correctness assuming that the a priori probability of each label is the same.
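Since the J-index guides the whole search, a small sketch of its computation from a two-class confusion matrix may help; the function name is our own.

def youden_j(confusion):
    # confusion[i][j]: number of evaluation cases with true label c_i
    # classified as c_j; J(f) is the average per-label correctness.
    per_label = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return sum(per_label) / len(per_label)

# Example: 40 of 50 "up" cases and 30 of 50 "down" cases correct
# gives J = (0.8 + 0.6) / 2 = 0.7.
print(youden_j([[40, 10], [20, 30]]))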
5.4 Ant Walk, Attractiveness and
Visibility
When an ant on a node $v_i$ moves to the next node $v_{i+1}$, it chooses an edge $e_{ik}$ representing the $k$th couple of conditional probabilities associated to the interval $[v_i, v_{i+1}]$ and originally copied from the $k$th BC. Hence, the ant's task after each move is to assign a pair of conditional probabilities to the attribute interval $[v_i, v_{i+1}]$.
Initially, the ants start by moving randomly from
one node to the following one. After some iterations,
the ants become guided by a transition strategy. Such
a strategy consists in choosing the edge to be traversed
based on the amount of pheromone deposited on that
edge. The higher the amount, the higher the proba-
bility of choosing that edge. This probability is com-
puted by the following equation:
\[
p(\mathit{choosing}(e_{ik})) = \frac{\tau(e_{ik})^{\alpha}\,\eta(e_{ik})^{\beta}}{\sum_{h=1}^{N} \tau(e_{ih})^{\alpha}\,\eta(e_{ih})^{\beta}}, \qquad (2)
\]
where $\tau(e_{ik})$ and $\eta(e_{ik})$ are respectively the attractiveness and the visibility of the edge $e_{ik}$ to be chosen. The attractiveness function is based on the quality of the previous solutions; it models the amount of pheromone accumulated on the ants' trail and is defined by Equation 3. The visibility function, however, is defined as the sum of the conditional probabilities associated with the edge $e_{ik}$. The two parameters $\alpha$
and $\beta$ are used to balance the impact of attractiveness (i.e., pheromone) versus visibility. These are two parameters of the ACO algorithm and have to be tuned empirically after several runs. After computing the probability of choosing every edge $e_{ik}$, $k = 1..N$, a roulette wheel method is used to select an edge.
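The transition rule of Equation 2, followed by roulette wheel selection, can be sketched as follows; tau, eta, alpha, and beta mirror the symbols above, and the function name is our own.

import random

def choose_edge(tau, eta, alpha=1.0, beta=1.0):
    # tau[k], eta[k]: attractiveness and visibility of edge e_ik.
    weights = [(tau[k] ** alpha) * (eta[k] ** beta) for k in range(len(tau))]
    total = sum(weights)
    probs = [w / total for w in weights]          # Eq. (2)
    # Roulette wheel: spin a uniform value and walk the cumulative sums.
    r, cumulative = random.random(), 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return k
    return len(probs) - 1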
5.5 Pheromone Update Strategy
An ant deposits a quantity of pheromone on ev-
ery edge it traverses. The accumulated quantity of
pheromone constitutes the attractiveness of an edge.
It can also be considered as the long-term memory of the ant colony. In our proposed ACO algorithm, this long-term memory is updated after an ant finishes one tour. The schema for updating the deposited pheromone amount during iteration $t$ on an edge $e_{ik}$, $k = 1..N$, is given by the following equation:
\[
\tau(e_{ik})_t =
\begin{cases}
(1-\rho)\,\tau(e_{ik})_{t-1}, & \text{if the edge is not traversed}\\
(1-\rho)\,\tau(e_{ik})_{t-1} + Q \cdot J(f), & \text{otherwise,}
\end{cases}
\qquad (3)
\]
where $0 \le \rho \le 1$ is a parameter of the ACO algorithm representing the evaporation rate of the pheromone substance; it indicates low evaporation at small values and vice versa. The quantity of newly deposited pheromone, $\Delta\tau = Q \cdot J(f)$, is proportional to both the base attractiveness constant $Q$ and the quality measure $J(f)$ that has to be maximized.
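A sketch of this update follows, using the evaporation rate and base attractiveness values reported in Section 6.4 as defaults; the data structure (a dictionary from edge keys to pheromone amounts) is our own assumption.

def update_pheromone(tau, tour_edges, j_quality, rho=0.01, Q=2.0):
    # Eq. (3): all edges evaporate, and the edges on the tour just
    # completed receive Q * J(f) extra pheromone.
    for edge in tau:
        tau[edge] *= (1.0 - rho)                 # evaporation
        if edge in tour_edges:
            tau[edge] += Q * j_quality           # deposit proportional to J(f)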
6 EXPERIMENTAL RESULTS
In this section, we present the empirical results that validate our proposed approach. We focus on evaluating the accuracy of the stock market movement prediction model obtained by combining Bayesian classifiers. Two inputs are needed for such an evaluation: (1) a set of "existing" models, called stock market experts, that predict stock price movement; in our case, these prediction models are Bayesian Classifiers; and (2) a representative data set that serves as a guidance for the classifier combination process.
6.1 Data Description
Inspired by Bollen’s work (Bollen et al., 2011) and
for the sake of a future comparison with the results
obtained in our previous work (Bouktif and Awad,
2013), we have decided to use public mood states
collected on Twitter as input attributes for the indi-
vidual classifiers. A set of 18 mood state attributes is used for the stock of a prestigious company (i.e., IBM). These 18 attributes track the negative and positive mood sentiments during the 9 days preceding the closing price of a particular stock. Precisely, two mood state attributes are used to capture, respectively, the score of negative mood sentiments and the score of positive mood sentiments of one day. The stock movement data (i.e., up or down) of the IBM company is obtained from Google Finance. Let $PM_j$ and $NM_j$ be, respectively, the number of positive and the number of negative mood sentiments for the IBM stock $j$ days before the closing price, $j = 1..9$. A sample data row could be (16, 32, 43, 13, 28, 10, ..., 15, 36, Down), where, for example, the score of positive public mood sentiments one day before the closing price is 16 (i.e., $PM_1 = 16$) and the score of negative public mood sentiments one day before the closing price is 32 (i.e., $NM_1 = 32$). Besides, the stock movement direction (i.e., Up or Down) as well as the mood scores are tracked over a period of more than 20 months. The collected data set contains 462 data points, each capturing 18 mood scores followed by the stock movement direction.
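For clarity, the record layout can be written down as follows; the column names PM1, NM1, ..., PM9, NM9 are our own labels for the 18 attributes described above.

# One record: 9 days x (positive score, negative score) plus the label.
columns = [f"{kind}{j}" for j in range(1, 10) for kind in ("PM", "NM")]
assert len(columns) == 18
# Sample row quoted in the text (middle scores elided in the source):
# (16, 32, 43, 13, 28, 10, ..., 15, 36, "Down")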
6.2 Building Individual Stock Market
Experts
For the sake of performing a controlled experiment, the individual stock market experts are built "in-house". Therefore, we use random combinations of attributes from the 18 stock mood states (i.e., of two, three, ..., and nine days). By these combinations, we imitate the different opinions of stock market experts who may predict the direction of a particular stock value based on different sets of mood states about the stock of the studied company. We randomly formed 20 subsets of mood states, and subsequently 20 training data sets were created. From these, one classifier is built on each data set using the RoC machine learning tool (Ramoni and Sebastiani, 1999).
6.3 Experimental Context Data Set
A data sample called the particular context consists of 154 records of mood states about the IBM company, extracted randomly from the whole data set (i.e., from the 462 data points). Each record is described by the 18 mood state attributes and labeled by the stock movement direction (i.e., up or down). We recall that the context data set $D_c$ is supposed to represent stock market movement prediction circumstances.
6.4 ACO Algorithm Setting
After several runs, the termination criterion was determined by the maximum number of iterations (i.e., the number of tours performed by each ant), which was set to 150, and the number of artificial ants used to select the conditional probabilities was set to 100. The pheromone update is governed by two parameters, namely the pheromone variation (the base attractiveness constant $Q$ of Equation 3), which was set to 2.0, and the pheromone evaporation rate $\rho$, which was set to 1%. In order to balance the impacts of pheromone and visibility, $\alpha$ and $\beta$ were both fixed at 1.0.
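Gathered as a configuration sketch (the key names are our own), the tuned settings are:

aco_config = {
    "max_iterations": 150,   # tours per ant (termination criterion)
    "num_ants": 100,         # artificial ants selecting cond. probabilities
    "Q": 2.0,                # pheromone variation (base attractiveness)
    "rho": 0.01,             # pheromone evaporation rate (1%)
    "alpha": 1.0,            # pheromone impact
    "beta": 1.0,             # visibility impact
}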
6.5 Experimental Design and Results
Four alternative approaches are used as benchmarks to evaluate the performance of our proposed ACO-based model construction: the "best expert selection" approach, the "data combination" approach, and two other ECM methods, namely the bagging and boosting algorithms. The "best expert selection" approach consists in analyzing the performance of the existing experts across the spectrum of available data ($D_c$) in order to choose the one with the best performance, referred to as $f_{Best}$. The "data combination" approach takes advantage of the availability of the data sets that served in building the "simulated" existing experts in this controlled experiment; with this approach, an expert $f_{AllData}$ is trained on the combination of all the available data sets. With respect to the ECM, bagging is a voting-based method that consists in assigning the class that collects the highest number of votes among the individual experts (Merz, 1998); in our experiment, it is denoted $f_{Bagg}$. Finally, the boosting approach is implemented by the well-known AdaBoost algorithm, a sophisticated weighted sum of individual classifier outputs (Freund and Schapire, 1997); it is referred to as $f_{Boost}$.
Table 1 compares the accuracy of the resulting expert $f_{ACO}$ to those of $f_{Best}$, $f_{AllData}$, $f_{Bagg}$ and $f_{Boost}$, respectively. The accuracies of the obtained classifiers are evaluated using Youden's J-index and estimated with 10-fold cross validation. The process of deriving the new BC is guided by the union of 9 folds from the context data $D_c$: a new classifier is trained on the union of 9 folds and tested on the one remaining fold. The process is repeated 10 times for all 10 possible combinations. The mean and standard deviation of the J-index accuracies are computed on both the training and the test data.
Table 1: ACO-Based Expert Accuracy compared to other approaches: mean (standard deviation, if applicable) percentage values of the J-index.

Approach      Training J-index    Test J-index
f_Best        50.10 (1.22)        51.41 (14.25)
f_AllData     52.84 (8.4)         49.84 (10.03)
f_Bagg        56.92 (-)           53.22 (-)
f_Boost       57.43 (1.4)         54.93 (2.4)
f_ACO         66.00 (1.28)        58.82 (2.60)

When compared to the best expert $f_{Best}$, the generated BC $f_{ACO}$ gained 15.90% in predictive accuracy on the training data set and 7.41% on the testing data.
On both training and testing data, the performance improvement is significantly greater than each of the standard deviations computed respectively for the $f_{Best}$ accuracy (1.22 and 14.25) and for the $f_{ACO}$ accuracy (1.28 and 2.6). The null hypothesis H01, assuming that no difference exists between $f_{Best}$ and $f_{ACO}$, is rejected by a statistical t-test with very strong evidence (two-tailed t-test, p-value < 1%). This result shows a significant improvement in the accuracy of the stock market expert derived by our approach. Moreover, the accuracy of the derived BC $f_{ACO}$ is compared to that of the BC trained on all the available data, denoted $f_{AllData}$, and the result shows a significant difference in performance (i.e., 13.16% on training data and about 9.00% on testing data) in favor of $f_{ACO}$. A t-test statistical analysis shows a significant accuracy difference between $f_{ACO}$ and $f_{AllData}$: the null hypothesis H02, assuming that no difference exists between $f_{AllData}$ and $f_{ACO}$, is rejected with very strong evidence (two-tailed t-test, p-value < 1%).
In order to compare our approach to similar ones, two ensemble classifier methods are implemented, namely bagging (i.e., $f_{Bagg}$) and boosting (i.e., $f_{Boost}$). The performance of the derived model $f_{ACO}$, shown in Table 1, is not only comparable to $f_{Bagg}$ and $f_{Boost}$
but surprisingly higher. A t-test rejects, with a significance level above 95%, both the null hypothesis H03, assuming that the accuracy of $f_{ACO}$ is lower than that of the bagging model $f_{Bagg}$, and the null hypothesis H04, assuming that $f_{ACO}$ is no better than $f_{Boost}$ obtained by the boosting technique. Figure 2 shows a summary of the performance achieved by the prediction model derived by our approach ($J(f_{ACO})$).

[Figure 2: Experimental results: predictive accuracy improvement with ACO. Training and test J-index values for f_ACO, f_Best, f_AllData, f_Bagg, and f_Boost.]
7 CONCLUSION
In this paper, we have applied our previously pro-
posed approach of mixture of experts to the problem
of stock market movement prediction. Ant Colony
Optimization was tailored and improved to combine
stock market experts. A data driven combination was
performed, and a new expert was learned on public mood data collected from social networks. The novelty of the proposed model combination resides in the reuse of the structural elements of individual models rather than their outputs. This way of combining models not only improves the performance of the composite model but also promotes model interpretability. The latter property is very critical in the context of stock market decision making; in particular, it is very useful in discovering which mood state attributes and which mood scores are responsible for a particular movement of the stock value. The high complexity of combining expertise chunks was resolved by the application of a metaheuristic, namely the ACO algorithm. Bayesian Classifiers are used as an interpretable type of prediction model. The results obtained on the data of a particular company in the stock market show the higher performance of the derived model when compared to four alternative approaches, including bagging and boosting. Other types of prediction models are to be considered in our future work. Besides, we will continue collecting more representative data from social media, in particular to mature the stock market prediction problem and discover better predictors.
REFERENCES
Ahmed, F., Bouktif, S., Serhani, A., and Khalil, I. (2008).
Integrating function point project information for im-
proving the accuracy of effort estimation. In Advanced
Engineering Computing and Applications in Sciences,
pages 193–198. IEEE.
Asur, S. and Huberman, B. A. (2010). Predicting the fu-
ture with social media. In International Conference
on Web Intelligence and Intelligent Agent Technology,
pages 492–499.
Bollen, J., Mao, H., and Zeng, X. (2011). Twitter mood
predicts the stock market. Journal of Computational
Science, 2(1):1–8.
Bouktif, S., Ahmed, F., Khalil, I., Antoniol, G., and
Sahraoui, H. (2010). A novel composite model ap-
proach to improve software quality prediction. Infor-
mation and Software Technology, 52(12):1298–1311.
Bouktif, S. and Awad, M. A. (2013). Ant colony based ap-
proach to predict stock market movement from mood
collected on twitter. In Proceedings of IEEE/ACM
International Conference on Advances in Social Net-
works Analysis and Mining, pages 837–845.
Bouktif, S., Kégl, B., and Sahraoui, H. (2002). Combining software quality predictive models: An evolutionary approach. In Proceedings of the International Conference on Software Maintenance, pages 385–392.
Bouktif, S., Sahraoui, H. A., and Antoniol, G. (2006). Sim-
ulated annealing for improving software quality pre-
diction. In Genetic and Evolutionary Computation
Conference proceeding, Seattle, USA, pages 1893–
1900.
Chen, J., Huang, H., Tian, S., and Qu, Y. (2009). Feature selection for text classification with naïve Bayes. Expert Systems with Applications, 36(3):5432–5435.
Deneubourg, J., Aron, S., Goss, S., and Pasteels, J. (1990).
The self-organizing exploratory pattern of the argen-
tine ant. Journal of insect behavior, 3(2):159–168.
Dorigo, M., Birattari, M., and Stutzle, T. (2006). Ant colony
optimization. Computational Intelligence Magazine,
IEEE, 1(4):28–39.
Fenton, N. and Neil, M. (1999). A critique of software de-
fect prediction models. IEEE Transactions on Soft-
ware Engineering, 25(5):675–689.
Freund, Y. and Schapire, R. (1997). A decision-theoretic
generalization of on-line learning and an application
to boosting. Journal of Computer and System Sci-
ences, 55(1):119–139.
Galar, M., Fernández, A., Barrenechea, E., Bustince, H., and Herrera, F. (2012). A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(4):463–484.
Gayo-Avello, D. (2012). "I wanted to predict elections with Twitter and all I got was this lousy paper": A balanced survey on election prediction using Twitter data. arXiv preprint arXiv:1204.6441.
Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L.,
Smolinski, M. S., and Brilliant, L. (2008). Detecting
influenza epidemics using search engine query data.
Nature, 457(7232):1012–1014.
Goebel, M. and Gruenwald, L. (1999). A survey of data
mining and knowledge discovery software tools. ACM
SIGKDD Explorations Newsletter, 1(1):20–33.
Kim, H., Loh, W.-Y., Shih, Y.-S., and Chaudhuri, P. (2007).
Visualizable and interpretable regression models with
good prediction power. IIE Transactions, 39(6):565–
579.
Lavrač, N. (1999). Selected techniques for data mining in medicine. Artificial Intelligence in Medicine, 16(1):3–23.
Liu, Y. (2006). Word of mouth for movies: Its dynamics and
impact on box office revenue. Journal of marketing,
pages 74–89.
Merz, C. (1998). Classification and Regression by Combining Models. PhD thesis, University of California, Irvine.
Michalski, R. S., Carbonell, J. G., and Mitchell, T. M.
(1986). Machine learning: An artificial intelligence
approach, volume 2. Morgan Kaufmann.
Moerland, P. and Mayoraz, E. (1999). DynaBoost: Combin-
ing boosted hypotheses in a dynamic way. Technical
report, IDIAP, Switzerland.
Oza, N. and Tumer, K. (2008). Classifier ensembles: Select
real-world applications. Information Fusion, 9(1):4–
20.
Ramoni, M. and Sebastiani, P. (1999). Robust Bayesian classification. Technical report, Knowledge Media Institute, The Open University.
Trusov, M., Bucklin, R. E., and Pauwels, K. (2009). Effects
of word-of-mouth versus traditional marketing: Find-
ings from an internet social networking site. Journal
of marketing, 73(5):90–102.
Van Gerven, M., Jurgelenaite, R., Taal, B., Heskes, T., and
Lucas, P. (2007). Predicting carcinoid heart disease
with the noisy-threshold classifier. Artificial Intelli-
gence in Medicine, 40(1):45–55.
Youden, W. J. (1961). How to evaluate accuracy. Materials
Research and Standards, ASTM.
Yu, S. and Kak, S. (2012). A survey of prediction using
social media. arXiv preprint arXiv:1203.1647.