Learning to Predict the Stock Market Dow Jones Index Detecting and
Mining Relevant Tweets
Giacomo Domeniconi, Gianluca Moro, Andrea Pagliarani and Roberto Pasolini
DISI, Università degli Studi di Bologna, Via Venezia 52, Cesena, Italy
Keywords:
Stock Market Prediction, Dow Jones Trend, Text Mining, Noise Detection, Twitter.
Abstract:
Stock market analysis is of primary interest for finance and is such a challenging task that it has always attracted
many researchers. Historically, this task was accomplished by means of trend analysis, but in recent years
text mining has emerged as a promising way to predict stock price movements. Indeed, previous works
showed not only a strong correlation between financial news and their impact on the movements of stock
prices, but also that the analysis of social network posts can help to predict them. These latest methods
are mainly based on complex techniques to extract the semantic content and/or the sentiment of the social
network posts. In contrast, in this paper we describe a method to predict the Dow Jones Industrial Average
(DJIA) price movements based on simpler mining techniques and text similarity measures, in order to detect
and characterise relevant tweets that lead to increments and decrements of the DJIA. Considering the high level of
noise in social network data, we also introduce a noise detection method based on a two-step classification.
We tested our method on 10 million Twitter posts spanning one year, achieving an accuracy of 88.9% in the
daily prediction of the Dow Jones, which is, to the best of our knowledge, the best result in the literature among
approaches based on social networks.
1 INTRODUCTION
The spread of social networks and microblogging
enables people to share opinions and moods, creat-
ing very large and constantly updated textual corpora.
Sentiment Analysis techniques seek to extract the emo-
tional states or opinions expressed in each text docu-
ment and to derive a collective social emotional state.
Can the trend of social emotional state predict the
macroscopic evolution of global events such as some
economic indicators? Recent studies have answered
affirmatively to this question. In particular, (Liu et al.,
2007), using a Probabilistic Latent Semantic Analysis (pLSA) model, extract sentiment indicators from
blogs that predict future sales; (Mishne and de Rijke,
2006) show how assessments of blog sentiments can predict movie sales; similarly, (Asur
and Huberman, 2010) show how public sentiments
on movies expressed on Twitter can actually predict
box office receipts. (Gruhl et al., 2005) test the pre-
dictability of book sales using online chat activities.
But all that glitters is not gold: (Gayo-Avello, 2012)
criticises some of the literature on this topic, showing that
some outcomes, such as the prediction of elections, are in fact
not predictable. Of course, analyses of tweets can help
to gauge political popularity, but so far they cannot con-
sistently predict election results.
This work was partially supported by the European
project “TOREADOR” (grant agreement no. H2020-688797).
In this work we experiment with the prediction of the
Dow Jones Industrial Average (DJIA) from Twitter
messages. For obvious reasons, the ability to predict
stock market trends has historically attracted inter-
est from shareholders as well as academia. The Efficient
Market Hypothesis (EMH) proposed in (Fama, 1965)
states that prices of financial assets are driven by
rational investors who rely on new information, i.e.
news, and not on present or past prices; since news
is not predictable, neither is the stock market, which,
according to past studies (Kimoto et al., 1990; Fama,
1991), generally follows a random walk.
However, (Malkiel, 2003) questions the EMH,
discussing whether market prices indeed reflect all the available
information. Moreover, several studies show that the
trend of the stock market does not follow a random
walk model and can be predicted in some way (Lo and
MacKinlay, 1988; Butler and Malaikah, 1992), in-
cluding, for example, with mining techniques applied
to market news (Gidófalvi and Elkan, 2001; Schu-
maker and Chen, 2006) or to past prices (Li et al.,
2011) or even to financial reports (Lin et al., 2008).
Recently several works have studied the correla-
tion between sentiments extracted from Twitter and
socio-cultural phenomena (Bollen et al., 2011a), such
as the popularity of brands (Ghiassi et al., 2013), and
also the correlation between public mood in Twitter
and the DJIA trend (Bollen et al., 2011b).
Differently from previous works that predict the DJIA
by computing people's sentiments or moods from their
Twitter opinions, we introduce a simpler method based
on mining techniques and text similarity measures for
the characterisation and detection of relevant tweets
with respect to increments or decrements of the DJIA. In
particular, as far as the selection of tweets is con-
cerned, our method includes a noise detection ap-
proach for short textual messages in order to filter out
tweets irrelevant to predicting the DJIA. As discussed in
Section 2, there is a large literature regarding the de-
tection of noise in data mining and especially in data
clustering; various methods have also been applied to
text mining, generally for the recognition of noisy fea-
tures (Samant and Rao, 2011) or for novelty detection
(Markou and Singh, 2003), i.e. the discovery of un-
known data that a machine learning system has not
been trained with.
In this work we employed the same set of ten mil-
lion tweets posted in 2008 used by (Bollen et al.,
2011b), but with a much smaller training set, in or-
der to assess our method more reliably on a wider
test set. Intuitively, our method is based on train-
ing an intermediate classifier on five million tweets
posted in the first seven months of 2008. By
analysing the results of this classification, we create
a pruning scheme based on four goodness groups of
tweets, namely true and false positives and true and
false negatives, depending on the outcome of the clas-
sification. We subsequently transform the training set
by removing irrelevant tweets considered as noise. This
technique has been applied at two levels: both to in-
dividual tweets and to aggregations of them, which
correspond to the actual instances of the training set.
The paper is organized as follows. Section 2 anal-
yses the literature about stock market prediction based
on news, social network analyses and noise detection
methods. Section 3 explains the data considered, the
Vector Space Model construction and the noise de-
tection technique. Section 4 describes the experiments
and compares them with other works, showing that our
results improve on the best existing outcomes we found among
social-network-based prediction approaches. Finally,
Section 5 sums up the results and outlines future work.
2 RELATED WORK
Stock market analysis and prediction has always re-
ceived great interest from the academic world: several
possible approaches have been proposed, from time
series prediction to textual news analysis, up to
social network analysis. We start from
classic stock market prediction approaches, then we
summarize the most recent works using social net-
work information to forecast market prices. Fi-
nally, we analyze the best known noise detection
methods proposed in the literature.
Both academia and practitioners have worked on the
prediction of stock prices by analysing the underly-
ing dynamics of financial markets. Initially, sci-
entific research was based on the Efficient Mar-
ket Hypothesis (EMH) (Fama, 1965), according to
which the prices of traded assets reflect all relevant infor-
mation available at any time. In such a financial mar-
ket model, neither technical analysis, which predicts fu-
ture prices from the study of past prices, nor fun-
damental analysis, which studies the evolution of the busi-
ness value, allows an investor to achieve higher prof-
its than those that another investor would get with a
randomly selected portfolio of stocks with the same
degree of risk. However, in the last decades a large
number of works have rejected the unpredictability hypoth-
esis (Malkiel, 2003; Qian and Rasheed, 2007), show-
ing that stock price series follow the random walk the-
ory only over short periods of time and consequently
arguing that in general they could be predicted.
Two major approaches to stock market prediction
exist: using features derived from technical analysis
based on the history of stock index prices and us-
ing related news and textual information to predict
trends. Surveys about the two approaches are given
in (Atsalakis and Valavanis, 2009) and (Mittermayer
and Knolmayer, 2006) respectively. Other studies
employ blog posts to predict stock market behaviour
by determining the correlation between activity in In-
ternet message boards and stock volatility and trad-
ing volumes (Antweiler and Frank, 2004). (Gilbert
and Karahalios, 2010) create an index of the US na-
tional mood, called Anxiety Index, by exploiting over
20 million posts from the LiveJournal website: when
this index increased significantly, the S&P 500 ended
the day marginally lower than expected. A compar-
ative survey of artificial intelligence applications in
finance is reported in (Bahrammirzaee, 2010).
Twitter represents a huge knowledge base provid-
ing information about the most disparate topics. It
can be argued that this knowledge base can provide
an indication of the public mood. Indeed, the emo-
tional state, although a prerogative of the single human be-
ing, propagates through the social network and becomes a
feature of the collective. This phenomenon is studied by (Bollen
et al., 2011a): authors find that events in the social,
political, cultural and economic sphere do have a sig-
nificant, immediate and highly specific effect on the
various dimensions of public mood extracted from
Twitter. They speculate that large scale mood anal-
ysis can provide a solid platform to model collective
emotive trends in terms of their predictive value with
regards to existing social as well as economic indica-
tors. This predictive feature of Twitter mood has been
used for forecasting different phenomena, like the
sales of a movie (Asur and Huberman, 2010), the pub-
lic opinion on a particular brand (Ghiassi et al., 2013)
and so on.
Unlike the proposal of this paper, many of the ap-
proaches in the literature apply sentiment analysis tech-
niques to tweets to create forecast models. (Bollen
et al., 2011b) measure collective mood states (posi-
tive, negative, calm, alert, sure, vital, kind and happy)
through sentiment analysis applied to more than 9
million tweets posted in 2008. Tweets are filtered
by some generic sentiment expressions (e.g. “I’m
feeling”) not directly related to stock market. They
analyse tweets with two mood tracking tools: Opin-
ion Finder (OF, (Wilson et al., 2005)), which classi-
fies tweets as positive or negative, and Google-Profile
of Mood States (GPOMS), which measures mood along
the other six dimensions. They found that the calm
mood profile yields the best prediction result for the Dow
Jones Industrial Average (DJIA), with an accuracy of
86.7% in the prediction of the daily directions in the
month of December; moreover, they also show how
aggregating tweets over a 3-day period improves the
daily DJIA prediction. Similarly, (Chyan and
Lengerich, 2012) use the calm score of tweets extracted
from June and December 2009, achieving an accuracy
of 75% in a 20-day prediction test of the Dow Diamonds
ETF (DIA). They increase the accuracy up to 80% by
adding a quantitative feature related to the previous
value of the DIA.
Another analysis similar to (Bollen et al., 2011b)
is made by (Mittal and Goel, 2012), where the same
dataset of (Chyan and Lengerich, 2012) is used in
a multi-class classification, considering only the calm,
happy, alert and kind mood dimensions. Further-
more, four different learning algorithms (i.e. Linear Re-
gression, Logistic Regression, SVMs and SOFNN)
are used to learn and exploit the actual predictions;
the SOFNN-based model performed best among all the
algorithms, giving nearly 76% accuracy. A com-
parison of six different and popular sentiment analysis
lexical resources (Harvard General Inquirer, Opinion
Lexicon, Macquarie Semantic Orientation Lexicon,
MPQA Subjectivity Lexicon, SentiWordNet, Emoti-
cons) to evaluate the usefulness of each resource in
stock prediction is done by (Oliveira et al., 2013).
(Sprenger et al., 2013) used sentiment analysis on
stock related tweets collected during a 6-month pe-
riod. To reduce noise, they selected tweets contain-
ing cashtags ($) of S&P 100 companies. Each mes-
sage was classified by a Naïve Bayes method trained
with a set of 2500 tweets. Results showed that senti-
ment indicators are associated with abnormal returns
and message volume is correlated to the trading vol-
ume. Similarly, (Rao and Srivastava, 2012) associate
a polarity to each day considering the number of posi-
tive and negative tweets via sentiment140¹, testing the
DJIA and NASDAQ-100 index in a 13-month period
between 2010 and 2011. (Mao et al., 2011) surveyed
a variety of web data sources (Twitter, news headlines
and Google search queries) and tested two sentiment
analysis methods used for the prediction of stock mar-
ket behavior, finding that their Twitter sentiment in-
dicator and the frequency of financial terms occur-
rence on Twitter are statistically significant predictors
of daily market returns.
There are several approaches that do not directly use
sentiment analysis to make predictions. For
example, (Mao et al., 2012) use a linear regres-
sion model to analyse the correlation between the Twitter pre-
dictor and stock indicators at three levels (stock mar-
ket, sector and single company level) and find that
the daily number of tweets that mention S&P 500
stocks is significantly correlated with the S&P 500 daily
closing price. In a 19-day test they obtain an accu-
racy of 68% for stock market and sector level pre-
diction and of 52% for company stocks. (Porshnev
et al., 2013) create different types of features: to a
“basic” data set corresponding to the tweets BoW of
the previous day, they add features regarding the num-
ber of tweets containing the words “worry”, “hope”
or “fear” (Basic&HWF), or the words “happy”, “lov-
ing”, “calm”, “energetic”, “fearful”, “angry”, “tired”
and “sad” (Basic&8emo), training an SVM with these
datasets relating to 7 months of 2013. They get a max-
imum baseline accuracy of 65.17% for the DJIA, 57%
for the S&P 500 and 50.67% for NASDAQ. In a dif-
ferent way, (Ruiz et al., 2012) extract two types of
features, one concerning the overall activity on Twitter
and one measuring the properties of an induced inter-
action graph. They found a correlation between these
features and changes in S&P 500 price and volume
traded. (Zhang et al., 2011) found a high negative
correlation (0.726, significant at level p < 0.01) be-
tween the Dow Jones index and the presence of the
words “hope”, “fear”, and “worry” in tweets.
¹ http://help.sentiment140.com/
A quantitative analysis is made by (Mao et al.,
2013): using Twitter volume spikes in a 15-month
period (from February 2012 to May 2013), they train
a Bayesian classifier to assist S&P 500 stock trading
and they show that it can provide substantial profit.
(Arias et al., 2014) show through extensive testing
that adding Twitter-related data (either in terms of vol-
ume or public sentiment) to non-linear time series models
(SVMs or neural networks) improves the predic-
tion of stocks or indexes.
Noise detection has been a topic of interest since the
dawn of information retrieval. In the Vector Space
Model representation, noise removal can be ad-
dressed at two levels. At the feature level, useless and
non-informative words are removed: normally this
problem is addressed with lists of stopwords and
feature selection schemes (Yang, 1995; Gabrilovich
and Markovitch, 2004). At the instance level, non-informative
documents, which could be a source of confusion for the
classification model, are instead removed. Here
the various noise detection techniques proposed in IR
can ideally be used, without considering the textual
nature of the single features. A large number of methods
have been proposed in the literature, for example us-
ing k-nearest neighbours, neural networks,
decision trees, SVMs or Bayesian networks. In-depth
descriptions of all of these techniques are re-
ported in surveys such as (Chandola et al., 2009; Markou
and Singh, 2003).
3 METHODOLOGY
3.1 Benchmark Text Set
To obtain a comparative evaluation than the well-
known work of (Bollen et al., 2011b), we use the
same collection of tweets: that is about 10 million
tweets posted from January 1th to December 19th of
2008, by approximately 2.7M users. Following the
pre-processing applied by Bollen et al., only tweets
in english language that contain explicit statements
of the author’s mood state are taken into considera-
tion, i.e. those that contains one of this expressions
“i fell”, “i am feeling”, “i’m feeling”, “i dont feel”,
“I’m”, “Im”, “I am”, and “makes me”. Tweets that
contain links or that address the tweet content to an-
other user are removed. All tweets are tokenized in
single words and, as done by (Oliveira et al., 2013),
also the emoticons are considered into our model us-
ing three different tokens.
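A minimal sketch of this filtering and tokenization step is given below, assuming tweets are available as plain strings; the mood expressions mirror the list reported above, while the emoticon sets and the helper names (keep_tweet, tokenize) are illustrative assumptions rather than the exact implementation.

```python
import re

# Mood expressions reported above (after Bollen et al.); matched case-insensitively as phrases.
MOOD_EXPRESSIONS = ["i feel", "i am feeling", "i'm feeling", "i dont feel",
                    "i'm", "im", "i am", "makes me"]
MOOD_PATTERN = re.compile(r"\b(" + "|".join(re.escape(e) for e in MOOD_EXPRESSIONS) + r")\b")

# Three coarse emoticon tokens, as in (Oliveira et al., 2013); the exact sets are an assumption.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

def keep_tweet(text: str) -> bool:
    """Keep only explicit mood statements with no links and not addressed to another user."""
    lowered = text.lower()
    if "http" in lowered or "@" in text:
        return False
    return MOOD_PATTERN.search(lowered) is not None

def tokenize(text: str) -> list:
    """Split a tweet into lowercase word tokens, mapping emoticons to three special tokens."""
    tokens = []
    for tok in text.split():
        if tok in POSITIVE_EMOTICONS:
            tokens.append("_EMO_POS_")
        elif tok in NEGATIVE_EMOTICONS:
            tokens.append("_EMO_NEG_")
        elif re.fullmatch(r"[:;=8][-'^o]?[()\[\]dpDP/\\|]+", tok):
            tokens.append("_EMO_OTHER_")
        else:
            word = re.sub(r"[^a-z0-9']", "", tok.lower())
            if word:
                tokens.append(word)
    return tokens
```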
Figure 1: Daily closing values of the DJIA prices and split of the benchmark set.

Figure 1 shows the daily closing values of DJIA.
To properly evaluate the models' ability in the predic-
tion of DJIA prices, we split the benchmark set into i)
a training set with the first seven months of the year
(from January 2 to July 31), used to create the prediction
models; ii) a test set with two months, August and
September, with which we tune the models and apply
the noise detection; iii) finally, a validation set with
the last three months, from October 1st to Decem-
ber 19th, which is larger than that of Bollen et al., who
refer only to the 19 days of December and consequently
to only 15 stock market trading days.
3.2 Vector Space Model Construction
Tweets are grouped according to the publication date
and will provide the information base to generate fu-
ture predictions on the stock market. As shown by the
experiments of Bollen et al., the highest correlation be-
tween social mood and the DJIA is obtained by group-
ing tweets over several days and shifting the prediction
by a certain time lag. Thus it becomes interesting to
evaluate the accuracy of the predictions considering
these two parameters in the forecasting model:
Lag (l): temporal translation from the forecast
date, l = 0 means the day before the prediction.
Aggregation (a): number of days to be aggregated
to make a prediction, a = 0 means only one day.
As a simple example, assume that we consider l = 1
and a = 2: to make the prediction on day t, the tweets
published on days t−2, t−3 and t−4 will be con-
sidered. The range of days considered for the prediction
of day t will be: [t−1−l−a, t−1−l].
According to the two previous parameters, all the
tweets related to the prediction of a day (in the previ-
ous example, all tweets of t−2, t−3 and t−4) are
collected into a single Bag-of-Words. Given the high
number of tweets available, a dimensionality reduc-
tion is required. Once the tweets have been selected, stop-words
are removed and a stemming process is performed;
each term is then weighted using the common tf.idf
(Domeniconi et al., 2016). Finally, the n_f terms with
the greatest weight are selected.
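The sketch below illustrates, under simplifying assumptions, how such aggregated Bag-of-Words documents could be built: tweets are assumed to be already filtered and grouped by posting date, scikit-learn's TfidfVectorizer stands in for the tf.idf weighting (not the supervised variant of Domeniconi et al.), its built-in stop-word list and frequency-based max_features approximate the stop-word removal and the weight-based selection of the n_f terms, and stemming is omitted.

```python
from datetime import timedelta
from sklearn.feature_extraction.text import TfidfVectorizer

def aggregate_tweets(tweets_by_day, prediction_day, lag, aggregation):
    """Concatenate the tweets of days [t-1-l-a, t-1-l] into one document for day t."""
    first = prediction_day - timedelta(days=1 + lag + aggregation)
    last = prediction_day - timedelta(days=1 + lag)
    texts, day = [], first
    while day <= last:
        texts.extend(tweets_by_day.get(day, []))   # tweets_by_day: {date: [tweet strings]}
        day += timedelta(days=1)
    return " ".join(texts)

def build_vsm(tweets_by_day, trading_days, lag=0, aggregation=3, n_features=500):
    """One Bag-of-Words document per trading day, tf.idf weighted, n_f terms kept."""
    docs = [aggregate_tweets(tweets_by_day, d, lag, aggregation) for d in trading_days]
    vectorizer = TfidfVectorizer(stop_words="english", max_features=n_features)
    X = vectorizer.fit_transform(docs)             # rows: trading days, columns: selected terms
    return X, vectorizer
```

For instance, with l = 0 and a = 3 the helper gathers the tweets of the four days preceding each trading day, as in the example of Figure 2.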
The proposed DJIA prediction process is summa-
rized in Figure 2.
Figure 2: Diagram of the DJIA prediction process through
tweets aggregation. In this example the system predicts
the DJIA trend for 28/03/2008 using the aggregated tweets
posted in the previous four days (l = 0 and a = 3).
3.3 Noise Detection
Twitter provides a great deal of information, but is
necessary to understand what is useful for a given
analysis and what is not. Considering this, we pro-
pose a noise detection method to define what tweets
to use in the DJIA prediction model. Our idea can be
summarised in few steps:
1. Once the representation of the data has been created, as de-
scribed in the previous section, we train a classifi-
cation model and apply it to the test set.
2. We create four prototypes, one for each possi-
ble outcome of the classification of the predicted days,
i.e. true positive (TP), true negative (TN), false
positive (FP) and false negative (FN). Each proto-
type is a BoW merging all the instances of the test
set with that outcome, i.e. all the tweets of the a days
before each prediction.
3. We use the prototypes to discover the noisy tweets in
the dataset. We propose to apply this method at
two different levels: i) the tweet level: removing
from the dataset all the tweets whose cosine similar-
ity with respect to the good prototypes (TP and TN)
is less than a threshold τ_g, or whose cosine similarity
with respect to the bad prototypes (FP and FN) is
greater than a threshold τ_b; ii) the instance level: removing
from the training set the instances similar to the bad
prototypes (see the sketch after this list).
4. With the cleaned data set we train a new predic-
tion model using the training and test set and we
use it to classify the validation set.
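A possible sketch of steps 2 and 3 is given below; it assumes the test-set instances are available as tf.idf row vectors together with their outcome labels, uses scikit-learn's cosine similarity, and the threshold values as well as the helper names are illustrative, not the actual implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def build_prototypes(X_test, outcomes):
    """One prototype per outcome (TP, TN, FP, FN): the merged BoW of its test instances."""
    prototypes = {}
    for group in ("TP", "TN", "FP", "FN"):
        rows = [i for i, o in enumerate(outcomes) if o == group]
        prototypes[group] = np.asarray(X_test[rows].sum(axis=0)).ravel()
    return prototypes

def is_noisy_tweet(tweet_vec, prototypes, tau_g=0.2, tau_b=0.8):
    """Tweet level: drop tweets far from the good prototypes or close to the bad ones.

    tweet_vec is a dense 1-d tf.idf vector of a single tweet; tau_g and tau_b are illustrative.
    """
    def sim(p):
        return cosine_similarity(np.atleast_2d(tweet_vec), np.atleast_2d(p))[0, 0]
    good = max(sim(prototypes["TP"]), sim(prototypes["TN"]))
    bad = max(sim(prototypes["FP"]), sim(prototypes["FN"]))
    return good < tau_g or bad > tau_b

def clean_training_set(X_train, y_train, prototypes, tau=0.8):
    """Instance level: drop training instances too similar to the bad prototypes."""
    bad = np.vstack([prototypes["FP"], prototypes["FN"]])
    sims = cosine_similarity(X_train, bad).max(axis=1)
    keep = sims <= tau
    return X_train[keep], np.asarray(y_train)[keep]
```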
4 RESULTS
We tested the effectiveness of the prediction varying:
i) the classification algorithm, testing two different
supervised models using the Weka² implementation,
a decision tree (J48, i.e. C4.5) and an SVM (the SMO al-
gorithm); ii) the number n_f of features (i.e. words)
selected in the dataset; iii) the aggregation a; and
iv) the lag l parameters, on the data described above.
² www.cs.waikato.ac.nz/ml/weka/
Table 1: Results obtained in tuning the Decision Tree algorithm.

Aggr  Lag  n_f feat.  fMeasure
3     0    500        0.799
3     1    2000       0.736
3     0    1000       0.700
0     2    500        0.668
0     2    2000       0.660
2     2    500        0.657
3     2    2000       0.653
Table 2: Results obtained in tuning the SVM algorithm.

Aggr  Lag  n_f feat.  fMeasure
2     1    1000       0.682
1     2    2000       0.668
3     2    1000       0.649
2     1    2000       0.649
1     3    2000       0.643
0     2    2000       0.642
2     1    500        0.642
Before applying the noise detection method, we
tested a simple prediction model based on the VSM
built as described in Section 3.2, varying the parameters
in order to find their best tuning. Tables 1 and 2 show
the best results obtained by the two supervised algorithms
with the related parameter combinations. A first note-
worthy aspect is the aggregation parameter, which gives
the best results when three days are gathered; this confirms
the analysis done by Bollen et al. in their work, in
which the authors reach the same conclusion. This means
that there is a strong correlation between the information ex-
tracted in the couple of days before and the outcome of
a market trading day. In other words, the stock market
seems to be affected by the information, and thus the events,
moods and so on, of the previous days. Moreover,
the Decision tree model clearly achieves the best accuracy:
with few features (just 500), it reaches an f-measure of
almost 80%. From now on, every test is performed
using the best combination of parameters shown in
Tables 1 and 2.
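This parameter sweep can be reproduced with a simple grid search along the lines of the sketch below, where scikit-learn's DecisionTreeClassifier and linear SVC stand in for Weka's J48 and SMO, the parameter grid is illustrative, and build_dataset is a hypothetical helper returning the matrices of Section 3.2 for a given (a, l, n_f) combination.

```python
from itertools import product
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import f1_score

def tune(build_dataset, aggregations=(0, 1, 2, 3), lags=(0, 1, 2, 3),
         feature_counts=(500, 1000, 2000)):
    """Grid search over aggregation a, lag l and number of features n_f.

    build_dataset(a, l, n_f) is assumed to return (X_train, y_train, X_test, y_test),
    where y is the daily DJIA direction encoded as 1 (up) / 0 (down).
    """
    models = {"DecisionTree": DecisionTreeClassifier(),   # stand-in for Weka J48 (C4.5)
              "SVM": SVC(kernel="linear")}                # stand-in for Weka SMO
    results = []
    for a, l, n_f in product(aggregations, lags, feature_counts):
        X_tr, y_tr, X_te, y_te = build_dataset(a, l, n_f)
        for name, clf in models.items():
            clf.fit(X_tr, y_tr)
            score = f1_score(y_te, clf.predict(X_te))
            results.append({"model": name, "Aggr": a, "Lag": l,
                            "n_f": n_f, "fMeasure": score})
    return sorted(results, key=lambda r: r["fMeasure"], reverse=True)
```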
Once the best model had been defined, we applied the noise
detection method in order to clean the dataset. The
idea is to analyze the predictions made on the test set
in order to define four groups of predictions and use
them to find only the useful tweets, or aggregations
of tweets, in the dataset.
Figure 3: Tweet level noise detection experiments. (a) Keeping the tweets similar to the good prototypes. (b) Discarding the tweets similar to the bad prototypes.

Table 3: Comparison of cosine similarities between instances (aggregated tweets) belonging to the four different groups. Each cell of the table is calculated as the average value over all the related pairs of instances.

      TP     TN     FP     FN
TP    0.819  0.828  0.779  0.772
TN    0.823  0.914  0.776  0.738
FP    0.779  0.776  0.848  0.770
FN    0.772  0.738  0.770  0.912

First, we divided the test set instances based on
the outcome of the predictions. Among all the tested
instances, we selected only the predictions for which the
probability given by the classifier is greater than 90%, in
order to pull out only the most confident among them. These
selected instances are then grouped based on their out-
come (i.e. TP, TN, FP, FN). In order to assess our
assumptions and the quality of the resulting groupings,
we calculated the cosine similarity both between in-
stances of the same group and between instances of different
groups; we expected the instances belonging to
the same group to have a high similarity, while
instances of different groups should appear more dissimilar.
These comparisons are shown in
Table 3; the main diagonal contains the compar-
isons between instances belonging to the same group.
Noteworthy is that these similarities are significantly
greater than the other comparisons, which supports
the hypothesis underlying our noise detection method.
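The averages of Table 3 can be computed as in the sketch below, assuming the high-confidence test instances are given as tf.idf row vectors with their outcome labels; excluding the trivial self-similarities on the diagonal is our assumption.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def group_similarity(X, outcomes, group_a, group_b):
    """Average cosine similarity between the instances of two outcome groups (cf. Table 3)."""
    idx_a = [i for i, o in enumerate(outcomes) if o == group_a]
    idx_b = [i for i, o in enumerate(outcomes) if o == group_b]
    sims = cosine_similarity(X[idx_a], X[idx_b])
    if group_a == group_b:                      # skip the self-similarities (always 1.0)
        mask = ~np.eye(len(idx_a), dtype=bool)
        return float(sims[mask].mean())
    return float(sims.mean())

# e.g. group_similarity(X_test, outcomes, "TN", "TN") corresponds to one cell of Table 3
```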
The first noise detection experiment was made
by comparing all the single tweets in the dataset
(both training and validation sets) with the four pro-
totypes created by aggregating the instances of the four
groups of predictions analyzed above. We conducted
a double experiment: i) keeping only the tweets simi-
lar to the two good prototypes, i.e. tweets whose co-
sine similarity with respect to TP or TN exceeds a
threshold τ_g; ii) discarding all the tweets similar to
the bad prototypes, i.e. tweets whose cosine similar-
ity with respect to FP or FN exceeds a threshold
τ_b. Figure 3 shows the results obtained in both ex-
periments, varying the thresholds. Unfortunately, the
results do not show any improvement when using this
noise detection technique at the tweet level.
A further proposal to detect and remove noise is
based on the idea that some training instances, being
outliers or simply containing noisy tweets, could
compromise the accuracy of the prediction model. In this ex-
periment, we remove from the training set of the final
classification model all the instances that are similar
to the bad prototypes and thus could negatively af-
fect the model. Figure 4 shows the results obtained
with the best tuning using both a decision tree and an
SVM algorithm, varying the threshold τ in the noise
detection algorithm. Results show a noteworthy im-
provement when using the noise detection method. In par-
ticular, using the Decision tree algorithm, we achieve
an f-measure of 0.889, which is an improvement of 10%
with respect to the results obtained in the tests without the
training set cleaning technique. Similar considera-
tions hold when using an SVM classifier; in this
case the improvement is even greater, since we started
from an f-measure of 0.682 and, with an improvement
of 27%, we obtain a maximum f-measure of 0.867
when using the noise detection algorithm. Analyz-
ing the results obtained by the best model, we found
an f-measure of 0.848 for the prediction of positive
market days and of 0.912 for negative days.
The precision of the predictions on the validation set
is 88.9%, which is higher than the precision obtained by
Bollen et al. in their work, i.e. 86.7%.
A direct comparison with the work of Bollen et
al. can be made by considering the same test set as their
work, i.e. the 19 trading days in De-
cember 2008. Using this test set and training our
method with the first 11 months of the year, we ob-
tain a perfect classification (100%) of the 19 trading
days, showing a sharp improvement with respect to
the 86.7% obtained by Bollen et al.
Figure 4: Instance level noise detection experiments. (a) Decision tree supervised model. (b) SVM supervised model.

5 CONCLUSION

In this paper, we have investigated whether the DJIA
trend in a trading day is affected by the contents of
tweets posted in the previous days. This correlation
was already shown in some works in the literature that use
complex techniques to try to understand the semantic
content of the textual documents in order to predict
the stock market trends.
The aim of our work was to use a simple method,
based on the well-known Vector Space Model repre-
sentation and a supervised classifier. We have also
introduced a noise detection technique, both at the tweet
and at the instance (i.e. aggregation of tweets) level, used
to filter out the large amount of irrelevant tweets from the
retrieved data. We have tested and compared the
method on the same tweet dataset and DJIA trends for
the whole of 2008 used by (Bollen et al., 2011b). Results
show that even a simple classification model based
on the VSM achieves a good accuracy, very close to
80%. This work has also demonstrated that our noise
detection technique is able to distinguish the irrele-
vant tweets and instances, thus the noise, in the training
data, raising the accuracy to 88.9% and outperforming
both our base classifier and the best prediction method
based on social network posts illustrated in (Bollen
et al., 2011b).
As future work, we plan to further investigate pos-
sible correlations among different market indexes and
stock options, expanding the analysis to other sources
of unstructured text streams.
REFERENCES
Antweiler, W. and Frank, M. Z. (2004). Is all that talk
just noise? the information content of internet stock
message boards. The Journal of Finance, 59(3):1259–
1294.
Arias, M., Arratia, A., and Xuriguera, R. (2014). Forecast-
ing with twitter data. ACM Trans. Intell. Syst. Tech-
nol., 5(1):8:1–8:24.
Asur, S. and Huberman, B. A. (2010). Predicting the future
with social media. In Web Intelligence and Intelligent
Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM
International Conference on, volume 1, pages 492–
499. IEEE.
Atsalakis, G. S. and Valavanis, K. P. (2009). Surveying
stock market forecasting techniques–part ii: Soft com-
puting methods. Expert Systems with Applications,
36(3):5932–5941.
Bahrammirzaee, A. (2010). A comparative survey of ar-
tificial intelligence applications in finance: artificial
neural networks, expert system and hybrid intelli-
gent systems. Neural Computing and Applications,
19(8):1165–1195.
Bollen, J., Mao, H., and Pepe, A. (2011a). Modeling pub-
lic mood and emotion: Twitter sentiment and socio-
economic phenomena. In ICWSM.
Bollen, J., Mao, H., and Zeng, X. (2011b). Twitter mood
predicts the stock market. Journal of Computational
Science, 2(1):1 – 8.
Butler, K. C. and Malaikah, S. (1992). Efficiency and
inefficiency in thinly traded stock markets: Kuwait
and saudi arabia. Journal of Banking & Finance,
16(1):197–210.
Chandola, V., Banerjee, A., and Kumar, V. (2009).
Anomaly detection: A survey. ACM Comput. Surv.,
41(3):15:1–15:58.
Chyan, T. P. H. H. A. and Lengerich, C. (2012). A stock-
purchasing agent from sentiment analysis of twitter.
Domeniconi, G., Moro, G., Pasolini, R., and Sartori, C.
(2016). A comparison of term weighting schemes for
text classification and sentiment analysis with a super-
vised variant of tf.idf. In Data Management Technolo-
gies and Applications: 4th International Conference,
DATA 2015, Colmar, France, July 20-22, 2015, Re-
vised Selected Papers, volume 584, page 39. Springer.
Fama, E. F. (1965). The behavior of stock-market prices.
The journal of Business, 38(1):34–105.
Fama, E. F. (1991). Efficient capital markets: Ii. The journal
of finance, 46(5):1575–1617.
Gabrilovich, E. and Markovitch, S. (2004). Text categoriza-
tion with many redundant features: Using aggressive
feature selection to make SVMs competitive with C4.5.
In Proceedings of the twenty-first international con-
ference on Machine learning, page 41. ACM.
Gayo-Avello, D. (2012). “I wanted to predict elections with
twitter and all I got was this lousy paper” - a balanced
survey on election prediction using twitter data. arXiv
preprint arXiv:1204.6441.
Ghiassi, M., Skinner, J., and Zimbra, D. (2013). Twitter
brand sentiment analysis: A hybrid system using n-
gram analysis and dynamic artificial neural network.
Expert Systems with Applications.
Gidófalvi, G. and Elkan, C. (2001). Using news articles to
predict stock price movements. Department of Com-
puter Science and Engineering, University of Califor-
nia, San Diego.
Gilbert, E. and Karahalios, K. (2010). Widespread worry
and the stock market. In ICWSM, pages 59–65.
Gruhl, D., Guha, R., Kumar, R., Novak, J., and Tomkins, A.
(2005). The predictive power of online chatter. In Pro-
ceedings of the eleventh ACM SIGKDD international
conference on Knowledge discovery in data mining,
pages 78–87. ACM.
Kimoto, T., Asakawa, K., Yoda, M., and Takeoka, M.
(1990). Stock market prediction system with modu-
lar neural networks. In Neural Networks, 1990., 1990
IJCNN International Joint Conference on, pages 1–6.
IEEE.
Li, X., Wang, C., Dong, J., Wang, F., Deng, X., and
Zhu, S. (2011). Improving stock market prediction
by integrating both market news and stock prices. In
Hameurlain, A., Liddle, S., Schewe, K.-D., and Zhou,
X., editors, Database and Expert Systems Applica-
tions, volume 6861 of Lecture Notes in Computer Sci-
ence, pages 279–293. Springer Berlin Heidelberg.
Lin, M.-C., Lee, A. J. T., Kao, R.-T., and Chen, K.-T.
(2008). Stock price movement prediction using repre-
sentative prototypes of financial reports. ACM Trans.
Manage. Inf. Syst., 2(3):19:1–19:18.
Liu, Y., Huang, X., An, A., and Yu, X. (2007). Arsa:
a sentiment-aware model for predicting sales perfor-
mance using blogs. In Proceedings of the 30th annual
international ACM SIGIR conference on Research and
development in information retrieval, pages 607–614.
ACM.
Lo, A. W. and MacKinlay, A. C. (1988). Stock market
prices do not follow random walks: Evidence from a
simple specification test. Review of financial studies,
1(1):41–66.
Malkiel, B. G. (2003). The efficient market hypothesis
and its critics. The Journal of Economic Perspectives,
17(1):59–82.
Mao, H., Counts, S., and Bollen, J. (2011). Predicting fi-
nancial markets: Comparing survey, news, twitter and
search engine data. arXiv preprint arXiv:1112.1051.
Mao, Y., Wei, W., and Wang, B. (2013). Twitter volume
spikes: analysis and application in stock trading. In
Proceedings of the 7th Workshop on Social Network
Mining and Analysis, page 4. ACM.
Mao, Y., Wei, W., Wang, B., and Liu, B. (2012). Correlating
s&p 500 stocks with twitter data. In Proceedings of
the First ACM International Workshop on Hot Topics
on Interdisciplinary Social Networks Research, Hot-
Social ’12, pages 69–72, New York, NY, USA. ACM.
Markou, M. and Singh, S. (2003). Novelty detection: a re-
view - part 1: statistical approaches. Signal processing,
83(12):2481–2497.
Mishne, G. and de Rijke, M. (2006). Capturing global mood
levels using blog posts. In AAAI Spring Symposium:
Computational Approaches to Analyzing Weblogs’06,
pages 145–152.
Mittal, A. and Goel, A. (2012). Stock prediction using twit-
ter sentiment analysis.
Mittermayer, M.-A. and Knolmayer, G. (2006). Text mining
systems for market response to news: A survey. Insti-
tut für Wirtschaftsinformatik der Universität Bern.
Oliveira, N., Cortez, P., and Areal, N. (2013). Some ex-
periments on modeling stock market behavior using
investor sentiment analysis and posting volume from
twitter. In Proceedings of the 3rd International Con-
ference on Web Intelligence, Mining and Semantics,
WIMS ’13, pages 31:1–31:8, New York, NY, USA.
ACM.
Porshnev, A., Redkin, I., and Shevchenko, A. (2013). Im-
proving prediction of stock market indices by analyz-
ing the psychological states of twitter users. HSE
Working papers WP BRP 22/FE/2013, National Re-
search University Higher School of Economics.
Qian, B. and Rasheed, K. (2007). Stock market predic-
tion with multiple classifiers. Applied Intelligence,
26(1):25–33.
Rao, T. and Srivastava, S. (2012). Twitter sentiment anal-
ysis: How to hedge your bets in the stock markets.
CoRR, abs/1212.1107.
Ruiz, E. J., Hristidis, V., Castillo, C., Gionis, A., and
Jaimes, A. (2012). Correlating financial time series
with micro-blogging activity. In Proceedings of the
Fifth ACM International Conference on Web Search
and Data Mining, WSDM ’12, pages 513–522, New
York, NY, USA. ACM.
Samant, R. M. and Rao, S. (2011). The effect of noise in
automatic text classification. In Proceedings of the In-
ternational Conference & Workshop on Emerging
Trends in Technology, ICWET ’11, pages 557–558,
New York, NY, USA. ACM.
Schumaker, R. P. and Chen, H. (2006). Textual analysis
of stock market prediction using financial news. In
Americas Conference on Information Systems.
Sprenger, T. O., Tumasjan, A., Sandner, P. G., and Welpe,
I. M. (2013). Tweets and trades: The information con-
tent of stock microblogs. European Financial Man-
agement.
Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J.,
Wiebe, J., Choi, Y., Cardie, C., Riloff, E., and Pat-
wardhan, S. (2005). Opinionfinder: A system for sub-
jectivity analysis. In Proceedings of HLT/EMNLP on
Interactive Demonstrations, pages 34–35. Association
for Computational Linguistics.
Yang, Y. (1995). Noise reduction in a statistical approach to
text categorization. In Proceedings of the 18th annual
international ACM SIGIR conference on Research and
development in information retrieval, pages 256–263.
ACM.
Zhang, X., Fuehres, H., and Gloor, P. A. (2011). Predicting
stock market indicators through twitter “I hope it is
not as bad as I fear”. Procedia-Social and Behavioral
Sciences, 26:55–62.