2 RELATED WORK
In this section, we briefly review research that
investigates the correlation between Twitter data and
financial markets. Most studies used sentiment analysis
(Ranco et al., 2015; Pagolu et al., 2016; Oliveira et al.,
2017), whereas analysis of users’ mood has received
little attention (Nofer and Hinz, 2015). This might be
because tweet sentiment has been shown to have a
significant relationship with the stock market, while
users’ mood has shown no significant correlation with
it. In addition, the majority of the research was
conducted on English tweets (Ranco et al., 2015; Pagolu
et al., 2016; Oliveira et al., 2017). One study used
German tweets (Nofer and Hinz, 2015); however, the
tweets were translated into English before the analysis.
Ranco et al. investigated the relation between
30 stocks of the DJIA index and Twitter data (Ranco
et al., 2015). They collected over 1.5 million English
tweets using the Twitter Streaming API. Three financial
experts labelled 100,000 tweets with one of three
sentiment labels: negative, neutral or positive. An SVM
model was then used to classify the 1.5 million tweets
so that tweet polarity could be compared with stock
price returns. The results showed a significant
relationship between abnormal stock returns and the
sentiment of tweets.
A more focused analysis of a single company was
carried out by Pagolu et al. (Pagolu et al., 2016). The
aim of the study was to determine whether public
opinion about a company correlates with the stock price
of that company. A total of 2.5 million tweets about
Microsoft were collected using the Twitter API; however,
only 3,216 tweets were annotated by a human. A machine
learning model was built using the Random Forest
algorithm, achieving an accuracy of 70.2%. The authors
compared the sentiment of the tweets with Microsoft
stock price data and found a strong correlation between
them.
Oliveira et al. analyzed tweets to forecast stock
market behavior (Oliveira et al., 2017). They collected
roughly 31 million tweets using the Twitter REST API.
The collected tweets contain hashtags of all stocks
traded in US markets. A lexicon-based model was used to
extract the sentiment of the tweets. Several machine
learning algorithms, such as Neural Network, SVM and
Random Forest, were used to predict the stock market.
The results showed that stock market behavior can be
predicted using sentiment analysis of Twitter data.
Some studies have investigated the effect of users’
mood on the stock market. Nofer and Hinz conducted
an empirical study to explore the correlation between
people’s mood and the stock market (Nofer and Hinz,
2015). They collected around 100 million German tweets
using the Twitter API and retained only positive and
negative tweets, identified using a dictionary of
keywords. They translated the German tweets into
English in order to use the ASTS tool. The DAX intraday
returns of 30 major German companies were used in the
analysis. The results showed no significant
relationship between the stock market and Twitter
users’ mood.
In this study, we aim to analyze Arabic tweets
related to the Saudi stock market. The extracted
sentiments can then be analyzed further to examine the
correlation between tweets and the Saudi stock index.
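As a rough illustration of what examining such a correlation could look like, the sketch below computes a Pearson correlation coefficient between a daily sentiment score and a daily index return. The choice of Pearson correlation, the function name `pearson`, and the toy values are assumptions made for illustration only; the text does not specify a correlation measure, and the numbers are not results from any of the studies reviewed here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data (NOT from the paper): one value per trading day.
daily_sentiment = [0.2, -0.1, 0.4, 0.0, -0.3]   # mean tweet polarity in [-1, 1]
daily_return = [0.5, -0.2, 0.9, 0.1, -0.6]      # index return in percent

r = pearson(daily_sentiment, daily_return)
print(f"Pearson r = {r:.3f}")
```

A coefficient alone does not establish significance; a real analysis would also need a significance test and a decision on how to align tweet timestamps with trading days.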
3 METHODOLOGY
This section describes the framework that we present
to analyze the sentiment of Arabic tweets about the
Saudi stock market. In general, there are three
approaches to this problem: supervised learning,
lexicons, or a hybrid of both. In the current work, we
adopt supervised learning to extract the sentiment of
tweets. Figure 1 illustrates the framework of the
sentiment analysis process, from data collection to
the visualisation of the results.
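To make the supervised-learning approach concrete, the following minimal sketch trains a small multinomial Naive Bayes classifier on labelled texts and predicts the label of an unseen text. Naive Bayes is used here only as a stand-in: this section does not state which classifier the framework uses, and the class name `NaiveBayesSentiment`, the whitespace tokenisation, and the English toy sentences are all hypothetical placeholders (the study itself works with Arabic tweets).

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Tiny multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in text.split():             # naive whitespace tokeniser
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in text.split():
                if tok in self.vocab:            # ignore unseen tokens
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labelled examples (placeholders, not data from the study).
train_texts = [
    "market rises strong gains",
    "shares rise profit up",
    "market falls heavy losses",
    "shares drop profit down",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("strong profit gains"))
print(model.predict("heavy losses"))
```

In a lexicon-based approach, by contrast, no labelled training set is needed; the polarity comes from a fixed word list, and a hybrid would combine both sources of evidence.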
3.1 Data Collection
As one of the most popular social media platforms in
Saudi Arabia, Twitter has been selected for this study
as the data source for the sentiment analysis. In or-
der to collect the tweets related to the Saudi stock
market, the Twitter search API² was queried with
specific keywords such as "A` rJ¥m", "¤d",
"¨FA", etc. However, Twitter’s standard search API
only searches against a sampling of recent tweets
published in the past 7 days. Thus, to collect tweets
over a one-year period, the IDs of users who published
tweets related to the Saudi stock market were first
extracted. We then retrieved the timelines of these
users and filtered them by keyword and date.
Irrelevant tweets (e.g. advertisement tweets) were
eliminated.
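The timeline filtering step can be sketched as follows, assuming the timelines have already been downloaded and are held in memory as dictionaries. The field names `created_at` and `text`, the function `filter_timeline`, and the English placeholder keywords are hypothetical; the actual keywords in the study are Arabic, and no Twitter API call is shown here.

```python
from datetime import date, datetime

def filter_timeline(tweets, keywords, start, end):
    """Keep tweets that mention any keyword and fall within [start, end]."""
    kept = []
    for tweet in tweets:
        created = tweet["created_at"].date()
        if not (start <= created <= end):
            continue                      # outside the collection period
        if any(kw in tweet["text"] for kw in keywords):
            kept.append(tweet)            # matches at least one keyword
    return kept

# Hypothetical, already-downloaded timeline (placeholder content).
timeline = [
    {"created_at": datetime(2019, 3, 5, 10, 0), "text": "stock market opens higher"},
    {"created_at": datetime(2018, 12, 30, 9, 0), "text": "stock market closes lower"},
    {"created_at": datetime(2019, 6, 1, 14, 30), "text": "weekend football results"},
]
keywords = ["stock market", "shares"]     # placeholder keywords

relevant = filter_timeline(timeline, keywords, date(2019, 1, 1), date(2019, 12, 13))
print(len(relevant), "tweet(s) kept")
```

Plain substring matching is the simplest possible choice; for Arabic text, some normalisation before matching may be needed, which this sketch does not attempt.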
A total of 5,209 Arabic-language tweets related to
the Saudi stock market, covering the period from
January 1st, 2019 to December 13th, 2019, were
extracted through the Twitter API after excluding
redundant and irrelevant tweets (e.g. advertisement
tweets). Each tweet record contains: (1) the tweet
identifier, (2) the date/time of creation, and (3) the
text. The tweets used in this study are based on data
obtained from public timelines.
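One way to represent the three-field tweet record and to exclude redundant tweets is sketched below. The class name `TweetRecord`, the function `drop_redundant`, and the rule used to decide redundancy (a repeated identifier or repeated whitespace-normalised text) are assumptions; the text does not state how redundant tweets were detected.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class TweetRecord:
    tweet_id: int          # (1) tweet identifier
    created_at: datetime   # (2) date/time of creation
    text: str              # (3) text

def drop_redundant(records):
    """Keep the first record for each tweet id and each normalised text."""
    seen_ids, seen_texts = set(), set()
    unique = []
    for rec in records:
        norm = " ".join(rec.text.split())   # collapse runs of whitespace
        if rec.tweet_id in seen_ids or norm in seen_texts:
            continue                        # assumed definition of "redundant"
        seen_ids.add(rec.tweet_id)
        seen_texts.add(norm)
        unique.append(rec)
    return unique

# Hypothetical records (placeholder content).
records = [
    TweetRecord(1, datetime(2019, 1, 2, 9, 0), "index closes higher"),
    TweetRecord(1, datetime(2019, 1, 2, 9, 0), "index closes higher"),   # same id
    TweetRecord(2, datetime(2019, 1, 3, 9, 0), "index  closes higher"),  # same text
    TweetRecord(3, datetime(2019, 1, 4, 9, 0), "index closes lower"),
]
unique_records = drop_redundant(records)
print([r.tweet_id for r in unique_records])
```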
² https://dev.twitter.com/
Saudi Stock Market Sentiment Analysis using Twitter Data