an overview of what the Transformer is and how the Transformer is utilized in this study. Results and findings are presented in Section 5, with conclusions in Section 6.
2 RELATED WORK
Time series modeling is widely utilized across domains such as geography (Hu et al., 2018) and economics (Nyoni, 2018). Applying various deep neural networks (DNNs) for forecasting and pattern recognition on the BTC price has become popular in recent years. Ji et al. (Ji et al., 2019) compared the forecasting capability of DNNs, LSTM (Hochreiter and Schmidhuber, 1997), convolutional neural networks (CNNs), and deep residual networks (ResNets) on the BTC price. Although there is no overall winner in the comparison, LSTM slightly outperforms the others in forecasting future BTC prices, while CNN outperforms the others in indicating the price movement direction. Facebook created a regression model called PROPHET, which is optimized for business forecasting tasks (Ben Letham, 2017). Yenidogan et al. (Yenidogan et al., 2018) demonstrated the success of this model in forecasting future BTC prices by comparing PROPHET with the ARIMA model. The result shows that PROPHET outperforms ARIMA by 26% on R².
Sentiment has been shown to be a factor that impacts the future BTC price. Guerra et al. (Guerra et al., 2020) demonstrated the correlation between the BTC price and web sentiment (Twitter sentiment, Wikipedia search queries, and Google search queries) by utilizing a Support Vector Machine (SVM) model. By combining the Fuzzy Transform for forecasting the BTC price with Google Trends data, their study showed that web search data can help with short-term BTC price prediction. Serafini et al. (Serafini et al., 2021) composed a dataset containing the daily BTC weighted price, BTC volume, Twitter sentiment, and tweet volume, and applied an Auto-Regressive Integrated Moving Average with eXogenous input (ARIMAX) model and an LSTM-based RNN model to the data. They found that the linear ARIMAX model performs better than the LSTM-based RNN model on BTC price prediction, and that tweet sentiment, rather than tweet volume, is the most significant factor in predicting the BTC price. Raju and Tarif's research (Raju and Tarif, 2020) also utilized sentiment analysis. They collected sentiment data from two sources: Twitter and Reddit. By applying both the LSTM and ARIMA models to a dataset composed of BTC price data and sentiment data, the authors found that LSTM did better on the BTC price forecasting regression task. The study also indicated that combining sentiment data from different sources can improve the prediction results. Prajapati's research (Prajapati, 2020) compared the performance of CNN, Gated Recurrent Unit (GRU), and LSTM models on a dataset composed of BTC's open, high, low, close, and volume, Litecoin's close and volume, ETH's close and volume, and sentiment data from Google News and Reddit. The result shows that LSTM gives the lowest root mean squared error (RMSE) in predicting the BTC price.
Instead of looking at the regression problem, Kilimci (Kilimci, 2020) focused on the BTC price movement direction classification problem. The study compares deep learning architectures (CNN, LSTM, and RNN) with word embedding models (Word2Vec, GloVe, and FastText) for predicting the BTC price movement direction using Twitter sentiment data. The results show that the word embedding model FastText (Joulin et al., 2017; Mikolov et al., 2019) achieves the best result with 89.13% accuracy. The performance order was FastText > LSTM > CNN > RNN ≈ GloVe > Word2Vec.
Like FastText, the Transformer (Vaswani et al., 2017) is popular for NLP tasks. The Transformer is based on the multi-head attention mechanism, which allows the model to capture coherent relationships between past tokens and the current token in NLP tasks. A time series can be viewed as a sentence: a time point is a position in the sentence, and the data at that time point can be considered the word at that position. Under this assumption, the Transformer with multi-head attention can be utilized as a time series forecasting tool, as sketched below.
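To illustrate this assumption, the following is a minimal sketch (not the implementation used in this study) of feeding a window of daily price and sentiment features to a Transformer encoder: each time step plays the role of a word position, and its feature vector plays the role of a word embedding. Names and sizes such as n_features, d_model, and the 30-day window are illustrative assumptions.

```python
# Minimal sketch: a Transformer encoder applied to a multivariate time-series window.
import math
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, n_features=5, d_model=64, n_heads=4, n_layers=2, horizon=1):
        super().__init__()
        # Project each time step's features into the model dimension ("word embedding").
        self.input_proj = nn.Linear(n_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Regression head: forecast the next `horizon` value(s), e.g. the next-day price.
        self.head = nn.Linear(d_model, horizon)

    def positional_encoding(self, seq_len, d_model, device):
        # Standard sinusoidal positional encoding (Vaswani et al., 2017).
        pos = torch.arange(seq_len, device=device).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, device=device)
                        * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model, device=device)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, x):
        # x: (batch, window_len, n_features), e.g. daily price and sentiment features.
        h = self.input_proj(x)
        h = h + self.positional_encoding(x.size(1), h.size(-1), x.device)
        h = self.encoder(h)            # multi-head self-attention across time steps
        return self.head(h[:, -1, :])  # forecast from the last time step's representation

# Usage: a batch of 8 windows, each 30 days long with 5 features per day.
model = TimeSeriesTransformer()
pred = model(torch.randn(8, 30, 5))    # -> shape (8, 1)
```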
Li et al. (Li et al., 2019) demonstrated this assumption. The authors implemented a model with a dual attention layer to predict the next time point's public sentiment toward P2P companies: Yucheng Group, Kuailu Group, and Zhongjin Group. The time series the model was applied to contains data points composed of microblog post content (0-140 Chinese words), author, publication time, number of fans, and user category. LSTM, SVM, CNN, and a model composed of two LSTM layers with an SVM were compared with the proposed model. The results show that the proposed model with a dual attention layer was the winner. The study also suggested that the Transformer can capture long-term dependencies not captured by LSTM (Li et al., 2019). They also proposed convolutional self-attention and sparse attention to further improve the Transformer's performance by incorporating local context and reducing memory cost.
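To make the local-context idea concrete, the following is a hedged sketch of convolutional self-attention in the spirit of Li et al. (2019): queries and keys are produced by a causal 1-D convolution over the time axis, so each position attends based on a short window of preceding history rather than a single point. The class name, kernel size, and dimensions are illustrative assumptions, not the authors' code.

```python
# Sketch of convolutional self-attention: causal Conv1d for queries and keys.
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4, kernel_size=3):
        super().__init__()
        # Causal padding: pad only on the left so position t never sees future steps.
        self.pad = kernel_size - 1
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.v_proj = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        xc = nn.functional.pad(x.transpose(1, 2), (self.pad, 0))  # (batch, d_model, seq_len+pad)
        q = self.q_conv(xc).transpose(1, 2)   # local-context queries
        k = self.k_conv(xc).transpose(1, 2)   # local-context keys
        v = self.v_proj(x)
        out, _ = self.attn(q, k, v)
        return out

# Usage: out = ConvSelfAttention()(torch.randn(8, 30, 64))  # -> (8, 30, 64)
```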