used an ANN after feature selection. Algorithm 3
describes the steps for generating the predicted data
using an ANN. Because we predict the day following
the current pattern, the dependent and independent
variables in the training data are taken at different
times. Specifically, given the historical time ht of a
similar pattern, the dependent variable is taken at
ht + 1 and the independent variables at ht. After
binding the dependent and independent variables, we
generated an ANN-based model using the neuralnet
function of the R neuralnet package. The independent
variables at the current time t were then input to the
model to generate the predicted data.
Algorithm 3: Algorithm for generation of predicted
data.
input : tr_dependent represents the total
        completion price at historical time
        ht + 1, tr_independent represents the
        remaining variables excluding the total
        completion price at historical time ht,
        te_dependent represents the remaining
        variables excluding the total
        completion price at current time t
output: predicted is a dataset generated by
        ANN
run('training <- cbind(tr_dependent, tr_independent)');
run('colnames(training) <- c("output", "input")');
run('ANN_result <- neuralnet(output ~ input,
    training, hidden=1∼5, act.fct="tanh")');
predicted <- run('prediction(ANN_result,
    te_dependent)');
return predicted;
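The pairing of times ht and ht + 1 described above can
be sketched as follows. This is a minimal illustration
in Python, not the authors' R code: the function name
build_training_pairs, the dictionary layout, and the
integer time index are all assumptions made for the
example.

```python
from typing import Dict, List, Sequence, Tuple


def build_training_pairs(
    features_by_time: Dict[int, Sequence[float]],
    price_by_time: Dict[int, float],
    similar_times: Sequence[int],
) -> Tuple[List[Sequence[float]], List[float]]:
    """Pair the independent variables at ht with the price at ht + 1.

    All names and the integer time index are hypothetical; the paper
    only states that inputs come from ht and the target from ht + 1.
    """
    inputs: List[Sequence[float]] = []
    targets: List[float] = []
    for ht in similar_times:
        # Keep a pair only when both sides of the offset are observed.
        if ht in features_by_time and (ht + 1) in price_by_time:
            inputs.append(features_by_time[ht])
            targets.append(price_by_time[ht + 1])
    return inputs, targets


# Toy data: three time steps, two independent variables each.
features = {0: [1.0, 2.0], 1: [1.5, 2.5], 2: [2.0, 3.0]}
prices = {0: 100.0, 1: 101.0, 2: 103.0}
X, y = build_training_pairs(features, prices, similar_times=[0, 1, 2])
```

In this toy run the last time step is dropped because
no observation exists at ht + 1, so two training pairs
remain.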
Step 5 (Verification using RMSE): To verify the
validity of the proposed model, we selected the root
mean square error (RMSE) as the measure of prediction
accuracy, which is also available as a function in R.
The RMSE was computed by comparing the real and
predicted data.
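For reference, RMSE is the square root of the mean of
the squared differences between real and predicted
values. The sketch below uses this standard definition
in plain Python; the function name rmse and the sample
prices are illustrative assumptions, and it is not the
R function used in the study.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices: the errors are -1, 0, and 2, so the mean squared
# error is 5/3 and the RMSE is its square root.
example_rmse = rmse([100.0, 102.0, 104.0], [101.0, 102.0, 102.0])
```

A lower value indicates that the predicted series lies
closer to the real series, in the same unit as the
price itself.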
6 EVALUATION
In this section, we describe the one-year test data pro-
vided by Koscom and evaluate the accuracy of each
stock item by computing the RMSE.
6.1 Dataset and Test Scenario
To assess the effectiveness of the proposed model, we
used a real historical stock dataset consisting of
various items for the one-year period from August 2014
to July 2015. To measure the prediction accuracy, we
selected three items (Hyundai Motor Company, KIA
Motors, and Samsung Electronics) as representative
companies of the Republic of Korea. Their stock data
for August 1, 2014, to July 28, 2015, served as the
training data, and their stock data for July 29–31,
2015, as the test data. In the test scenario, we first
generated two sets of predicted stock data for one day,
one with the proposed model and one with feature
selection only. We then checked the prediction accuracy
by using the RMSE values to compare the predicted and
real stock data.
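The date-based split described above could be expressed
as in the following sketch. The date boundaries come
from the text; the record layout (a date paired with a
price), the function name split_by_date, and the sample
values are assumptions made for illustration only.

```python
from datetime import date
from typing import List, Tuple

# Boundaries taken from the text.
TRAIN_START = date(2014, 8, 1)
TRAIN_END = date(2015, 7, 28)
TEST_START = date(2015, 7, 29)
TEST_END = date(2015, 7, 31)

Record = Tuple[date, float]  # hypothetical layout: (trading day, price)


def split_by_date(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into training and test sets by trading day."""
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in records if TEST_START <= r[0] <= TEST_END]
    return train, test


# Made-up records for demonstration; the prices are not real data.
sample = [
    (date(2014, 8, 1), 10.0),
    (date(2015, 7, 28), 11.0),
    (date(2015, 7, 29), 12.0),
    (date(2015, 7, 31), 13.0),
]
train_set, test_set = split_by_date(sample)
```

Splitting by calendar date rather than at random keeps
every test day strictly after the training period, which
matches the forecasting setting.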
6.2 Evaluation of Prediction Accuracy
We performed experiments to measure the accuracy
of the proposed method. Figure 7 compares the actual
data for July 31, 2015, with the two sets of values
predicted by the proposed model and by feature
selection only. The x-axis represents the time at
five-minute intervals and the y-axis represents the
total completion price, i.e., the stock price at that
time. First, Figure 7 (a) compares the results for
Hyundai Motor Company stock; the stock movement of the
proposed model is closer to the real stock data than
that of feature selection only. This is especially
clear in the rising curve of the morning and the
declining curve of the afternoon. Figure 7 (b) shows
the real and predicted stock data for KIA Motors. In
contrast to Figure 7 (a), the stock movement of the
proposed model differs slightly from the real data,
and the rising and declining curves are not clearly
visible in the graph. Lastly, Figure 7 (c) depicts the
real and predicted stock data for Samsung Electronics.
Compared with the feature-selection-only graph, the
stock movement of the proposed model follows the real
data closely despite a slight difference in price.
In this study, we selected RMSE as the measure of
prediction accuracy in order to verify the validity of
our model because this measure is frequently used in
the stock domain. Figure 8 shows the RMSE results for
the proposed model and for feature selection only. In
Figure 8 (a) and (b), the predictions are good except
on July 30; interestingly, the exception occurs on the
same day for both items. For this reason, we can infer
that there are variables affecting the same theme, not
variables that affect individ-