where:
$\mathrm{Close}_n$ represents the closing price at timestep $n$.
$\mathrm{Close}_{n-30}$ represents the closing price at timestep $n-30$, i.e., the closing price from 30 trading days in the past.
Table 1: Stock Lists in Dataset.
Stocks and ETFs: AAPL, MSFT, AMZN, META, TSLA, SPY, GOOGL, GOOG, BRK-B, JNJ, JPM, NVDA, V, DIS, PG, UNH, MA, BAC, NFLX, QQQ
Indices: 000001.ss, 399001.sz, ^HSCE, ^HSCC, ^GSPC, ^DJI, ^IXIC, ^SP500-20
While the closing price is the output feature to be predicted, all features, including the closing price itself, are used as model inputs.
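As a concrete illustration, a minimal sketch of this input/output setup in Python follows; the six column names and the file path are assumptions (the paper states only that six features are used per timestep):

    import pandas as pd

    # Assumed Yahoo Finance-style columns; the paper specifies 6 input features.
    FEATURES = ["Open", "High", "Low", "Close", "Adj Close", "Volume"]
    TARGET = "Close"

    df = pd.read_csv("stock.csv")       # hypothetical file path
    X_raw = df[FEATURES].to_numpy()     # all features, including Close, as inputs
    y_raw = df[TARGET].to_numpy()       # closing price as the prediction target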
4.2 Data Preprocessing
In the data preprocessing phase, several steps were implemented to prepare the data for model training, in line with the objective of predicting the next closing price from the past 30 trading days of data. Initially, the first 1870 data points in the dataset were set as the training set and the remaining 130 data points as the testing set. Notably, the split was chronological, so the testing set contained the most recent 130 data points. Consequently, predicted results on the testing set directly demonstrate the model's ability to predict the most recent stock price trends.
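A minimal sketch of this chronological split is shown below; the placeholder feature matrix is an assumption, while the row counts follow the text:

    import numpy as np

    # Placeholder standing in for the real 2000 x 6 feature matrix.
    data = np.random.rand(2000, 6)

    TRAIN_SIZE = 1870                       # first 1870 points: training set
    train_data = data[:TRAIN_SIZE]
    test_data = data[TRAIN_SIZE:]           # most recent 130 points: testing set
    assert test_data.shape[0] == 130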
To avoid potential bias caused by scale differences among feature values, such as Volume and Close Price, a rescaling process was applied to ensure uniformity within the dataset. This process transformed all feature columns, in both the training and testing sets, into a consistent range of [0, 1]. For each feature, the original value $x_i$ at time $i$ was transformed by

$x_i^{\mathrm{normalized}} = (x_i - x_{\min}) / (x_{\max} - x_{\min})$,

where:
$x_i^{\mathrm{normalized}}$ represents the normalized value of $x_i$.
$x_{\min}$ represents the minimum value of the feature over the training set.
$x_{\max}$ represents the maximum value of the feature over the training set.
It is essential to underscore that the normalization applied to the testing set adheres to the minimum and maximum values derived from the training set. This approach prevents potential data leakage from future values in the testing set. If the testing set were normalized using minimum and maximum values derived from the testing set itself, the normalized data would incorporate information from future data and diminish the validity of the testing-set evaluation.
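A minimal sketch of this leakage-safe rescaling, reusing `train_data` and `test_data` from the split above (the function names are illustrative):

    import numpy as np

    def minmax_fit(train: np.ndarray):
        # Per-feature minimum and maximum, computed on the training set only.
        return train.min(axis=0), train.max(axis=0)

    def minmax_transform(x: np.ndarray, mn: np.ndarray, mx: np.ndarray):
        return (x - mn) / (mx - mn)

    mn, mx = minmax_fit(train_data)
    train_scaled = minmax_transform(train_data, mn, mx)   # guaranteed in [0, 1]
    test_scaled = minmax_transform(test_data, mn, mx)     # may fall outside [0, 1]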
Next, both the training set and the testing set were processed by a time-sequence transform with a timestep of 30. For any closing price $y_i$ at time $i$, the input was constructed from the preceding 30 data points, ranging from $x_{i-30}$ to $x_{i-1}$, where each $x$ contained 6 feature values. In other words, this step reshaped the data so that the model takes the past 30 days of data as input and predicts the next closing price. After all data processing, the datasets had the dimensions listed in Table 2.
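A minimal sketch of this time-sequence transform follows; the position of the Close column is an assumption about the column order:

    import numpy as np

    TIMESTEP = 30
    CLOSE_IDX = 3          # assumed index of the Close column

    def make_sequences(data: np.ndarray, timestep: int = TIMESTEP):
        X, y = [], []
        for i in range(timestep, len(data)):
            X.append(data[i - timestep:i])   # x_{i-30} ... x_{i-1}, 6 features each
            y.append(data[i, CLOSE_IDX])     # next closing price y_i
        return np.array(X), np.array(y)

    X_train, y_train = make_sequences(train_scaled)   # X_train shape: (1840, 30, 6)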
4.3 Model Architecture
Research by Mootha and Shah showed that Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) models achieve a good fit for price value predictions with low RMSE (Mootha et al. 2020; Shah et al. 2021). Thus, LSTM and BiLSTM were also used to validate the efficiency of DI-MSE in this study. As Sunny's research on hyperparameter tuning suggested that fewer layers in LSTM and BiLSTM models are likely to improve model fit, a single layer was applied in constructing the following model architectures (Sunny et al. 2020).
LSTM Architecture: the LSTM model comprises a single LSTM layer with 200 units. To prevent potential overfitting, L2 regularization with a strength of $1\times10^{-6}$ is applied within the LSTM layer. The LSTM layer is followed by a dropout layer with a dropout rate of 0.3, and the structure ends with a single dense layer. The model is trained for 200 epochs with a batch size of 32 using the Adam optimizer.
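A minimal Keras sketch consistent with this description is given below; the framework choice and the exact placement of the L2 penalty (here on the kernel weights) are assumptions:

    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    lstm_model = keras.Sequential([
        keras.Input(shape=(30, 6)),          # 30 timesteps, 6 features
        layers.LSTM(200, kernel_regularizer=regularizers.l2(1e-6)),
        layers.Dropout(0.3),
        layers.Dense(1),                     # single dense output layer
    ])
    lstm_model.compile(optimizer="adam", loss="mse")  # MSE shown; DI-MSE is the
                                                      # paper's custom loss
    # lstm_model.fit(X_train, y_train, batch_size=32, epochs=200)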
BiLSTM Architecture: the BiLSTM model comprises a single BiLSTM layer, built from one forward LSTM layer and one backward LSTM layer. Both LSTM layers consist of 200 units and apply L2 regularization with a strength of $1\times10^{-6}$. A dropout layer with a dropout rate of 0.3 follows the BiLSTM layer, and the structure ends with a single dense layer. The model is likewise trained for 200 epochs with a batch size of 32 using the Adam optimizer.
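Under the same assumptions, a sketch of the BiLSTM variant, where the Bidirectional wrapper builds the forward and backward LSTM layers:

    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    bilstm_model = keras.Sequential([
        keras.Input(shape=(30, 6)),
        layers.Bidirectional(
            layers.LSTM(200, kernel_regularizer=regularizers.l2(1e-6))),
        layers.Dropout(0.3),                 # applied after the BiLSTM layer
        layers.Dense(1),
    ])
    bilstm_model.compile(optimizer="adam", loss="mse")  # or the paper's DI-MSE
    # bilstm_model.fit(X_train, y_train, batch_size=32, epochs=200)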
4.4 Validation Procedure
For each stock in the dataset, the validation procedure
followed these steps:
1) With the training set, use the LSTM architecture to train two models: one using MSE as the loss function and one using DI-MSE as the loss function.