4.3 Experiment 2
To test whether combining LSTM and Transformer
layers yields a superior model, several LSTM +
Transformer variants were developed and trained on
the five financial instruments. The first was the base
LSTM with one layer of 128 units plus the basic
four-block Transformer model (here named
LSTM128+TX). Subsequently, LSTMs with one layer
of 64 units, one layer of 32 units, and three layers of
128, 64 and 32 units were each combined with the
Transformer model, their names following the same
pattern as LSTM128+TX (LSTM64+TX, LSTM32+TX
and LSTMAll+TX). Similarly, the Bidirectional LSTM
was combined with the Transformer model under the
same naming pattern: BiLSTM128+TX, BiLSTM64+TX,
BiLSTM32+TX and BiLSTMAll+TX. These combinations
and their numbers of trainable parameters are
summarized below in Table 1.
Table 1: Combined Models and Number of Parameters.
Model            No. Parameters
LSTMAll+TX       833394
LSTM128+TX       1523346
LSTM64+TX        651474
LSTM32+TX        301554
BiLSTMAll+TX     2083794
BiLSTM128+TX     3430930
BiLSTM64+TX      1392274
BiLSTM32+TX      618706
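For orientation, the contribution of the recurrent layers to counts like those in Table 1 can be estimated with the standard per-layer LSTM formula, 4 x (units x (units + input_dim) + units). The sketch below is an illustration only: the input dimension of 1 is an assumption, the helper names are hypothetical, and the Transformer blocks and any dense layers are not counted, so the printed figures are not expected to match Table 1.

```python
def lstm_layer_params(units: int, input_dim: int) -> int:
    """Trainable parameters of one standard LSTM layer (4 gates, with biases)."""
    return 4 * (units * (units + input_dim) + units)


def stacked_lstm_params(layer_units, input_dim, bidirectional=False):
    """Sum parameters over stacked LSTM layers, each feeding the next."""
    total = 0
    dim = input_dim
    for units in layer_units:
        per_direction = lstm_layer_params(units, dim)
        total += 2 * per_direction if bidirectional else per_direction
        # A bidirectional layer concatenates the outputs of both directions.
        dim = 2 * units if bidirectional else units
    return total


INPUT_DIM = 1  # assumption: one feature per time step

# Layer configurations follow the naming used in the text.
configs = [("128", [128]), ("64", [64]), ("32", [32]), ("All", [128, 64, 32])]
for name, units in configs:
    uni = stacked_lstm_params(units, INPUT_DIM)
    bi = stacked_lstm_params(units, INPUT_DIM, bidirectional=True)
    print(f"LSTM{name}: {uni}  BiLSTM{name}: {bi}")
```

Under these assumptions a single bidirectional layer carries exactly twice the recurrent parameters of its unidirectional counterpart, which is consistent with the BiLSTM variants being the largest models in the table.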
These combinations of LSTM/BiLSTM and
Transformer were then ranked by MAPE, run-time
and number of parameters. The results of these tests
(Experiment 2) are presented in the Results and
Discussions section.
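As a reminder of the ranking metric, a minimal sketch of MAPE (mean absolute percentage error) is given here. It assumes the metric is expressed as a fraction rather than a percentage, which is consistent with the magnitudes reported in the results tables; the function name and sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.01 = 1%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the true value;
    # actual values must be non-zero, which holds for prices.
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prices, for illustration only.
actual_prices = [100.0, 200.0]
predicted_prices = [110.0, 190.0]
print(mape(actual_prices, predicted_prices))
```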
4.4 Experiment 3
For the third experiment, the three best models
(whether base or combination models) were used to
predict the S&P500 index and CF Industries. Prior to
Experiment 3, the results of Experiments 1 and 2
were briefly summarized to help identify the three
best models.
4.5 Experiments 4 and 5
In this final round of experiments, the two financial
instruments (S&P500 and CF Industries) were
predicted with the three best models from Experiment
3. This round also included an in-depth review of the
best model's prediction for CF Industries. This
prediction/forecast was then compared with the
returns of investing in the S&P500 index in the last
month of the dataset used. In all, this final test was to
see whether the best model from these experiments
could challenge the returns of the S&P500 index in
2024.
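The comparison described above can be sketched as follows. This is only an illustration: the text does not specify how returns were computed, so a simple holding-period return over the final month is assumed, and all prices and names below are hypothetical.

```python
def simple_return(prices):
    """Holding-period return from the first to the last price in the series."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    return (prices[-1] - prices[0]) / prices[0]


# Hypothetical values for the last month of a dataset (not from the paper).
forecast_cf = [80.0, 82.5, 84.0]          # model forecast for the stock
actual_index = [5000.0, 5050.0, 5100.0]   # benchmark index levels

forecast_ret = simple_return(forecast_cf)
benchmark_ret = simple_return(actual_index)
print(f"forecast-implied return: {forecast_ret:.4f}")
print(f"benchmark return:        {benchmark_ret:.4f}")
print("forecast beats benchmark:", forecast_ret > benchmark_ret)
```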
5 RESULTS AND DISCUSSIONS
Table 2: Sample of Experiment 1 Results - Gold.
Experiment 1: Gold
Model        MAPE     Run-Time   No. Parameters
LSTM128      0.0143   84.27s     73265
BiLSTM128    0.0134   51.50s     146225
TX           0.0134   150.24s    17205
Tables 2 and 3 show a sample of the results of the
first and second experiments, in which all models
were tested on the five financial instruments. In
Experiment 1, one result that stood out was the lack
of consistency in the results of the Transformer
model. Over multiple training runs, including some
with the same parameters and financial instruments,
the Transformer model failed to produce similar or
comparable results. Moreover, the first experiment
overturned an assumption held at the start of the
tests: that the smaller number of parameters in the
Transformer model would result in a shorter
run-time. Surprisingly, the Transformer model
regularly took the longest time to train, yet it yielded
competitive results. For their part, the LSTM and
Bidirectional LSTM were more consistent, producing
comparable outcomes over multiple runs on the same
financial instruments.
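One way to make the notion of consistency concrete is to measure the spread of MAPE over repeated training runs of the same model on the same instrument. The sketch below is an assumption about how this could be quantified, not the procedure used in the experiments; the per-run MAPE values and names are hypothetical.

```python
from statistics import mean, stdev


def run_spread(mapes):
    """Return mean, sample standard deviation and coefficient of variation."""
    m = mean(mapes)
    s = stdev(mapes)
    return m, s, s / m


# Hypothetical MAPE values from repeated runs (not from the paper).
runs = {
    "stable model": [0.0140, 0.0143, 0.0141, 0.0144],
    "unstable model": [0.0134, 0.0210, 0.0165, 0.0290],
}

for name, values in runs.items():
    m, s, cv = run_spread(values)
    # A larger coefficient of variation means less repeatable results.
    print(f"{name}: mean={m:.4f} std={s:.4f} cv={cv:.2f}")
```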
Table 3: Sample of Experiment 2 Results - Gold.
Experiment 2: Gold
Model           MAPE     Run-Time   No. Parameters
LSTMAll+TX      0.0140   351.37s    833394
LSTM128+TX      0.0155   333.65s    1523346
LSTM64+TX       0.0183   213.63s    651474
LSTM32+TX       0.0168   151.58s    301554
BiLSTMAll+TX    0.0147   604.48s    2083794
BiLSTM128+TX    0.0142   518.57s    3430930
BiLSTM64+TX     0.0199   280.53s    1392274
BiLSTM32+TX     0.0186   221.33s    618706
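The ranking by MAPE, run-time and number of parameters can be illustrated with the Gold results from Table 3. The sketch below simply orders the models on each criterion separately (lower is better in all three); how the criteria were weighted against each other in the study is not stated, so no combined score is attempted, and the helper name is hypothetical.

```python
# (model, MAPE, run-time in seconds, trainable parameters) from Table 3.
rows = [
    ("LSTMAll+TX", 0.0140, 351.37, 833394),
    ("LSTM128+TX", 0.0155, 333.65, 1523346),
    ("LSTM64+TX", 0.0183, 213.63, 651474),
    ("LSTM32+TX", 0.0168, 151.58, 301554),
    ("BiLSTMAll+TX", 0.0147, 604.48, 2083794),
    ("BiLSTM128+TX", 0.0142, 518.57, 3430930),
    ("BiLSTM64+TX", 0.0199, 280.53, 1392274),
    ("BiLSTM32+TX", 0.0186, 221.33, 618706),
]


def rank_by(table, column):
    """Model names sorted ascending on the given column index."""
    return [row[0] for row in sorted(table, key=lambda row: row[column])]


by_mape = rank_by(rows, 1)
by_time = rank_by(rows, 2)
by_params = rank_by(rows, 3)

print("By MAPE:      ", by_mape)
print("By run-time:  ", by_time)
print("By parameters:", by_params)
```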
In Experiment 2, combining LSTM/BiLSTM with the
Transformer model produced models with consistent
results. Their predictions were much easier to
replicate, which validated their use. Of note was the
run-time of BiLSTMAll+TX and BiLSTM128+TX,
which was often the longest in any of the categories
tested. This outcome is easily attributed to the total
number of
LSTM versus Transformers: A Practical Comparison of Deep Learning Models for Trading Financial Instruments