
9. The best (lowest) ask on the LOB at time t.
10. The difference between the current time and the
time of the previous trade.
11. The quantity of all quotes on the LOB at time t.
12. An estimate P* of the competitive equilibrium price.
13. Smith’s α metric using the P* estimate of the competitive equilibrium price at time t.
14. The target variable: the price of the trade.
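For reference, Smith's α is conventionally defined as the root-mean-square deviation of the n transaction prices p_i from the competitive equilibrium price, expressed as a percentage of P*. A sketch of the standard form (the exact variant used here may differ):

```latex
\alpha = \frac{100}{P^{*}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( p_i - P^{*} \right)^{2}}
```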
When performing inference, our model takes the first 13 items as multivariate input and produces the target variable, item 14: the price at which it is willing to trade at a specific time in the market (the quote placed by the trader).
3.0.1 Data Generation and Preprocessing
TBSE provides five working trading agents, which were used to generate the training data; the code is available at github.com/MichaelRol/Threaded-Bristol-Stock-Exchange. To create a large and diversified training dataset, the market simulations were run using the 5 trader types in different proportions, with a total of 40 traders per simulation. The following proportion groups of 20 traders per side of the exchange (buyers or sellers) were used: (5, 5, 5, 5, 0),
(8, 4, 4, 4, 0), (8, 8, 2, 2, 0), (10, 4, 4, 2, 0), (12, 4,
2, 2, 0), (14, 2, 2, 2, 0), (16, 2, 2, 0, 0), (16, 4, 0, 0,
0), (18, 2, 0, 0, 0), and (20, 0, 0, 0, 0). Each number
in a specific position corresponds to a population of
traders of a certain type in a given market simulation. For example, the specification (12, 4, 2, 2, 0) denotes 12 ZIC, 4 ZIP, 2 GDX, 2 AA, and no Giveaway traders on both the buyer and seller sides.
Using all the unique permutations of the proportions resulted in 270 trader schedules, across which the 5 trader types participate equally. This ensured that the model was trained to generalise from a varied and rich set of market scenarios. Each schedule was executed for 44 individual trials, amounting to 270 × 44 = 11880 market sessions. Each simulation represents one market hour and requires roughly one minute of wall-clock time, so generating this amount of data on a single computer would require approximately 8.6 days of continuous execution, producing roughly 13 million LOB snapshots. To address this time constraint, we used cloud computing to distribute the computation across several worker nodes.
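The schedule count can be reproduced directly from the proportion groups listed above; a minimal sketch:

```python
from itertools import permutations

# The ten proportion groups from the text (one value per trader type).
groups = [(5, 5, 5, 5, 0), (8, 4, 4, 4, 0), (8, 8, 2, 2, 0),
          (10, 4, 4, 2, 0), (12, 4, 2, 2, 0), (14, 2, 2, 2, 0),
          (16, 2, 2, 0, 0), (16, 4, 0, 0, 0), (18, 2, 0, 0, 0),
          (20, 0, 0, 0, 0)]

# Unique permutations of each group; a set removes the duplicate
# orderings that arise from repeated values within a group.
schedules = {p for g in groups for p in permutations(g)}

print(len(schedules))       # 270 trader schedules
print(len(schedules) * 44)  # 11880 market sessions
```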
It is generally good practice to normalise the inputs of a network, particularly for Deep Learning architectures such as LSTMs. Normalising the inputs helps ensure that all features fall within a similar range and prevents one feature from dominating the others. Our features have very different scales: for example, the time runs from 0 to 3600, while the quote type is binary. After normalisation, all values lie in the [0, 1] interval, which improves the convergence of the optimisation algorithm and helps the model generalise better to new data. We chose min-max normalisation, given that we are working with multivariate features derived from financial data.
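As an illustration, per-feature min-max scaling can be sketched as below; the bounds should be computed on the training data and reused at inference time (the function and variable names are our own, not from the original pipeline):

```python
import numpy as np

def min_max_normalise(x, lo=None, hi=None):
    """Scale each feature column of x into [0, 1]."""
    lo = x.min(axis=0) if lo is None else lo
    hi = x.max(axis=0) if hi is None else hi
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (x - lo) / span, lo, hi

# Example: a time feature in [0, 3600] next to a binary quote-type feature.
features = np.array([[0.0, 0.0], [1800.0, 1.0], [3600.0, 1.0]])
scaled, lo, hi = min_max_normalise(features)
```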
3.0.2 Model Architecture and Training
Contrary to the usual practice for training and validating a DLNN, which consists of splitting the dataset into training, validation, and test subsets, we used the entire dataset for training. Markets are a combination of unique factors, so our trader's profit is heavily dependent on what is happening in a specific simulation. Considering this, there is little value in assessing its performance against historic data by judging the absolute values of our target variable. Instead, once the loss had dropped satisfactorily during training, the DLNN was validated by quantifying how well DTX performed in live market simulations against other traders in terms of PPT. Our dataset is large and was generated from unique simulations; thus, DTX does not learn to replicate specific scenarios, but rather develops the ability to adapt and generalise under any market conditions.
The architecture of the Deep Learning model that DTX relies on is illustrated in Figure 1. It comprises three hidden layers: an LSTM with 10 units (neurons) followed by two Dense layers with 5 and 3 units, respectively, all using the Rectified Linear Unit (ReLU) activation function. The output layer uses a linear activation function, suitable for a continuous output variable.
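Assuming a single linear output unit over the 13 input features listed earlier (the input window length is not specified here, so it is left open), the described stack could be sketched in Keras as:

```python
from tensorflow import keras
from tensorflow.keras import layers

# 13 features per LOB snapshot, as listed in the input description;
# the sequence length (None) is an assumption, not stated in the text.
model = keras.Sequential([
    layers.Input(shape=(None, 13)),
    layers.LSTM(10, activation="relu"),    # hidden layer 1
    layers.Dense(5, activation="relu"),    # hidden layer 2
    layers.Dense(3, activation="relu"),    # hidden layer 3
    layers.Dense(1, activation="linear"),  # continuous price output
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1.5e-5),
              loss="mse")
```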
When dealing with large datasets, training should be done in batches to accommodate memory limitations and to speed up training. Our network accommodates this with a custom data generator based on the Sequence class, which Keras uses to train a model in batches. To balance accuracy and training time, a batch size of 16384 was chosen. To balance potential overfitting against long convergence times, we chose a learning rate of µ = 1.5 × 10⁻⁵. The DLNN uses the Adam optimizer for its ability to converge efficiently to a good solution, its resistance to overfitting, and its incorporation of momentum, which speeds up learning and improves generalisation performance. The model was trained in approximately 22 hours, leveraging the GPU clusters of the Blue Crystal 4 supercomputer.
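A minimal sketch of such a Sequence-based generator, assuming the snapshots and targets are already available as in-memory arrays (the class and attribute names are illustrative, not the original implementation):

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class LOBBatchGenerator(Sequence):
    """Serves (inputs, targets) batches so the full dataset never has to
    be materialised as one contiguous training tensor."""

    def __init__(self, x, y, batch_size=16384):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches per epoch (last batch may be smaller).
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]
```

Passing such a generator to `model.fit(...)` lets Keras stream the data batch by batch instead of loading everything at once.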
The model was trained for 20 epochs. An epoch
ICAART 2024 - 16th International Conference on Agents and Artificial Intelligence