odically updating the expert tree to accommodate the
changing relationships.
HGE is also inefficient when adding a new expert to
the tree, since all descendants of the new expert's
siblings have to be checked for masking. More
efficient approaches to building the expert tree may be
possible. The use of autoencoders is also not strictly
necessary, and other methods for measuring expert
suitability could be considered.
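
To make the insertion cost concrete, the following is a minimal sketch of the traversal described above, not the HGE implementation. The `Expert` class, the `descendants` helper, and the `needs_check` callback are all hypothetical names introduced for illustration, and the actual masking test is left as a placeholder because it is not specified in this section. Under these assumptions, the sketch only counts how many nodes must be visited when a new expert is attached.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Expert:
    """Hypothetical expert-tree node (an assumption, not the paper's structure)."""
    name: str
    children: List["Expert"] = field(default_factory=list)
    parent: Optional["Expert"] = None

    def add_child(self, child: "Expert") -> "Expert":
        child.parent = self
        self.children.append(child)
        return child


def descendants(node: Expert) -> Iterator[Expert]:
    """Yield every node strictly below `node`, depth first."""
    stack = list(node.children)
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.children)


def masking_checks(new_expert: Expert,
                   needs_check: Callable[[Expert, Expert], bool]) -> int:
    """Count the checks triggered by inserting `new_expert`.

    Each descendant of each sibling is visited once, so the cost grows
    linearly with the total size of the sibling subtrees.
    """
    if new_expert.parent is None:
        return 0  # a root has no siblings
    checks = 0
    for sibling in new_expert.parent.children:
        if sibling is new_expert:
            continue
        for node in descendants(sibling):
            needs_check(new_expert, node)  # placeholder for the real masking test
            checks += 1
    return checks


# Small example tree: root -> a (a1, a2 -> a2x), b (b1), plus the new expert.
root = Expert("root")
a = root.add_child(Expert("a"))
a.add_child(Expert("a1"))
a.add_child(Expert("a2")).add_child(Expert("a2x"))
root.add_child(Expert("b")).add_child(Expert("b1"))
new = root.add_child(Expert("new"))
print(masking_checks(new, lambda new_node, other: False))
```

In this toy tree the new expert has two siblings with four descendants in total, so four checks are performed. The count scales with the size of the sibling subtrees rather than with the depth of the insertion point, which is the inefficiency noted above.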

In this section we provide hyperparameters and other
details crucial to recreating our results.
Datasets. In the PMNIST, SMNIST and MNIST-
KMNIST scenarios, the 28x28 images are flattened
into 784-dimensional vectors. In all other scenarios,
images are first resized to 32x32 before further
