
odically updating the expert tree to accommodate the
changing relationships.
HGE also suffers from poor efficiency when adding a new expert to the tree, since all descendants of the new expert's siblings must be checked for masking. More efficient approaches to building the expert tree may be possible. The use of autoencoders is also not strictly necessary, and other methods for measuring expert suitability could be considered.
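To make the autoencoder-based suitability measure concrete, the sketch below scores each expert by the reconstruction error of a small per-expert autoencoder and routes a batch to the best-scoring expert. This is only an illustration under our own naming (ExpertAutoencoder, suitability, route), not a reproduction of the HGE implementation; any alternative suitability measure could be substituted for the reconstruction score without changing the surrounding tree logic.

```python
# Minimal sketch (assumed naming, not the paper's code): each expert keeps a
# small undercomplete autoencoder, and its suitability for an incoming batch
# is the negative reconstruction error of that autoencoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertAutoencoder(nn.Module):
    """Small autoencoder kept alongside each expert for gating."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def suitability(ae: ExpertAutoencoder, x: torch.Tensor) -> float:
    """Higher is better: negative mean reconstruction error on the batch."""
    with torch.no_grad():
        return -F.mse_loss(ae(x), x).item()


def route(autoencoders: list[ExpertAutoencoder], x: torch.Tensor) -> int:
    """Pick the index of the expert whose autoencoder fits the batch best."""
    scores = [suitability(ae, x) for ae in autoencoders]
    return max(range(len(scores)), key=scores.__getitem__)
```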
APPENDIX
In this section we provide hyperparameters and other details required to reproduce our results.
Datasets. In the PMNIST, SMNIST and MNIST-
KMNIST scenarios, the 28x28 images are flattened
into 784-dimensional vectors. In all other scenarios,
all images are first resized to 32x32 before further