method retains information from both tasks with a loss in performance of up to 2%.
5 CONCLUSIONS
In this paper, we proposed a novel and simple proof-of-concept transfer learning approach that inherently allows for selective forgetting. Our training method enables network aggregation at test time, i.e., the weights of two networks (trained on two different datasets) are aggregated such that the resulting network can work on both datasets without any further training or adaptation step.
We achieved this by introducing an aggregation regulariser that enables the networks to also learn the aggregation operation in an end-to-end training framework. We used the sum as the aggregation operator, as it is invertible and differentiable. VGG-like architectures were used as feature extractors, with Group Normalisation in lieu of Batch Normalisation.
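Purely as an illustration, the snippet below sketches this test-time aggregation in PyTorch; the aggregate() helper and its signature are our own placeholders, and only the element-wise sum of the two networks' weights reflects the description above.

```python
# Minimal sketch (not the authors' code): test-time aggregation of two
# networks with identical architectures by summing their parameters.
# The aggregate() helper and its signature are hypothetical.
import copy

import torch.nn as nn


def aggregate(net_a: nn.Module, net_b: nn.Module) -> nn.Module:
    """Return a network whose parameters are the element-wise sum of the
    parameters of net_a and net_b (the architectures must match)."""
    state_a, state_b = net_a.state_dict(), net_b.state_dict()
    summed = {k: state_a[k] + state_b[k] for k in state_a}
    net_sum = copy.deepcopy(net_a)     # same architecture as the inputs
    net_sum.load_state_dict(summed)    # no further training/adaptation
    return net_sum
```

Note that Group Normalisation, unlike Batch Normalisation, keeps no running batch statistics, so the state dict contains only learnable weights and the layer-wise sum is well defined.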
Our experimental results demonstrated that the proposed approach allows for test-time transfer learning without any further training steps. Furthermore, we showed that our training procedure is commutative: the aggregated network $N_1 \oplus N_2$ obtains the same performance as $N_2 \oplus N_1$. Moreover, we demonstrated that our method allows for selective forgetting (at the cost of up to 2% testing performance).
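As an illustration only, the following sketch shows how an invertible aggregation operator could support such forgetting by subtracting one network's weights from the aggregated model; the forget() helper is hypothetical and may differ from the procedure actually used in the paper.

```python
# Minimal sketch (an assumption about the forgetting mechanism, not the
# authors' code): because the sum is invertible, one source network can
# be "forgotten" by subtracting its weights from the aggregated model.
import copy

import torch.nn as nn


def forget(aggregated: nn.Module, net_to_forget: nn.Module) -> nn.Module:
    """Remove net_to_forget's contribution from the aggregated network."""
    state_agg = aggregated.state_dict()
    state_del = net_to_forget.state_dict()
    remaining = {k: state_agg[k] - state_del[k] for k in state_agg}
    net_kept = copy.deepcopy(aggregated)
    net_kept.load_state_dict(remaining)
    return net_kept
```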
The proposed method has some limitations: (i) it requires that all the networks involved in the training share the same architecture; (ii) the selective forgetting does not allow forgetting a subset of a dataset; (iii) we evaluated it on only two benchmark datasets (although the proposed framework can easily accommodate multiple datasets). As future work, we will generalise our approach by exploring training with deep neural networks $N_i$, for $i = 1, \dots, n$, with $n$ being the number of datasets, in a federated learning scenario.
ACKNOWLEDGEMENTS
This work was supported by the Edinburgh Napier University internally funded project “Li.Ne.Co.”