Figure 4: Training time for each task from 0 to 9. Note that the times are not cumulative. As training moves on to later tasks, the generative model of VCL takes increasingly longer, whereas VCAE takes less than 300 seconds for every task.
ble training. This allows multiple tasks to be learned without increasing network capacity or introducing new parameters to the network.
Future work should extend this framework to high-dimensional random variables. VCAE could also be applied to deep reinforcement learning tasks to support continual learning at scale, so that agents can acquire new knowledge on their own in real time.
ACKNOWLEDGEMENTS
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-2014-1-00639) supervised by the IITP (Institute for Information & communications Technology Promotion).
REFERENCES
Cichon, J. and Gan, W.-B. (2015). Branch-specific dendritic Ca2+ spikes cause persistent synaptic plasticity. Nature, 520(7546):180–185.
Fagot, J. and Cook, R. G. (2006). Evidence for large long-
term memory capacities in baboons and pigeons and
its implications for learning and the evolution of cog-
nition. Proceedings of the National Academy of Sci-
ences, 103(46):17564–17567.
French, R. M. (1999). Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4):128–135.
Ghahramani, Z. and Attias, H. (2000). Online variational Bayesian learning. In Slides from talk presented at NIPS workshop on Online Learning.
Glorot, X. and Bengio, Y. (2010). Understanding the diffi-
culty of training deep feedforward neural networks.
In Proceedings of the thirteenth international con-
ference on artificial intelligence and statistics, pages
249–256.
Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A., and
Bengio, Y. (2013). An empirical investigation of
catastrophic forgetting in gradient-based neural net-
works. arXiv preprint arXiv:1312.6211.
Hassabis, D., Kumaran, D., Summerfield, C., and
Botvinick, M. (2017). Neuroscience-inspired artificial
intelligence. Neuron, 95(2):245–258.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Hinton, G. E. and Zemel, R. S. (1994). Autoencoders, minimum description length and Helmholtz free energy. In Advances in neural information processing systems, pages 3–10.
Hubel, D. H. and Wiesel, T. N. (1968). Receptive fields and
functional architecture of monkey striate cortex. The
Journal of physiology, 195(1):215–243.
Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, page 201611835.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998).
Gradient-based learning applied to document recogni-
tion. Proceedings of the IEEE, 86(11):2278–2324.
Li, Z. and Hoiem, D. (2018). Learning without forgetting.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 40(12):2935–2947.
Lin, M., Chen, Q., and Yan, S. (2013). Network in network.
arXiv preprint arXiv:1312.4400.
McCloskey, M. and Cohen, N. J. (1989). Catastrophic in-
terference in connectionist networks: The sequential
learning problem. In Psychology of learning and mo-
tivation, volume 24, pages 109–165. Elsevier.
Nguyen, C. V., Li, Y., Bui, T. D., and Turner, R. E. (2018).
Variational continual learning. In International Con-
ference on Learning Representations.
Radford, A., Metz, L., and Chintala, S. (2015). Unsu-
pervised representation learning with deep convolu-
tional generative adversarial networks. arXiv preprint
arXiv:1511.06434.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Ried-
miller, M. (2014). Striving for simplicity: The all con-
volutional net. arXiv preprint arXiv:1412.6806.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. (2014). Dropout: a simple way
to prevent neural networks from overfitting. The Jour-
nal of Machine Learning Research, 15(1):1929–1958.
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, volume 4, page 12.