layer is set to the same value as in the real trained neural
network. Then, unimportant connections are pruned
according to q. Although a network with 5 layers is used
for this calculation, to keep the conditions the same as in
the real trained neural network, nodes in the dense layers
remain attached and links that cannot reach the output
layer are deleted. The number of motifs is calculated
against 10 weight-randomized neural networks, and the
average of those counts is used as a reference value. The
values shown in Fig. 5 are the measured counts minus
this reference value. Note that, since the networks
obtained by FG and CFVG share the same structure, the
same weight-randomized networks are used for both.
MFINDER 1.21 is used for detecting network
motifs (Milo et al., 2004).
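As a rough illustration of this counting procedure, the sketch below prunes one weight matrix by keeping the q% of connections with the largest absolute weight, counts bifan motifs in the resulting layer-to-layer adjacency, and subtracts the average count over weight-randomized copies. The function names (prune_by_q, count_bifans, motif_excess) and the per-layer bifan counter are our own simplification for illustration; in the actual experiments the motif detection over the full pruned network is performed with MFINDER 1.21.

```python
import numpy as np

def prune_by_q(weights, q):
    """Keep the q% of connections with the largest absolute weight;
    the remaining links are treated as deleted (set to zero)."""
    w = weights.copy()
    threshold = np.percentile(np.abs(w), 100 - q)
    w[np.abs(w) < threshold] = 0.0
    return w

def count_bifans(weights):
    """Count bifan motifs (two source nodes both connected to the same two
    target nodes) in a single layer-to-layer connection matrix."""
    a = (weights != 0).astype(int)
    shared = a @ a.T                      # targets shared by each source pair
    np.fill_diagonal(shared, 0)
    pairs = shared * (shared - 1) // 2    # choose 2 of the shared targets
    return int(pairs.sum()) // 2          # each pair (i, j) is counted twice

def motif_excess(weights, q, n_random=10, seed=0):
    """Observed motif count minus the average count over weight-randomized
    networks, mirroring the reference-value subtraction used for Fig. 5."""
    rng = np.random.default_rng(seed)
    observed = count_bifans(prune_by_q(weights, q))
    reference = []
    for _ in range(n_random):
        shuffled = rng.permutation(weights.flatten()).reshape(weights.shape)
        reference.append(count_bifans(prune_by_q(shuffled, q)))
    return observed - float(np.mean(reference))
```

In practice, the pruned network would be exported as an edge list and passed to MFINDER for detection of all motif types rather than counted in-process as above.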
From the results in Fig. 5, we can see a tendency for
the neural networks obtained with CFVG to have a rela-
tively large number of bifan and diamond motifs, while
having fewer of the motifs obtained by removing one
link from the bifan and diamond motifs, especially
when q is 20. Although FG shows the same tendency,
it is even stronger in the networks obtained with CFVG.
This suggests that the neural networks obtained by
CFVG are more modular than those obtained by FG,
and this structure could be the reason why CFVG has
the advantage in mitigating catastrophic forgetting.
6 CONCLUSION AND FUTURE
WORK
In this paper, we explored the application of MVG to
mitigating catastrophic forgetting in a somewhat more
practical setting, namely classification of small real
images and an increasing number of goals. From the
results, we find that varying goals can mitigate
catastrophic forgetting under SGD in a CIFAR-10
based classification problem. We find that, when
learning a large set of goals, a relatively small
switching interval is required to retain the advantage
in mitigating catastrophic forgetting. On the other
hand, when learning a small set of goals, a suitably
large switching interval is preferable, since it degrades
this advantage less and can also improve accuracy.
In addition, by examining the structure of the obtained
neural networks, we find that, after pruning unim-
portant connections, they show strong bifan and
diamond motifs, which suggests that the obtained
networks are modular; this could be the reason why
catastrophic forgetting is mitigated.
For future work, the proposed approach should be
examined on other tasks and other layer structures, and
a theoretical analysis is needed.
ACKNOWLEDGMENT
Thanks to Nadav Kashtan for providing his source
code. This work was supported by JSPS KAKENHI
Grant Number JP19K20415. Computational re-
sources were partially provided by the large-scale computer
systems at the Cybermedia Center, Osaka University.
REFERENCES
Alon, U. (2003). Biological networks: The tinkerer as an
engineer. Science, 301(5641):1866–1867.
Amer, M. and Maul, T. (2019). A review of modulariza-
tion techniques in artificial neural networks. Artificial
Intelligence Review, 52(1):527–561.
Ellefsen, K. O., Mouret, J.-B., and Clune, J. (2015). Neural
modularity helps organisms evolve to learn new skills
without forgetting old skills. PLoS Computational Bi-
ology, 11(4):e1004128.
French, R. M. (1999). Catastrophic forgetting in con-
nectionist networks. Trends in Cognitive Sciences,
3(4):128–135.
Hou, L. and Kwok, J. T. (2018). Power law in spar-
sified deep neural networks. arXiv e-prints, page
arXiv:1805.01891.
Kashtan, N. and Alon, U. (2005). Spontaneous evolution
of modularity and network motifs. Proceedings of
the National Academy of Sciences, 102(39):13773–
13778.
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J.,
Desjardins, G., Rusu, A. A., Milan, K., Quan, J.,
Ramalho, T., Grabska-Barwinska, A., Hassabis, D.,
Clopath, C., Kumaran, D., and Hadsell, R. (2017).
Overcoming catastrophic forgetting in neural net-
works. Proceedings of the National Academy of Sci-
ences, 114(13):3521–3526.
Li, Y., Wang, S., Tian, Q., and Ding, X. (2015). A survey
of recent advances in visual feature detection. Neuro-
computing, 149:736–751.
Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr,
S., Ayzenshtat, I., Sheffer, M., and Alon, U. (2004).
Superfamilies of evolved and designed networks. Sci-
ence, 303(5663):1538–1542.
Newman, M. E. (2006). Modularity and community
structure in networks. Proceedings of the National
Academy of Sciences, 103(23):8577–8582.
Parter, M., Kashtan, N., and Alon, U. (2008). Facilitated
variation: How evolution learns from past environ-
ments to generalize to new environments. PLoS Com-
putational Biology, 4(11):e1000206.
Ramesh, B., Yang, H., Orchard, G. M., Le Thi, N. A.,
Zhang, S., and Xiang, C. (2019). Dart: Distribu-
tion aware retinal transform for event-based cameras.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, pages 1–1.
Sporns, O. and Kötter, R. (2004). Motifs in brain networks.
PLoS Biology, 2(11):e369.