generalization performance reached the highest value
(0.992) when the number of learning steps in proto-
type learning was 1000. Even in other multiple activa-
tion learning cases, generalization (0.984 and 0.985)
was significantly better than that of the single activa-
tion models (0.941 and 0.917).
The results confirm that detecting the prototype and then combining prototype and non-prototype learning can contribute to improved generalization and interpretation. In particular, interpretation is greatly facilitated because the prototype, with its minimal network configuration, has a considerable effect on learning.
5 CONCLUSION
The present paper aimed to demonstrate that neural learning should begin with the extraction of the prototype, the simplest network realizable within the given network resources, followed by non-prototype learning on detailed input information. Ideally, the prototype is determined as independently as possible of any particular inputs. The importance of the prototype can be demonstrated by comparing a network that easily acquires the prototype with one that does not, using multi-activation techniques. By changing the activation function from the hyperbolic tangent at the beginning of training to the ReLU function in later learning steps, we observed a significant improvement in generalization performance. Additionally, the final weights retain a trace of the initial prototype learning and are easy to understand. The extraction of the prototype should thus play a critical role in training neural networks, making their internal representations more comprehensible and enhancing generalization.
Finally, several future directions should be addressed. First, we need to resolve issues inherent in potentiality and ratio potentiality. For the sake of simplicity and stability, potentiality is limited to the absolute values of connection weights, and ratio potentiality estimates how far an estimated individual potentiality exceeds a supposed potentiality. This simplification is used to highlight the importance of inputs and to ease interpretation, as larger weights are considered more important (a minimal sketch of this interpretation is given at the end of this section). However, in the actual cases discussed in this paper, negative weights appear to play significant roles. Thus, it is necessary to incorporate this negative effect, or “negative potentiality,” to make the potentiality framework more general. Second, different types of activation functions could be explored for prototype extraction. In multiple activation learning, only two standard activation functions were used for ease of reproduction, but many other activation functions exist, and it would be interesting to use them for extracting the ideal prototype. It may also be possible to identify a single, idealized activation function that captures the properties of both prototype and non-prototype learning. Finally, applying the method to larger and more practical datasets is crucial to determine whether our approach can address practical problems that require not only improved generalization but also enhanced interpretation. Understanding the inner workings of neural networks is, in our view, even more important than merely improving generalization.
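The sketch below makes the potentiality measures referred to above concrete: potentiality is taken as the absolute value of each connection weight, and ratio potentiality compares it with a supposed baseline. Using the mean absolute weight as that baseline is purely an assumption made here for illustration; it also makes the limitation explicit, since the sign of the weights is discarded.

import numpy as np

def potentiality(weights: np.ndarray) -> np.ndarray:
    # Potentiality of each connection: the absolute value of its weight.
    return np.abs(weights)

def ratio_potentiality(weights: np.ndarray) -> np.ndarray:
    # Ratio potentiality: individual potentiality relative to a supposed
    # baseline potentiality.  The mean absolute weight is used here only as
    # an assumed stand-in for that baseline.
    p = potentiality(weights)
    supposed = p.mean()
    return p / supposed  # values above 1 mark connections exceeding the baseline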