[Table continued: … 0.292; ProbBelAct (element-wise trainable) 0.278]
3.6 Results on Reduced CIFAR 100
An experiment similar to that described in Section 3.5 was also performed for CIFAR 100. The CIFAR 100 training set was reduced to 10,000 images, and evaluation used the standard 10,000-image test set. Table 5 shows the results for the reduced CIFAR 100. ProbBelAct with a single trainable parameter achieved the best performance in this case, improving the score by 1.5% and 0.7% over ReLU and CReLU, respectively.
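As a concrete illustration, the following is a minimal sketch of how such a reduced training subset could be constructed. The TensorFlow/Keras loader, the class-balanced sampling, and the random seed are assumptions; the paper does not specify how the 10,000 training images were selected.

    # Hypothetical construction of the reduced CIFAR 100 training split.
    import numpy as np
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar100.load_data()

    rng = np.random.default_rng(seed=0)   # illustrative seed
    per_class = 10_000 // 100             # assume a class-balanced subset
    idx = np.concatenate([
        rng.choice(np.where(y_train.ravel() == c)[0], per_class, replace=False)
        for c in range(100)
    ])
    x_small, y_small = x_train[idx], y_train[idx]   # 10,000 training images
    # Evaluation uses the standard 10,000-image test set (x_test, y_test).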
4 DISCUSSION
In Sections 3.3 and 3.4, we observe that the improvement in score on CIFAR 100 is larger than that on CIFAR 10. It has been proposed that ProbAct can be regarded as an augmentation operation (Shridhar et al., 2019). BelAct can also be viewed as an augmentation technique; however, its operation is essentially different from that of ProbAct (see Section 2). ProbBelAct achieved the best score on CIFAR 100 in both the original and the reduced settings, and it should be noted that the amount of training data per class in CIFAR 100 is ten per cent of that in CIFAR 10. This suggests that ProbBelAct extends the diversity of the representation space further than ProbAct does.
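For reference, a hedged sketch of the ProbAct operation discussed here is given below: a ReLU output perturbed by zero-mean Gaussian noise scaled by a trainable parameter sigma, in either the single-parameter or the element-wise variant (Shridhar et al., 2019). The layer name and initialisation are illustrative assumptions, and BelAct's Beltrami-coefficient mapping (Section 2) is intentionally not reproduced.

    # Illustrative re-implementation of a ProbAct-style stochastic activation;
    # not the authors' code.
    import tensorflow as tf

    class ProbActSketch(tf.keras.layers.Layer):
        def __init__(self, elementwise=False, **kwargs):
            super().__init__(**kwargs)
            self.elementwise = elementwise

        def build(self, input_shape):
            shape = input_shape[1:] if self.elementwise else ()
            # One sigma per element, or a single trainable sigma.
            self.sigma = self.add_weight(name="sigma", shape=shape,
                                         initializer="zeros", trainable=True)

        def call(self, x, training=False):
            y = tf.nn.relu(x)
            if training:
                # Fresh noise at every forward pass perturbs the internal
                # representation, which gives the augmentation-like effect.
                y = y + self.sigma * tf.random.normal(tf.shape(y))
            return y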
There are other methods for training neural networks that use random distributions. One well-known method is the dropout layer, which improves generalisation by randomly zeroing units during training and can be interpreted as a model-ensemble method. Several studies have also examined the effects of adding noise to weights, inputs, and gradients. In contrast, BelAct and ProbBelAct follow the concept proposed by Bengio et al. (2013), as ProbAct does: stochastic neurons with sparse representations provide internal regularisation.
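To make the contrast explicit, the short sketch below compares the two mechanisms mentioned above: inverted dropout, which removes units at random and rescales the survivors, versus additive noise injection on the inputs. The tensor shapes, keep probability, and noise scale are illustrative and not taken from the paper.

    import tensorflow as tf

    x = tf.random.uniform((4, 8))        # dummy activations

    # Inverted dropout: zero units at random during training and rescale so the
    # expected activation matches test time (the model-ensemble interpretation).
    keep_prob = 0.8
    mask = tf.cast(tf.random.uniform(tf.shape(x)) < keep_prob, x.dtype)
    x_dropout = x * mask / keep_prob

    # Additive noise injection on inputs (related studies add noise to weights
    # or gradients instead).
    x_noisy = x + 0.1 * tf.random.normal(tf.shape(x))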
5 CONCLUSIONS
We proposed BelAct, a novel activation function with a probabilistic Beltrami coefficient. By combining it with the operation of ProbAct, we also presented ProbBelAct. The proposed activation functions show better performance than the baseline models on both the CIFAR 10 and CIFAR 100 datasets.
In particular, ProbBelAct achieved the best score on the CIFAR 100 dataset, which suggests that ProbBelAct yields a richer representation of features than ProbAct and BelAct on small datasets.
In future work, we intend to apply our method to smaller image classification tasks. Furthermore, we will verify the effectiveness of BelAct and ProbBelAct on natural language processing tasks.
ACKNOWLEDGEMENTS
We would like to thank the reviewers for their time
spent reviewing our manuscript. This work was
supported by JSPS KAKENHI (Grant Number
20K23330).
REFERENCES
Ahlfors, L.V. (2006). Lectures on quasiconformal
mappings: second edition, University Lecture Series,
Vol. 38, American Mathematical Society, Providence.
Arjovsky, M., Shah, A., Bengio, Y. (2016). Unitary
evolution recurrent neural networks. arXiv preprint
arXiv:1511.06464.
Barrachina, J. A. (2019). Complex-valued neural networks
(CVNN), Available: https://github.com/NEGU93/cvnn.
Bengio, Y., Léonard, N., Courville, A. (2013). Estimating
or propagating gradients through stochastic neurons
for conditional computation. arXiv preprint
arXiv:1308.3432.
Clevert, D.-A., Unterthiner, T., Hochreiter, S. (2015). Fast
and accurate deep network learning by exponential
linear units (elus). arXiv preprint arXiv:1511.07289.
Guberman, N. (2016). On complex valued convolutional
neural networks. arXiv preprint arXiv:1602.09046.