Table 2: Mean training and validation loss for sequential and parallel architectures across several intervals of the determination ratio Q.
                 Q ∈ [0, 1)           Q ∈ [1, 3)           Q ∈ [3, 10)          Q ∈ [10, ∞)
                 train      val       train      val       train      val       train      val
MNIST
  sequential     0.00013    0.05201   0.01702    0.12449   0.03620    0.11743   0.11246    0.13550
  parallel       0.00009    0.07551   0.02679    0.11468   0.05238    0.11467   0.13310    0.14900
CIFAR10
  sequential     0.25326    2.03107   0.72510    1.31691   1.07333    1.34721   1.58608    1.65354
  parallel       0.52658    1.32386   0.88701    1.24884   1.17085    1.34227   1.63449    1.68879
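To make the grouping in Table 2 concrete, the sketch below bins hypothetical experiment records by the same Q intervals and reports mean losses. It assumes the determination ratio Q is computed as the number of training constraints (samples times output dimension) divided by the number of trainable parameters; the helper `determination_ratio` and the example records are illustrative placeholders, not the paper's actual data or definition.

```python
import numpy as np

# Assumed definition of the determination ratio Q: training constraints
# (samples x output dimension) over trainable parameters. The paper's
# exact formula may differ; this is a sketch.
def determination_ratio(n_samples, n_outputs, n_params):
    return (n_samples * n_outputs) / n_params

# Hypothetical experiment records: (Q, train_loss, val_loss) per model.
records = [
    (0.4, 0.00013, 0.05201),
    (2.1, 0.01702, 0.12449),
    (5.7, 0.03620, 0.11743),
    (12.0, 0.11246, 0.13550),
]

# Bin the runs by the Q intervals used in Table 2 and report mean losses.
bins = [(0, 1), (1, 3), (3, 10), (10, np.inf)]
for lo, hi in bins:
    rows = [(tr, va) for q, tr, va in records if lo <= q < hi]
    if rows:
        tr_mean = np.mean([r[0] for r in rows])
        va_mean = np.mean([r[1] for r in rows])
        print(f"Q in [{lo}, {hi}): train={tr_mean:.5f}, val={va_mean:.5f}")
```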
bers of network parameters) is close to each other, with a slight advantage of the shallow architectures in terms of loss on the validation set.
While the deep architecture performed marginally better on the training set, the cause of its underperformance on the validation set remains an open question. It is plausible that the deep architecture's ability to capture abrupt nonlinearities also makes it prone to overfitting to noise. In contrast, the shallow network, due to its inherent smoothness, might exhibit a higher tolerance towards training-set noise.
In conclusion, our results suggest a potential parity in the performance of deep and shallow architectures. It is important to note that the optimization algorithm used in this study is a first-order one, which lacks guaranteed convergence properties. Future research could explore more robust second-order algorithms which, while not commonly implemented in prevalent software packages, could yield more pronounced results. This work serves as a preliminary step towards reevaluating architectural decisions in the field of neural networks, urging further exploration into the comparative efficacy of shallow and deep architectures.
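As an illustration of the direction suggested above, the following minimal sketch trains a placeholder network with a quasi-second-order optimizer (L-BFGS via `torch.optim.LBFGS`) instead of a first-order method. The model, data, and hyperparameters are assumptions for demonstration only and do not reproduce the paper's experimental setup.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the architectures compared in the paper.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
loss_fn = nn.CrossEntropyLoss()

# L-BFGS approximates curvature information, in contrast to purely
# first-order optimizers such as SGD or Adam.
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1, max_iter=20)

# Dummy batch with MNIST-sized inputs; real training would iterate over data.
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))

def closure():
    # L-BFGS re-evaluates the loss several times per step, hence the closure.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)
```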