Table 2: Mean training and validation loss for sequential and parallel architectures and various determination ratio Q intervals

              Q ∈ [0, 1)          Q ∈ [1, 3)          Q ∈ [3, 10)         Q ∈ [10, ∞)
              train     val       train     val       train     val       train     val
sequential    0.00013   0.05201   0.01702   0.12449   0.03620   0.11743   0.11246   0.13550
parallel      0.00009   0.07551   0.02679   0.11468   0.05238   0.11467   0.13310   0.14900

sequential    0.25326   2.03107   0.72510   1.31691   1.07333   1.34721   1.58608   1.65354
parallel      0.52658   1.32386   0.88701   1.24884   1.17085   1.34227   1.63449   1.68879
bers of network parameters) is close to each other,
with a slight advantage for shallow architectures in
terms of loss on the validation set.
While the deep architecture performed marginally
better on the training set, the cause of its
underperformance on the validation set remains an open
question. It is plausible that the deep architecture’s
ability to capture abrupt nonlinearities also makes it
prone to overfitting to noise. In contrast, the shallow
network, due to its inherent smoothness, might be more
tolerant of noise in the training set.
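
The comparison between training and validation loss can be made concrete by tabulating the gap (validation minus training loss) for each cell of Table 2. The following is a minimal sketch in Python using the values transcribed from the table. The labels "upper" and "lower" for the two row groups are hypothetical names introduced here, since the excerpt does not state what distinguishes the groups, and the function names are likewise illustrative.

```python
# Values transcribed from Table 2 as (train, val) pairs, one pair per Q interval.
# "upper" and "lower" are hypothetical labels for the two unlabeled row groups.
Q_BINS = ["[0, 1)", "[1, 3)", "[3, 10)", "[10, inf)"]

TABLE_2 = {
    ("upper", "sequential"): [(0.00013, 0.05201), (0.01702, 0.12449),
                              (0.03620, 0.11743), (0.11246, 0.13550)],
    ("upper", "parallel"):   [(0.00009, 0.07551), (0.02679, 0.11468),
                              (0.05238, 0.11467), (0.13310, 0.14900)],
    ("lower", "sequential"): [(0.25326, 2.03107), (0.72510, 1.31691),
                              (1.07333, 1.34721), (1.58608, 1.65354)],
    ("lower", "parallel"):   [(0.52658, 1.32386), (0.88701, 1.24884),
                              (1.17085, 1.34227), (1.63449, 1.68879)],
}


def generalization_gaps(rows):
    """Return val - train for each Q interval, rounded to the table's precision."""
    return [round(val - train, 5) for train, val in rows]


def lower_validation_loss(group):
    """For each Q interval, name the architecture with the lower validation loss."""
    seq = TABLE_2[(group, "sequential")]
    par = TABLE_2[(group, "parallel")]
    return ["sequential" if s[1] < p[1] else "parallel" for s, p in zip(seq, par)]


if __name__ == "__main__":
    for (group, arch), rows in TABLE_2.items():
        gaps = generalization_gaps(rows)
        print(group, arch, dict(zip(Q_BINS, gaps)))
    for group in ("upper", "lower"):
        print(group, dict(zip(Q_BINS, lower_validation_loss(group))))
```

On the transcribed values, the parallel rows have the lower validation loss in five of the eight cells, while the sequential rows have the lower validation loss in the highest Q interval of both groups.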
In conclusion, our results suggest that deep and
shallow architectures may perform comparably. It is
important to note that the optimization algorithm used
in this study is a first-order method, which lacks
guaranteed convergence properties. Future research
could explore more robust second-order algorithms,
which, while not commonly implemented in prevalent
software packages, could yield more pronounced results.
This work serves as a preliminary step towards
reevaluating architectural decisions in the field of
neural networks, and it calls for further exploration
of the comparative efficacy of shallow and deep
architectures.
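
The distinction between first-order and second-order optimization can be illustrated on a toy problem. The sketch below is not the optimizer or the network used in this study. It assumes a hypothetical two-parameter quadratic objective with condition number 100 and compares fixed-step gradient descent with a single Newton step. All names and constants are chosen for illustration only.

```python
# Toy objective (an assumption for illustration, not the study's network):
# f(w) = 0.5 * (A * w[0]**2 + B * w[1]**2), with condition number B / A = 100.
A, B = 1.0, 100.0


def loss(w):
    return 0.5 * (A * w[0] ** 2 + B * w[1] ** 2)


def gradient(w):
    return [A * w[0], B * w[1]]


def gradient_descent(w, steps, lr=1.0 / B):
    """First-order method: uses only the gradient, with a fixed step size."""
    for _ in range(steps):
        g = gradient(w)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w


def newton_step(w):
    """Second-order method: rescales the gradient by the inverse Hessian.

    The Hessian of this objective is diag(A, B), so the inverse is elementwise.
    """
    g = gradient(w)
    return [w[0] - g[0] / A, w[1] - g[1] / B]


start = [1.0, 1.0]
w_gd = gradient_descent(start, steps=100)
w_newton = newton_step(start)

if __name__ == "__main__":
    print("gradient descent, 100 steps:", loss(w_gd))
    print("Newton, 1 step:", loss(w_newton))
```

On a quadratic, the Newton step reaches the minimum directly, whereas gradient descent with a step size limited by the largest curvature progresses slowly along the flat direction. Neural network losses are not quadratic, so this only indicates why second-order information can matter for poorly conditioned problems.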
