classification task.
The remainder of this paper is organized as follows. Section 2 describes the relevant related work. Section 3 briefly presents the two aforementioned works, giving an outline of their methodologies and the underlying theory. Section 4 introduces the experimental work, describing the problems on which our networks have been trained and explaining how the tools have been applied to these models. Section 5 presents the numerical results, covering both the test performance of the pruned networks and the similarity between layers of pruned and unpruned networks; it offers an interpretation of these results and outlines some directions for future work.
2 RELATED WORK
The contribution of the present paper is the following: by combining two recent advances addressing over-parametrization (Frankle and Carbin, 2019) and the representation and comparison of hidden layers (Raghu et al., 2017) in ANNs, we aim to provide a layer-wise analysis of representational similarity in CNNs for vision tasks. Here we review previous works related to our aim.
2.1 Pruning Techniques for ANNs
Pruning techniques for ANNs have been proposed for decades. Early attempts include adding an L1 regularization term to the loss function (Goodfellow et al., 2016) in order to induce sparsity in the parameters, or replacing the fully-connected layer(s) with pooling operations (Lin et al., 2013).
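For reference, the L1-regularized objective takes the standard form

$$\mathcal{L}_{\mathrm{reg}}(\theta) = \mathcal{L}(\theta) + \lambda \sum_i |\theta_i|,$$

where $\lambda > 0$ controls the strength of the sparsity-inducing penalty.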
(Han et al., 2015) introduced a new technique that prunes parameters with small magnitude, upon which IMP (Frankle and Carbin, 2019) builds.
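As an illustration, the core criterion can be sketched in a few lines of NumPy (our own illustrative code, not the implementation of the cited works; IMP applies this criterion iteratively, retraining between pruning rounds):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude.

    Sketch of the magnitude criterion of (Han et al., 2015); the
    function name and signature are illustrative choices of ours.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # The k-th smallest magnitude acts as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # ties at the threshold are pruned too
    return weights * mask, mask
```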
More recently, a plethora of other techniques has been proposed, such as ADMM (Zhang et al., 2018) or techniques for structured (block) pruning, summarized in (Crowley et al., 2018).
Our paper focuses solely on IMP, while consideration of other pruning techniques is left for future work.
2.2 Comparison of ANNs
Despite being a recent work, (Raghu et al., 2017) has already prompted a number of studies utilizing CCA to gain insight into the similarities between neural networks: for instance, (Wang et al., 2018) use it to compare, layer-wise, the same network when initialized differently, finding that “surprisingly, representations learned by the same convolutional layer of networks trained from different initializations are not similar [...] at least in terms of subspace match”.
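To make the kind of comparison concrete, mean CCA similarity between two layers can be sketched as follows (a minimal NumPy rendering of ours; SVCCA as proposed by (Raghu et al., 2017) additionally applies an SVD-based dimensionality reduction to the activations first):

```python
import numpy as np

def mean_cca_similarity(X, Y):
    """Mean of the canonical correlations between two representations.

    X, Y: (num_datapoints, num_neurons) activation matrices collected
    on the same inputs. Assumes more datapoints than neurons.
    """
    # Center each neuron's activations.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for the two activation subspaces.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # The singular values of Qx^T Qy are the canonical correlations.
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())
```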
(Morcos et al., 2018) highlighted weaknesses of Mean CCA Similarity, proposing instead a new similarity metric for layers, called Projection Weighted Canonical Correlation Analysis (PWCCA).
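The reweighting idea can be sketched as follows (again a simplified NumPy rendering of ours: each canonical correlation is weighted by how strongly its canonical direction accounts for the original activations of one of the two views, as proposed in the cited paper):

```python
import numpy as np

def pwcca_similarity(X, Y):
    """Projection Weighted CCA similarity, simplified sketch.

    X, Y: (num_datapoints, num_neurons) activation matrices collected
    on the same inputs.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    U, rho, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    rho = np.clip(rho, 0.0, 1.0)
    # Canonical vectors of X, expressed in datapoint space.
    H = Qx @ U
    # Weight each correlation by the total (absolute) projection of
    # X's neurons onto the corresponding canonical direction.
    alpha = np.abs(H.T @ X).sum(axis=1)
    alpha /= alpha.sum()
    return float((alpha * rho).sum())
```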
On the other hand, other studies have introduced different methodologies to perform the comparison: (Yu et al., 2018), for example, proposed a technique based upon the Riemann curvature information of the “manifolds composed of activation vectors in each fully-connected layer” of two deep neural networks. This technique is still at an early stage, since it enables comparison of fully-connected layers only and cannot be used for an analysis like ours.
(Kornblith et al., 2019), instead, offered some considerations on CCA as a tool for layer comparison in neural networks, arguing that it cannot “measure meaningful similarities between representations of higher dimensions than the number of data points”, hence proposing yet another methodology called Centered Kernel Alignment (CKA).
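In its linear form, CKA admits a particularly compact expression that remains well defined even when the number of neurons exceeds the number of datapoints (a minimal NumPy sketch of ours; the cited paper also discusses kernel variants):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representations.

    X, Y: (num_datapoints, num_neurons) activation matrices collected
    on the same inputs.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Squared Frobenius norm of the cross-covariance, normalized by
    # the self-covariances of the two representations.
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, ord="fro")
                          * np.linalg.norm(Y.T @ Y, ord="fro")))
```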
2.3 Pruned vs. Unpruned ANNs
To our knowledge, ours is the first work to carry out an in-depth, layer-by-layer analysis of representational similarity for pruned ANNs.
(Frankle and Bau, 2019) delved into the mechanics of IMP (and other related magnitude pruning techniques) by analyzing the interpretability of pruned networks, computed through the identification of “convolutional units that recognize particular human-interpretable concepts”. They found that pruning does not reduce interpretability, prompting the conclusion that “parameters that pruning considers to be superfluous for accuracy are also superfluous for interpretability”. This work does discuss the comparison of pruned networks, but the analysis is global, without going into the detail of individual layers. It may be thought of as an attempt, akin to ours, to combine pruning techniques with other recent advances in order to gain additional insights into what may be called “pruning dynamics”.
(Morcos et al., 2018), instead, use CCA to compare the output layers of fully-trained, dense CNNs having different numbers of filters in their convolutional layers. The authors attempt to corroborate the Lottery Ticket Hypothesis (see Section 3.1), but, in doing so, they do not actually perform any pruning, nor do they compare hidden layers, focusing solely on the output representation.