compared to the 18-layer ResNet architecture. Future work could investigate whether image patches can be used to improve existing network initialization methods, which might also lead to a more explainable training process.
REFERENCES
Angelov, P. and Soares, E. (2019). Towards explainable deep neural networks (xDNN). ArXiv, abs/1912.02523.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Schölkopf, B., Platt, J. C., and Hoffman, T., editors, Advances in Neural Information Processing Systems, volume 19, pages 153–160. MIT Press.
Castillo Camacho, I. and Wang, K. (2019). A simple and effective initialization of CNN for forensics of image processing operations. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec'19), pages 107–112, New York, NY, USA. ACM.
Dauphin, Y. N. and Schoenholz, S. (2019). MetaInit: Initializing learning by learning to initialize. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems 32, pages 12645–12657. Curran Associates.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, Miami, Florida. IEEE.
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7):1895–1923.
Frankle, J. and Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. ArXiv, abs/1803.03635.
Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 9 of JMLR W&CP, pages 249–256.
Gray, S., Radford, A., and Kingma, D. P. (2017). GPU kernels for block-sparse weights. Technical report, OpenAI.
He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2019). Momentum contrast for unsupervised visual representation learning. ArXiv, abs/1911.05722.
He, K., Zhang, X., Ren, S., and Sun, J. (2015a). Deep
residual learning for image recognition. ArXiv,
abs/1512.03385.
He, K., Zhang, X., Ren, S., and Sun, J. (2015b). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pages 1026–1034, USA. IEEE Computer Society.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv, abs/1502.03167.
Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., and Wu, A. Y. (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE PAMI, 24(7):881–892.
Koturwar, S. and Merchant, S. (2017). Weight initialization of deep neural networks (DNNs) using data statistics. ArXiv, abs/1710.10570.
Krähenbühl, P., Doersch, C., Donahue, J., and Darrell, T. (2015). Data-dependent initializations of convolutional neural networks. ArXiv, abs/1511.06856.
Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto.
LeCun, Y. and Cortes, C. (1998). The MNIST database of
handwritten digits.
Li, O., Liu, H., Chen, C., and Rudin, C. (2017). Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. ArXiv, abs/1710.04806.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. Int J Comput Vis, 60(2):91–110.
McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. ArXiv, abs/1802.03426.
Mishkin, D. and Matas, J. (2015). All you need is a good
init. ArXiv, abs/1511.06422.
Misra, I. and van der Maaten, L. (2019). Self-supervised
learning of pretext-invariant representations. ArXiv,
abs/1912.01991.
Özbulak, G. and Ekenel, H. K. (2018). Initialization of convolutional neural networks by Gabor filters. In 26th Signal Processing and Communications Applications Conference (SIU), pages 1–4.
Pearson, K. (1901). LIII. On lines and planes of closest fit to
systems of points in space. The London, Edinburgh,
and Dublin Philosophical Magazine and Journal of
Science, 2(11):559–572.
Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann Math Stat, 22:400–407.
Ruder, S. (2016). An overview of gradient descent optimization algorithms. ArXiv, abs/1609.04747.
Saxe, A. M., McClelland, J. L., and Ganguli, S. (2013). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. ArXiv, abs/1312.6120.
Seuret, M., Alberti, M., Liwicki, M., and Ingold, R. (2017). PCA-initialized deep neural networks applied to document image analysis. In 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 01, pages 877–882.
Xie, N., Ras, G., van Gerven, M., and Doran, D. (2020). Explainable deep learning: A field guide for the uninitiated. ArXiv, abs/2004.14545.
Zeiler, M. D. and Fergus, R. (2013). Visualizing
and understanding convolutional networks. ArXiv,
abs/1311.2901.
Zhang, H., Dauphin, Y. N., and Ma, T. (2019). Fixup initialization: Residual learning without normalization. ArXiv, abs/1901.09321.
Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., and He, Q. (2019). A comprehensive survey on transfer learning. ArXiv, abs/1911.02685.