unsupervised learning in deep neural networks. We
described both the traditional energy-based method,
which is based on a linear representation of neural
units, and the proposed approach, which is based on
a nonlinear representation of neurons. We have proved
that maximization of the log-likelihood of the input data
distribution of a restricted Boltzmann machine is
equivalent to minimizing the cross-entropy and, as a
special case, to minimizing the mean squared error.
Thus, using the MSE training criterion, we can obtain
both the conventional and the novel learning rules.
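To make the first of these equivalences concrete, the following is a minimal sketch in standard notation introduced here only for illustration ($p_{\mathrm{data}}$ for the empirical input distribution, $p_{\theta}$ for the distribution defined by the restricted Boltzmann machine with parameters $\theta$); it shows the generic link between log-likelihood and cross-entropy, not the full derivation of the MSE special case, which additionally depends on the neuron model used:

\[
\mathrm{CE}(p_{\mathrm{data}}, p_{\theta})
  = -\sum_{v} p_{\mathrm{data}}(v)\,\log p_{\theta}(v)
  = H(p_{\mathrm{data}}) + D_{\mathrm{KL}}\bigl(p_{\mathrm{data}} \,\|\, p_{\theta}\bigr),
\]

and since the entropy $H(p_{\mathrm{data}})$ does not depend on $\theta$,

\[
\arg\min_{\theta} \mathrm{CE}(p_{\mathrm{data}}, p_{\theta})
  = \arg\min_{\theta} D_{\mathrm{KL}}\bigl(p_{\mathrm{data}} \,\|\, p_{\theta}\bigr)
  = \arg\max_{\theta} \mathbb{E}_{v \sim p_{\mathrm{data}}}\bigl[\log p_{\theta}(v)\bigr].
\]

In other words, minimizing the cross-entropy between the data distribution and the model distribution differs from maximizing the expected log-likelihood only by a constant that does not affect the optimum.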