Foundations and Trends in Machine Learning, 2(1):1–127.
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu,
R., Desjardins, G., Turian, J., Warde-Farley, D., and
Bengio, Y. (2010). Theano: a CPU and GPU math
expression compiler. In Proceedings of the Python for
Scientific Computing Conference (SciPy).
Cho, K., Ilin, A., and Raiko, T. (2011). Improved learning
of Gaussian-Bernoulli restricted Boltzmann machines.
In Proceedings of the 21st International Conference
on Artificial Neural Networks - Volume Part I, pages
10–17, Berlin, Heidelberg. Springer-Verlag.
Dahl, G. (2012). Deep learning how I did it:
Merck 1st place interview.
http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/.
Accessed September 29, 2014.
Dahl, G., Yu, D., Deng, L., and Acero, A. (2012).
Context-dependent pre-trained deep neural networks
for large-vocabulary speech recognition. Audio,
Speech, and Language Processing, IEEE Transactions
on, 20(1):30–42.
Goodfellow, I. J., Warde-Farley, D., Lamblin, P., Dumoulin,
V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F.,
and Bengio, Y. (2013). Pylearn2: a machine learning
research library. arXiv preprint arXiv:1308.4214.
Hinton, G. and Salakhutdinov, R. (2006). Reducing the di-
mensionality of data with neural networks. Science,
313(5786):504–507.
Hinton, G. E. (2002). Training products of experts by min-
imizing contrastive divergence. Neural Computation,
14(8):1771–1800.
Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast
learning algorithm for deep belief nets. Neural Com-
putation, 18(7):1527–1554.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. R. (2012). Improving neural
networks by preventing co-adaptation of feature de-
tectors. arXiv preprint arXiv:1207.0580.
Hyvärinen, A. (2005). Estimation of non-normalized statis-
tical models by score matching. Journal of Machine
Learning Research, 6:695–709.
Jaitly, N., Nguyen, P., Senior, A. W., and Vanhoucke, V.
(2012). Application of pretrained deep neural net-
works to large vocabulary speech recognition. In IN-
TERSPEECH.
Kaggle (2010). Machine learning competitions.
http://www.kaggle.com. Accessed September
29, 2014.
Kaggle (2013). Facial keypoints detection.
http://www.kaggle.com/c/facial-keypoints-detection.
Accessed September 29, 2014.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet
classification with deep convolutional neural
networks. In Advances in Neural Information Pro-
cessing Systems 25, pages 1097–1105. Curran Asso-
ciates, Inc.
LeCun, Y. and Huang, F. J. (2005). Loss functions
for discriminative training of energy-based models. In
Proc. of the 10th International Workshop on Artificial
Intelligence and Statistics.
Lee, T. S., Mumford, D., Romero, R., and Lamme, V. A.
(1998). The role of the primary visual cortex in higher
level vision. Vision Research, 38(15/16):2429–2454.
Luo, P., Wang, X., and Tang, X. (2012). Hierarchical face
parsing via deep learning. In Conference on Com-
puter Vision and Pattern Recognition, pages 2480–
2487. IEEE.
Mnih, V. (2013). Q&A with job salary pre-
diction first prize winner Vlad Mnih.
http://blog.kaggle.com/2013/05/06/qa-with-job-salary-prediction-first-prize-winner-vlad-mnih/.
Accessed September 29, 2014.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1988).
Learning representations by back-propagating errors.
In Neurocomputing: Foundations of Research, pages
696–699. MIT Press, Cambridge, MA, USA.
Salakhutdinov, R. and Hinton, G. (2009). Semantic hash-
ing. International Journal of Approximate Reasoning,
50(7):969–978.
Smolensky, P. (1986). Information processing in dynami-
cal systems: foundations of harmony theory. In Par-
allel distributed processing: explorations in the mi-
crostructure of cognition, vol. 1, pages 194–281. MIT
Press, Cambridge, MA, USA.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. (2014). Dropout: a simple
way to prevent neural networks from overfitting.
Journal of Machine Learning Research, 15:1929–1958.
Sun, Y., Wang, X., and Tang, X. (2013). Deep convolutional
network cascade for facial point detection. In Com-
puter Vision and Pattern Recognition (CVPR), 2013
IEEE Conference on, pages 3476–3483. IEEE.
Sutskever, I., Martens, J., Dahl, G. E., and Hinton, G. E.
(2013). On the importance of initialization and mo-
mentum in deep learning. In Proceedings of the
30th International Conference on Machine Learning,
pages 1139–1147.
Vincent, P. (2011). A connection between score match-
ing and denoising autoencoders. Neural Computation,
23(7):1661–1674.
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-
A. (2008). Extracting and composing robust features
with denoising autoencoders. In Proceedings of the
25th International Conference on Machine Learning,
pages 1096–1103, New York, NY, USA. ACM.
Wang, N., Melchior, J., and Wiskott, L. (2012). An analy-
sis of Gaussian-binary restricted Boltzmann machines
for natural images. In Proceedings of the 20th Euro-
pean Symposium on Artificial Neural Networks, Com-
putational Intelligence and Machine Learning, pages
287–292.
VISAPP 2015 - International Conference on Computer Vision Theory and Applications