tion, OSDI 2016, Savannah, GA, USA, November 2-4, 2016, pages 265–283.
Apache MXNet (2018). Apache MXNet. https://mxnet.apache.org/. Accessed: 2018-09-28.
Ben-Nun, T. and Hoefler, T. (2018). Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. CoRR, abs/1802.09941.
Campos, V., Sastre, F., Yagües, M., Bellver, M., Giró-i-Nieto, X., and Torres, J. (2017). Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster. Procedia Computer Science, 108:315–324.
Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Senior, A., Tucker, P., Yang, K., and Le, Q. V. (2012). Large scale distributed deep networks. Advances in Neural Information Processing Systems, pages 1223–1231.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.
Eggensperger, K., Feurer, M., Hutter, F., Bergstra, J., Snoek, J., Hoos, H., and Leyton-Brown, K. (2013). Towards an empirical foundation for assessing Bayesian optimization of hyperparameters. In NIPS Workshop on Bayesian Optimization in Theory and Practice.
Google AutoML (2018). Google AutoML. https://cloud.google.com/automl/. Accessed: 2018-09-28.
Hadjis, S., Zhang, C., Mitliagkas, I., et al. (2016). An optimizer for multi-device deep learning on CPUs and GPUs. CoRR, abs/1606.04487.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
Hutter, F. et al. (2011). Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization, LION'05.
Jin, P. H., Yuan, Q., Iandola, F., and Keutzer, K. (2016). How to scale distributed deep learning? CoRR.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, pages 1–9.
Machine Learning Group at the University of Waikato (2018). Weka 3: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/. Accessed: 2018-09-28.
Math Pages (2018). Probability of intersecting intervals. http://www.mathpages.com/home/kmath580/kmath580.htm. Accessed: 2018-09-24.
Microsoft CNTK (2018). The Microsoft Cognitive Toolkit. https://www.microsoft.com/en-us/cognitive-toolkit/. Accessed: 2018-09-28.
Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
O’Reilly Podcast (2018). How to train and deploy deep learning at scale. https://www.oreilly.com/ideas/how-to-train-and-deploy-deep-learning-at-scale/. Accessed: 2018-09-28.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408.
Sculley, D. et al. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2503–2511.
Sergeev, A. and Del Balso, M. (2018). Horovod: fast and easy distributed deep learning in TensorFlow. CoRR.
Sridharan, S., Vaidyanathan, K., Kalamkar, D., Das, D., Smorkalov, M. E., Shiryaev, M., Mudigere, D., Mellempudi, N., Avancha, S., Kaul, B., and Dubey, P. (2018). On Scale-out Deep Learning Training for Cloud and HPC. CoRR, pages 16–18.
Thornton, C. et al. (2013). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In SIGKDD, pages 847–855.