larger number of classes overall. This gives CNNs an
opportunity to take advantage of having been trained
directly for classification when they are presented with
a similar task. Although HiGSFA makes use of class
labels, it suffers in comparison because it does not take
the downstream task into account during training.
For future work, a natural extension of the experiments
presented here would be an analysis of the effect that
different types of data have on performance. This would
yield further insight beyond varying only the amount of
rather homogeneous data used for training. Additionally,
the performance of a wider array of popular methods
could be compared.
More types of benchmarks for comparing different
models over varying training set sizes would be helpful
for this kind of research (see the sketch below). The
knowledge gained from them would also allow practitioners
to choose the right model for the scale and type of
problem they wish to solve. These experiments give rise
to the question: how can these methods, with their
different strengths and weaknesses, profit from each other?
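As an illustration of such a benchmark, the following minimal sketch trains the same classifier on nested training subsets of increasing size and records test accuracy, yielding a simple data-efficiency curve. It is only an illustrative example: the dataset (MNIST via Keras), the small CNN architecture, and the subset sizes are assumptions and do not reproduce the exact experimental setup used above.

# Minimal sketch of a data-efficiency benchmark: train the same model on
# nested training subsets of increasing size and record test accuracy.
# Dataset, architecture, and subset sizes are illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
x_test = x_test[..., np.newaxis].astype("float32") / 255.0

def build_model():
    # Small CNN; any model under comparison (e.g. an SFA-based pipeline
    # followed by a classifier) could be substituted here.
    model = keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Nested subsets of increasing size; accuracy as a function of subset size
# gives one point per size on the data-efficiency curve.
subset_sizes = [500, 1000, 5000, 20000, 60000]
for n in subset_sizes:
    model = build_model()
    model.fit(x_train[:n], y_train[:n], epochs=5, batch_size=64, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"train size={n:6d}  test accuracy={acc:.4f}")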
REFERENCES
Al-Jarrah, O. Y., Yoo, P. D., Muhaidat, S., Karagiannidis, G. K., and Taha, K. (2015). Efficient machine learning for big data: A review. Big Data Research, 2(3):87–93.
Beleites, C., Neugebauer, U., Bocklitz, T., Krafft, C., and Popp, J. (2013). Sample size planning for classification models. Analytica Chimica Acta, 760:25–33.
Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828.
Bertinetto, L., Henriques, J. F., Valmadre, J., Torr, P., and Vedaldi, A. (2016). Learning feed-forward one-shot learners. In Advances in Neural Information Processing Systems, pages 523–531.
Chollet, F. et al. (2015). Keras. https://keras.io.
Creutzig, F. and Sprekeler, H. (2008). Predictive coding and the slowness principle: An information-theoretic approach. Neural Computation, 20(4):1026–1041.
Stanford University CS231n (2017). Convolutional neural networks for visual recognition. Course notes.
Edwards, H. and Storkey, A. (2016). Towards a neural statistician. arXiv preprint arXiv:1606.02185.
Escalante-B., A. N. and Wiskott, L. (2013). How to solve classification and regression problems on high-dimensional data with a supervised extension of slow feature analysis. Journal of Machine Learning Research, 14(1):3683–3719.
Escalante-B., A. N. and Wiskott, L. (2016). Improved graph-based SFA: Information preservation complements the slowness principle. CoRR.
Figueroa, R. L., Zeng-Treitler, Q., Kandula, S., and Ngo, L. H. (2012). Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making, 12(1):8.
Franzius, M., Sprekeler, H., and Wiskott, L. (2007). Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Computational Biology, 3(8):e166.
Hadji, I. and Wildes, R. P. (2018). What do we understand about convolutional networks? arXiv preprint arXiv:1803.08834.
Kamthe, S. and Deisenroth, M. P. (2017). Data-efficient reinforcement learning with probabilistic model predictive control. arXiv preprint arXiv:1706.06491.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Krishnaiah, P. R. (1980). Handbook of Statistics, volume 31. Motilal Banarsidass Publishers.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338.
Lange, S. and Riedmiller, M. (2010). Deep auto-encoder neural networks in reinforcement learning. In The 2010 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE.
Lawrence, S., Giles, C. L., and Tsoi, A. C. (1998). What size neural network gives optimal generalization? Convergence properties of backpropagation. Technical report.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529.
Moore, A. W. and Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1):103–130.
Oquab, M., Bottou, L., Laptev, I., and Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1717–1724.
Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359.
Pfau, D., Petersen, S., Agarwal, A., Barrett, D., and Stachenfeld, K. (2018). Spectral inference networks: Unifying deep and spectral learning.