resentation and high-level features extracted from a
pretrained convolutional neural network, as well as
several popular data-level techniques for alleviating
the negative impact of data imbalance. The presented results indicate that using the original image representation with simple random oversampling leads to the best results on the considered benchmark datasets.
Contrary to what might be expected based on results for tabular data, using other resampling techniques usually led to deteriorated performance. Furthermore,
while using SMOTE improved performance for the
features extracted from the pretrained network, the
overall performance of that approach was still sig-
nificantly worse than simply using the original image
representation. The observed results suggest several directions for further research. First of all, since applying SMOTE actually improved performance for the features extracted from the convolutional neural network, it is possible that a better feature representation would preserve this effect while improving the overall performance. Such
additional feature representations could include features extracted from a fine-tuned neural network, earlier layers of the network, different neural architectures, or autoencoders. Secondly, the observed results
suggest that convolutional neural networks may be
more resilient to the presence of data imbalance than
traditional learning algorithms, such as SVM. The observed improvement in performance due to using a dedicated data preprocessing algorithm was also relatively smaller than for the SVM. This, if confirmed by further studies, could indicate either that dealing with data imbalance is a less pressing problem in the image recognition task, or that data-level strategies are not a suitable approach to solving it.
ACKNOWLEDGEMENTS
This work was supported by the Polish Na-
tional Science Center under the grant no.
2017/27/N/ST6/01705.
REFERENCES
Branco, P., Torgo, L., and Ribeiro, R. P. (2017). Relevance-
based evaluation metrics for multi-class imbalanced
domains. In Advances in Knowledge Discovery and
Data Mining - 21st Pacific-Asia Conference, PAKDD
2017, Jeju, South Korea, May 23-26, 2017, Proceed-
ings, Part I, pages 698–710.
Buda, M., Maki, A., and Mazurowski, M. A. (2018). A
systematic study of the class imbalance problem in
convolutional neural networks. Neural Networks,
106:249–259.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,
W. P. (2002). SMOTE: synthetic minority over-
sampling technique. Journal of Artificial Intelligence Research, 16:321–357.
Chollet, F. et al. (2015). Keras. https://keras.io.
Coates, A., Ng, A., and Lee, H. (2011). An analy-
sis of single-layer networks in unsupervised feature
learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 215–223.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei,
L. (2009). ImageNet: A large-scale hierarchical im-
age database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). MobileNets: Efficient convolutional neu-
ral networks for mobile vision applications. arXiv
preprint arXiv:1704.04861.
Japkowicz, N. and Stephen, S. (2002). The class imbalance
problem: A systematic study. Intelligent Data Analysis, 6(5):429–449.
Koziarski, M. and Cyganek, B. (2018). Impact of low res-
olution on image recognition with deep neural net-
works: An experimental study. International Journal
of Applied Mathematics and Computer Science, 28(4).
Krawczyk, B. (2016). Learning from imbalanced data: open
challenges and future directions. Progress in Artificial
Intelligence, 5(4):221–232.
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple
layers of features from tiny images. Technical report,
Citeseer.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. (1998).
Gradient-based learning applied to document recogni-
tion. Proceedings of the IEEE, 86(11):2278–2324.
Lemaître, G., Nogueira, F., and Aridas, C. K. (2017).
Imbalanced-learn: A python toolbox to tackle the
curse of imbalanced datasets in machine learning. The
Journal of Machine Learning Research, 18(1):559–
563.
Lusa, L. et al. (2012). Evaluation of SMOTE for
high-dimensional class-imbalanced microarray data.
In 2012 11th International Conference on Machine
Learning and Applications, volume 2, pages 89–94.
IEEE.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer,
P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830.
Smith, M. R., Martinez, T., and Giraud-Carrier, C. (2014).
An instance level analysis of data complexity. Machine Learning, 95(2):225–256.
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications