have different performance even under the same level of noise. To reach the desired accuracy at low cost, we have also shown that it is more important to have a large dataset, even with a high level of noise, than a small, clean one, because most ML algorithms need large amounts of training data to perform well. We have further shown that desirable ML performance can be achieved at a low labeling cost by using ensemble learning, since ensembles are more resilient and robust to label noise.
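As an illustration of this last point, the following is a minimal sketch in Python with scikit-learn; the digits dataset, the 30% symmetric noise rate, and the bagging-of-trees ensemble are illustrative assumptions rather than our exact experimental setup. It flips a fraction of the training labels uniformly at random and compares a single decision tree against a bagging ensemble, both evaluated on clean test labels:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Illustrative dataset; any labeled classification set would do.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Inject symmetric label noise: replace 30% of the training labels
# with classes drawn uniformly at random (assumed noise model).
noise_rate = 0.3
flip = rng.random(len(y_tr)) < noise_rate
y_noisy = y_tr.copy()
y_noisy[flip] = rng.integers(0, 10, size=flip.sum())

# Single base learner trained on the noisy labels.
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_noisy)

# Bagging ensemble of the same base learner, also trained on the
# noisy labels; averaging over bootstrap resamples tends to cushion
# the effect of the flipped labels.
ensemble = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),
    n_estimators=50,
    random_state=0,
).fit(X_tr, y_noisy)

# Accuracy is measured against the clean test labels.
print("single tree:", single.score(X_te, y_te))
print("bagging    :", ensemble.score(X_te, y_te))

In runs of this kind, the ensemble typically degrades less than the single learner as the noise rate grows, which is the resilience our conclusion refers to.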