Authors:
Abdulrahman Gharawi, Jumana Alsubhi, and Lakshmish Ramaswamy
Affiliation:
School of Computing, University of Georgia, Athens, U.S.A.
Keyword(s):
Machine Learning, Deep Learning, Ensemble Learning, Label Noise, Class Label Noise, Labeling Cost Optimization, Mislabeled Data.
Abstract:
Machine learning models have demonstrated exceptional performance in various applications, driven by the emergence of large labeled datasets. Although many datasets are available, acquiring high-quality labeled datasets is challenging because it requires extensive human supervision or expert annotation, both of which are labor-intensive and time-consuming. Because label noise can degrade the performance of machine learning models, acquiring high-quality datasets without label noise is a critical problem. However, it is difficult to significantly reduce label noise in real-world datasets without hiring expensive expert annotators. Based on extensive experiments, this study examines how different levels of label noise affect the accuracy of machine learning models. It also investigates ways to reduce labeling costs without sacrificing the required accuracy.
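To make "levels of label noise" concrete, the following is a minimal sketch of how class label noise could be injected into a clean label set at a controlled rate. The abstract does not state which noise model the study uses, so this assumes symmetric (uniform) noise, where a chosen fraction of labels is flipped to a different class picked at random. The names inject_label_noise and observed_noise_rate are hypothetical and are not taken from the paper.

```python
import random


def inject_label_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped.

    Assumption: symmetric noise, i.e. each selected label is replaced by a
    different class chosen uniformly at random from the remaining classes.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")

    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))

    # Pick distinct positions to corrupt, then swap in a wrong class.
    for i in rng.sample(range(len(noisy)), n_flip):
        wrong_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy


def observed_noise_rate(clean, noisy):
    """Fraction of positions where the noisy label differs from the clean one."""
    return sum(c != n for c, n in zip(clean, noisy)) / len(clean)


if __name__ == "__main__":
    clean = [i % 3 for i in range(300)]  # toy labels for 3 classes
    for rate in (0.0, 0.1, 0.3):
        noisy = inject_label_noise(clean, rate, num_classes=3, seed=42)
        print(rate, observed_noise_rate(clean, noisy))
```

In an experiment of the kind the abstract describes, one would train the same model on label sets corrupted at several rates and compare test accuracy against the clean-label baseline.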