Clustering-based Under-sampling for Software Defect Prediction
Moheb M. R. Henein, Doaa M. Shawky, Salwa K. Abd-El-Hafiz
2018
Abstract
Detecting defective software modules is important for reducing the time and resources consumed by software testing. Software defect data sets are usually imbalanced: defective modules are far fewer than defect-free modules. Imbalanced data sets bias machine learning algorithms toward the majority class. Clustering-based under-sampling has been shown, in different applications, to find good representatives of the majority data. This paper presents an approach to software defect prediction based on clustering-based under-sampling and an Artificial Neural Network (ANN). First, clustering-based under-sampling selects a subset of the majority samples, which is then combined with the minority samples to produce a balanced data set. Second, an ANN model is built and trained on the resulting balanced data set to classify software modules as defective or defect-free. In addition, a sensitivity analysis is conducted to choose the number of majority samples that yields the best performance measures. Results show high prediction capability for defective modules while maintaining the ability to detect defect-free modules.
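The under-sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or representative-selection rule the authors use, so this sketch makes its own assumptions: plain k-means on the majority class, with the real majority sample nearest to each centroid kept as the representative. The function names and the `n_keep` parameter are hypothetical; `n_keep` stands in for the number of majority samples that the sensitivity analysis would vary.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (assumed here; the paper's clustering method may differ)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct samples.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(majority, minority, n_keep=None, seed=0):
    """Return a balanced (X, y) set: label 0 = defect-free, 1 = defective.

    n_keep is the number of majority samples to retain; by default it
    equals the number of minority samples.
    """
    k = len(minority) if n_keep is None else n_keep
    k = min(k, len(majority))
    centroids = kmeans(majority, k, seed=seed)
    chosen = set()
    for c in centroids:
        # Keep the closest real majority sample that is not already chosen.
        idx = min(
            (i for i in range(len(majority)) if i not in chosen),
            key=lambda i: squared_distance(majority[i], c),
        )
        chosen.add(idx)
    kept = [majority[i] for i in sorted(chosen)]
    X = kept + list(minority)
    y = [0] * len(kept) + [1] * len(minority)
    return X, y
```

The balanced `X` and `y` returned here would then be passed to a classifier such as an ANN. Repeating the call with different `n_keep` values and comparing performance measures is one way to approximate the kind of sensitivity analysis the abstract mentions.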
Paper Citation
in Harvard Style
Shawky D. and Abd-El-Hafiz S. (2018). Clustering-based Under-sampling for Software Defect Prediction. In Proceedings of the 13th International Conference on Software Technologies - Volume 1: ICSOFT, ISBN 978-989-758-320-9, pages 185-193. DOI: 10.5220/0006911401850193
in Bibtex Style
@conference{icsoft18,
author={Doaa M. Shawky and Salwa K. Abd-El-Hafiz},
title={Clustering-based Under-sampling for Software Defect Prediction},
booktitle={Proceedings of the 13th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2018},
pages={185-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006911401850193},
isbn={978-989-758-320-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Clustering-based Under-sampling for Software Defect Prediction
SN - 978-989-758-320-9
AU - Shawky D.
AU - Abd-El-Hafiz S.
PY - 2018
SP - 185
EP - 193
DO - 10.5220/0006911401850193