Clustering-based Under-sampling for Software Defect Prediction

Moheb M. R. Henein, Doaa M. Shawky, Salwa K. Abd-El-Hafiz

Abstract

Detecting defective software modules is important for reducing the time and resources consumed by software testing. Software defect data sets are usually imbalanced: defective modules are outnumbered by defect-free modules. Imbalanced data sets bias machine learning algorithms toward the majority class. Clustering-based under-sampling has shown its ability to find good representatives of the majority data in different applications. This paper presents an approach for software defect prediction based on clustering-based under-sampling and an Artificial Neural Network (ANN). First, clustering-based under-sampling selects a subset of the majority samples, which is then combined with the minority samples to produce a balanced data set. Second, an ANN model is built and trained on the resulting balanced data set to classify software modules as defective or defect-free. In addition, a sensitivity analysis is conducted to choose the number of majority samples that yields the best performance measures. Results show a high prediction capability for detecting defective modules while maintaining the ability to detect defect-free modules.
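The under-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or say how representatives are chosen, so this sketch assumes plain k-means with one cluster per retained sample and keeps the majority sample nearest each centroid. All function names (`kmeans`, `undersample_majority`, `balance`) are hypothetical, and the ANN training step is omitted.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (assumed here; the abstract names no algorithm)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def undersample_majority(majority, n_keep, seed=0):
    """Keep n_keep majority samples: the one nearest each centroid.

    Assumption: one representative per cluster, chosen by distance.
    Requires n_keep <= len(majority).
    """
    centroids = kmeans(majority, n_keep, seed=seed)
    remaining = list(majority)
    chosen = []
    for c in centroids:
        best = min(remaining, key=lambda p: squared_distance(p, c))
        chosen.append(best)
        remaining.remove(best)  # never pick the same sample twice
    return chosen


def balance(majority, minority, seed=0):
    """Combine selected majority samples with all minority samples.

    Labels: 0 = defect-free (majority), 1 = defective (minority).
    """
    kept = undersample_majority(majority, len(minority), seed=seed)
    samples = kept + list(minority)
    labels = [0] * len(kept) + [1] * len(minority)
    return samples, labels
```

Here `balance` keeps as many majority samples as there are minority samples. The sensitivity analysis mentioned in the abstract would correspond to calling `undersample_majority` with different values of `n_keep` and comparing the resulting classifier performance.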


Paper Citation


in Harvard Style

Shawky D. and Abd-El-Hafiz S. (2018). Clustering-based Under-sampling for Software Defect Prediction. In Proceedings of the 13th International Conference on Software Technologies - Volume 1: ICSOFT, ISBN 978-989-758-320-9, pages 185-193. DOI: 10.5220/0006911401850193


in Bibtex Style

@conference{icsoft18,
author={Doaa M. Shawky and Salwa K. Abd-El-Hafiz},
title={Clustering-based Under-sampling for Software Defect Prediction},
booktitle={Proceedings of the 13th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2018},
pages={185-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006911401850193},
isbn={978-989-758-320-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 13th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Clustering-based Under-sampling for Software Defect Prediction
SN - 978-989-758-320-9
AU - Shawky D.
AU - Abd-El-Hafiz S.
PY - 2018
SP - 185
EP - 193
DO - 10.5220/0006911401850193