self-organizing map for software fault prediction.
Knowledge-Based Systems, 74:28–39.
Arar, Ö. F. and Ayan, K. (2015). Software defect prediction
using cost-sensitive neural network. Applied Soft
Computing, 33:263–277.
Bradley, A. P. (1997). The use of the area under the
ROC curve in the evaluation of machine learning
algorithms. Pattern Recognition, 30(7):1145–1159.
Chapman, M., Callis, P., and Jackson, W. (2004). Metrics
data program. NASA IV and V Facility,
http://mdp.ivv.nasa.gov.
Gray, D., Bowes, D., Davey, N., Sun, Y., and Christianson,
B. (2012). Reflections on the NASA MDP data sets. IET
Software, 6(6):549–558.
Halstead, M. H. (1977). Elements of software science, vol-
ume 7. Elsevier New York.
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS
136: A k-means clustering algorithm. Journal of the
Royal Statistical Society. Series C (Applied Statistics),
28(1):100–108.
He, H. and Garcia, E. A. (2009). Learning from imbalanced
data. IEEE Transactions on Knowledge and Data
Engineering, 21(9):1263–1284.
Huang, J. and Ling, C. X. (2005). Using AUC and accuracy
in evaluating learning algorithms. IEEE Transactions
on Knowledge and Data Engineering, 17(3):299–310.
Jin, C. and Jin, S.-W. (2015). Prediction approach of soft-
ware fault-proneness based on hybrid artificial neu-
ral network and quantum particle swarm optimization.
Applied Soft Computing, 35:717–725.
Kamei, Y., Monden, A., Matsumoto, S., Kakimoto, T., and
Matsumoto, K.-i. (2007). The effects of over and
under sampling on fault-prone module detection. In
Empirical Software Engineering and Measurement,
2007. ESEM 2007. First International Symposium on,
pages 196–204. IEEE.
Khoshgoftaar, T. M., Gao, K., and Seliya, N. (2010). At-
tribute selection and imbalanced data: Problems in
software defect prediction. In Tools with Artificial
Intelligence (ICTAI), 2010 22nd IEEE International
Conference on, volume 1, pages 137–144. IEEE.
Kim, H.-J., Jo, N.-O., and Shin, K.-S. (2016). Optimiza-
tion of cluster-based evolutionary undersampling for
the artificial neural networks in corporate bankruptcy
prediction. Expert Systems with Applications, 59:226–
234.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Kumudha, P. and Venkatesan, R. (2016). Cost-sensitive ra-
dial basis function neural network classifier for soft-
ware defect prediction. The Scientific World Journal,
2016.
Li, W., Huang, Z., and Li, Q. (2016). Three-way decisions
based software defect prediction. Knowledge-Based
Systems, 91:263–274.
Lin, W.-C., Tsai, C.-F., Hu, Y.-H., and Jhang, J.-S. (2017).
Clustering-based undersampling in class-imbalanced
data. Information Sciences, 409:17–26.
Liu, M., Miao, L., and Zhang, D. (2014). Two-stage cost-
sensitive learning for software defect prediction. IEEE
Transactions on Reliability, 63(2):676–686.
López, V., Fernández, A., Moreno-Torres, J. G., and Herrera,
F. (2012). Analysis of preprocessing vs. cost-sensitive
learning for imbalanced classification. Open
problems on intrinsic data characteristics. Expert Systems
with Applications, 39(7):6585–6608.
McCabe, T. J. (1976). A complexity measure. IEEE
Transactions on Software Engineering, (4):308–320.
Menzies, T., Turhan, B., Bener, A., Gay, G., Cukic, B., and
Jiang, Y. (2008). Implications of ceiling effects in de-
fect predictors. In Proceedings of the 4th international
workshop on Predictor models in software engineer-
ing, pages 47–54. ACM.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer,
P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,
A., Cournapeau, D., Brucher, M., Perrot, M., and
Duchesnay, E. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research,
12:2825–2830.
Riquelme, J., Ruiz, R., Rodríguez, D., and Moreno, J.
(2008). Finding defective modules from highly unbalanced
datasets. Actas de los Talleres de las Jornadas
de Ingeniería del Software y Bases de Datos, 2(1):67–74.
Shepperd, M., Song, Q., Sun, Z., and Mair, C. (2013). Data
quality: Some comments on the NASA software defect
datasets. IEEE Transactions on Software Engineering,
39(9):1208–1215.
Wang, S. and Yao, X. (2013). Using class imbalance learn-
ing for software defect prediction. IEEE Transactions
on Reliability, 62(2):434–443.
Zheng, J. (2010). Cost-sensitive boosting neural networks
for software defect prediction. Expert Systems with
Applications, 37(6):4537–4543.
Clustering-based Under-sampling for Software Defect Prediction